SCHAUM'S OUTLINES
Second Edition

Updated examples with the most current U.S. and world data
Two complete self examinations
New chapter on Time Series Econometrics
Perfect for pre-test review

Use with these courses: Statistics and Econometrics, Statistical Methods in Economics, Quantitative Methods in Economics, Mathematical Economics, Micro-Economics, Macro-Economics, Math for Economists, Math for Social Sciences

Theory and Problems of
STATISTICS AND ECONOMETRICS
SECOND EDITION

DOMINICK SALVATORE, Ph.D.
Professor and Chairperson, Department of Economics, Fordham University

DERRICK REAGLE, Ph.D.
Assistant Professor of Economics, Fordham University

Schaum's Outline Series
McGraw-Hill

Copyright © 2002 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-139568-7

The material in this eBook also appears in the print version of this title: 0-07-134852-2.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. ("McGraw-Hill") and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill's prior consent.
You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED "AS IS". McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071395687

Preface

This book presents a clear and concise introduction to statistics and econometrics. A course in statistics or econometrics is often one of the most useful but also one of the most difficult of the required courses in colleges and universities. The purpose of this book is to help overcome this difficulty by using a problem-solving approach. Each chapter begins with a statement of theory, principles, or background information, fully illustrated with examples. This is followed by numerous theoretical and practical problems with detailed, step-by-step solutions.
While primarily intended as a supplement to all current standard textbooks of statistics and/or econometrics, the book can also be used as an independent text, as well as to supplement class lectures.

The book is aimed at college students in economics, business administration, and the social sciences taking a one-semester or a one-year course in statistics and/or econometrics. It also provides a very useful source of reference for M.A. and M.B.A. students and for all those who use (or would like to use) statistics and econometrics in their work. No prior statistical background is assumed.

The book is completely self-contained in that it covers the statistics (Chaps. 1 to 5) required for econometrics (Chaps. 6 to 11). It is applied in nature, and all proofs appear in the problems section rather than in the text itself. Real-world socioeconomic and business data are used, whenever possible, to demonstrate the more advanced econometric techniques and models. Several sources of online data are used, and Web addresses are given for the student's and researcher's further use (App. 12). Topics frequently encountered in econometrics, such as multicollinearity and autocorrelation, are clearly and concisely discussed as to the problems they create, the methods to test for their presence, and possible correction techniques.

In this second edition, we have expanded the computer applications to provide a general introduction to data handling, and specific programming instruction to perform all estimations in this book by computer (Chap. 12) using Microsoft Excel, Eviews, or SAS statistical packages. We have also added sections on nonparametric testing, matrix notation, binary choice models, and a chapter on time series analysis (Chap. 11), a field of econometrics which has expanded as of late. A sample statistics and econometrics examination is also included.

The methodology of this book and much of its content has been tested in undergraduate and graduate classes in statistics and econometrics at Fordham University.
Students found the approach and content of the book extremely useful and made many valuable suggestions for improvement. We have also received very useful advice from Professors Mary Beth Combs, Edward Dowling, and Damodar Gujarati. The following students carefully read through the entire manuscript and made many useful comments: Luca Bonardi, Kevin Coughlin, Sean Hennessy, and James Santangelo. To all of them we are deeply grateful. We owe a great intellectual debt to our former professors of statistics and econometrics: J. S. Butler, Jack Johnston, Lawrence Klein, and Bernard Okun.

We are indebted to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., to Dr. Frank Yates, F.R.S., and the Longman Group Ltd., London, for permission to adapt and reprint Tables III and IV from their book, Statistical Tables for Biological, Agricultural and Medical Research.

In addition to Statistics and Econometrics, the Schaum's Outline Series in Economics includes Microeconomic Theory, Macroeconomic Theory, International Economics, Mathematics for Economists, and Principles of Economics.

DOMINICK SALVATORE
DERRICK REAGLE
New York, 2001

CONTENTS

CHAPTER 1  Introduction
  1.1 The Nature of Statistics
  1.2 Statistics and Econometrics
  1.3 The Methodology of Econometrics

CHAPTER 2  Descriptive Statistics
  2.1 Frequency Distributions
  2.2 Measures of Central Tendency
  2.3 Measures of Dispersion
  2.4 Shape of Frequency Distributions

CHAPTER 3  Probability and Probability Distributions
  3.1 Probability of a Single Event
  3.2 Probability of Multiple Events
  3.3 Discrete Probability Distributions: The Binomial Distribution
  3.4 The Poisson Distribution
  3.5 Continuous Probability Distributions: The Normal Distribution

CHAPTER 4  Statistical Inference: Estimation
  4.1 Sampling
  4.2 Sampling Distribution of the Mean
  4.3 Estimation Using the Normal Distribution
  4.4 Confidence Intervals for the Mean Using the t Distribution

CHAPTER 5  Statistical Inference: Testing Hypotheses
  5.1
Testing Hypotheses
  5.2 Testing Hypotheses about the Population Mean and Proportion
  5.3 Testing Hypotheses for Differences between Two Means or Proportions
  5.4 Chi-Square Test of Goodness of Fit and Independence
  5.5 Analysis of Variance
  5.6 Nonparametric Testing

STATISTICS EXAMINATION

CHAPTER 6  Simple Regression Analysis
  6.1 The Two-Variable Linear Model
  6.2 The Ordinary Least-Squares Method
  6.3 Tests of Significance of Parameter Estimates
  6.4 Test of Goodness of Fit and Correlation
  6.5 Properties of Ordinary Least-Squares Estimators

CHAPTER 7  Multiple Regression Analysis
  7.1 The Three-Variable Linear Model
  7.2 Tests of Significance of Parameter Estimates
  7.3 The Coefficient of Multiple Determination
  7.4 Test of the Overall Significance of the Regression
  7.5 Partial-Correlation Coefficients
  7.6 Matrix Notation

CHAPTER 8  Further Techniques and Applications in Regression Analysis
  8.1 Functional Form
  8.2 Dummy Variables
  8.3 Distributed Lag Models
  8.4 Forecasting
  8.5 Binary Choice Models
  8.6 Interpretation of Binary Choice Models

CHAPTER 9  Problems in Regression Analysis
  9.1 Multicollinearity
  9.2 Heteroscedasticity
  9.3 Autocorrelation
  9.4 Errors in Variables

CHAPTER 10  Simultaneous-Equations Methods
  10.1 Simultaneous-Equations Models
  10.2 Identification
  10.3 Estimation: Indirect Least Squares
  10.4 Estimation: Two-Stage Least Squares

CHAPTER 11  Time-Series Methods
  11.1
  11.2
  11.3
  11.4 Testing for Unit Root
  11.5 Cointegration and Error Correction
  11.6 Causality

CHAPTER 12  Computer Applications in Econometrics
  12.1 Data Formats
  12.2
Microsoft Excel
  12.3 Eviews
  12.4 SAS

ECONOMETRICS EXAMINATION

Appendix 1  Binomial Distribution
Appendix 2  Poisson Distribution
Appendix 3  Standard Normal Distribution
Appendix 4  Table of Random Numbers
Appendix 5  Student's t Distribution
Appendix 6  Chi-Square Distribution
Appendix 7  F Distribution
Appendix 8  Durbin-Watson Statistic
Appendix 9  Wilcoxon
Appendix 10  Kolmogorov-Smirnov Critical Values
Appendix 11  ADF Critical Values
Appendix 12  Data Sources on the Web

INDEX

CHAPTER 1
Introduction

1.1 THE NATURE OF STATISTICS

Statistics refers to the collection, presentation, analysis, and utilization of numerical data to make inferences and reach decisions in the face of uncertainty in economics, business, and other social and physical sciences.

Statistics is subdivided into descriptive and inferential. Descriptive statistics is concerned with summarizing and describing a body of data. Inferential statistics is the process of reaching generalizations about the whole (called the population) by examining a portion (called the sample). In order for this to be valid, the sample must be representative of the population and the probability of error also must be specified.

Descriptive statistics is discussed in detail in Chap. 2. This is followed by (the more crucial) statistical inference; Chap. 3 deals with probability, Chap. 4 with estimation, and Chap. 5 with hypothesis testing.

EXAMPLE 1. Suppose that we have data on the incomes of 1000 U.S. families. This body of data can be summarized by finding the average family income and the spread of these family incomes above and below the average. The data also can be described by constructing a table, chart, or graph of the number or proportion of families in each income class. This is descriptive statistics. If these 1000 families are representative of all U.S.
families, we can then estimate and test hypotheses about the average family income in the United States as a whole. Since these conclusions are subject to error, we also would have to indicate the probability of error. This is statistical inference.

1.2 STATISTICS AND ECONOMETRICS

Econometrics refers to the application of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses and estimating and forecasting economic phenomena. Econometrics has become strongly identified with regression analysis. This relates a dependent variable to one or more independent or explanatory variables. Since relationships among economic variables are generally inexact, a disturbance or error term (with well-defined probabilistic properties) must be included (see Prob. 1.8).

Chapters 6 and 7 deal with regression analysis; Chap. 8 extends the basic regression model; Chap. 9 deals with methods of testing and correcting for violations in the assumptions of the basic regression model; and Chaps. 10 and 11 deal with two specific areas of econometrics, specifically simultaneous-equations and time-series methods. Thus Chaps. 1 to 5 deal with the statistics required for econometrics (Chaps. 6 to 11). Chapter 12 is concerned with using the computer to aid in the calculations involved in the previous chapters.

[...] for violations. If the estimated relationship does not pass these tests, the hypothesized relationship must be modified and reestimated until a satisfactory estimated consumption relationship is achieved.

Solved Problems

THE NATURE OF STATISTICS

1.1 What is the purpose and function of (a) the field of study of statistics? (b) Descriptive statistics? (c) Inferential statistics?

(a) Statistics is the body of procedures and techniques used to collect, present, and analyze data on which to base decisions in the face of uncertainty or incomplete information. Statistical analysis is used today in practically every profession.
The economist uses it to test the efficiency of alternative production techniques; the businessperson may use it to test the product design or package that maximizes sales; the sociologist to analyze the result of a drug rehabilitation program; the industrial psychologist to examine workers' responses to plant environment; the political scientist to forecast voting patterns; the physician to test the effectiveness of a new drug; the chemist to produce cheaper fertilizers; and so on.

(b) Descriptive statistics summarizes a body of data with one or two pieces of information that characterize the whole data. It also refers to the presentation of a body of data in the form of tables, charts, graphs, and other forms of graphic display.

[...] independent or explanatory variables.

(c) Statistical analysis applies appropriate techniques to estimate the inexact and nonexperimental relationships among economic variables by utilizing relevant economic data and evaluating the results.

1.8 What justifies the inclusion of a disturbance or error term in regression analysis?

The inclusion of a (random) disturbance or error term (with well-defined probabilistic properties) is required in regression analysis for three important reasons. First, since the purpose of theory is to generalize and simplify, economic relationships usually include only the most important forces at work. This means that numerous other variables with slight and irregular effects are not included. The error term can be viewed as representing the net effect of this large number of small and irregular forces at work. Second, the inclusion of the error term can be justified in order to take into consideration the net effect of possible errors in measuring the dependent variable, or variable being explained.
Finally, since human behavior usually differs in a random way under identical circumstances, the disturbance or error term can be used to capture this inherently random human behavior. This error term thus allows for individual random deviations from the exact and deterministic relationships postulated by economic theory and mathematical economics.

1.9 Consumer demand theory states that the quantity demanded of a commodity $D_X$ is a function of, or depends on, its price $P_X$, consumer's income $Y$, and the price of other (related) commodities, say, commodity $Z$ (i.e., $P_Z$). Assuming that consumers' tastes remain constant during the period of analysis, state the preceding theory in (a) specific or explicit linear form or equation and (b) in stochastic form. (c) Which are the coefficients to be estimated? What are they called?

(a) $$D_X = b_0 + b_1 P_X + b_2 Y + b_3 P_Z \tag{1.3}$$

(b) $$D_X = b_0 + b_1 P_X + b_2 Y + b_3 P_Z + u \tag{1.4}$$

(c) The coefficients to be estimated are $b_0$, $b_1$, $b_2$, and $b_3$. They are called parameters.

THE METHODOLOGY OF ECONOMETRICS

1.10 With reference to the consumer demand theory in Prob. 1.9, indicate (a) what the first step is in econometric research and (b) what the a priori theoretical expectations are of the sign and possible size of the parameters of the demand function given by Eq. (1.4).

(a) The first step in econometric analysis is to express the theory of consumer demand in stochastic equation form, as in Eq. (1.4), and indicate the a priori theoretical expectations about the sign and possibly the size of the parameters of the function.

(b) Consumer demand theory postulates that in Eq. (1.4), $b_1 < 0$ (indicating that price and quantity are inversely related), $b_2 > 0$ if the commodity is a normal good (indicating that consumers purchase more of the commodity at higher incomes), $b_3 > 0$ if X and Z are substitutes, and $b_3 < 0$ if X and Z are complements.

1.11 Indicate the second stage in econometric research (a) in general and (b) with reference to the demand function specified by Eq.
(1.4).

(a) The second stage in econometric research involves the collection of data on the dependent variable and on each of the independent or explanatory variables of the model and utilizing these data for the empirical estimation of the parameters of the model. This is usually done with multiple regression analysis (discussed in Chap. 7).

(b) In order to estimate the demand function given by Eq. (1.4), data must be collected on (1) the quantity demanded of commodity X by consumers, (2) the price of X, (3) consumers' incomes, and (4) the price of commodity Z per unit of time (i.e., per day, month, or year) and over a number of days, months, or years. The data on $D_X$ are then regressed on the data on $P_X$, $Y$, and $P_Z$, and estimates of the parameters $b_0$, $b_1$, $b_2$, and $b_3$ are obtained.

1.12 How does the type of data required to estimate the demand function specified by Eq. (1.4) differ from the type of data that would be required to estimate the consumption function for a group of families at one point in time?

In order to estimate the demand function given by Eq. (1.4), numerical values of the variables are required over a period of time. For example, if we want to estimate the demand function for coffee, we need the numerical value of the quantity of coffee demanded, say, per year, over a number of years, say, from 1960 to 1980. Similarly, we need data on the average price of coffee, consumers' income, and the price of, say, tea (a substitute for coffee) per year from 1960 to 1980. Data that give numerical values for the variables of a function from period to period are called time-series data. However, to estimate the consumption function for a group of families at one point in time, we need cross-sectional data (i.e., numerical values for the consumption expenditures and disposable incomes of each family in the group at a particular point in time, say, in a single year).
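The estimation step described in Prob. 1.11(b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations are simulated (they are not real coffee or consumer data), the "true" parameter values are arbitrary choices made so that the expected signs of Prob. 1.10 hold, and the helper name `estimate_demand` is hypothetical. The fit itself uses ordinary least squares through NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np


def estimate_demand(quantity, own_price, income, other_price):
    """Return OLS estimates (b0, b1, b2, b3) for
    D_X = b0 + b1*P_X + b2*Y + b3*P_Z + u, as in Eq. (1.4)."""
    # Design matrix: a column of ones for the intercept b0, then the regressors.
    design = np.column_stack(
        [np.ones(len(own_price)), own_price, income, other_price]
    )
    coefficients, *_ = np.linalg.lstsq(design, quantity, rcond=None)
    return coefficients


# Simulated period-by-period observations (assumption: NOT real data).
rng = np.random.default_rng(0)
n_periods = 200
P_X = rng.uniform(1.0, 10.0, n_periods)    # own price
Y = rng.uniform(20.0, 100.0, n_periods)    # consumers' income
P_Z = rng.uniform(1.0, 10.0, n_periods)    # price of a substitute
u = rng.normal(0.0, 1.0, n_periods)        # disturbance term

# Arbitrary "true" parameters: b1 < 0, b2 > 0 (normal good), b3 > 0 (substitute).
D_X = 50.0 - 2.0 * P_X + 0.5 * Y + 1.5 * P_Z + u

b0, b1, b2, b3 = estimate_demand(D_X, P_X, Y, P_Z)
print(f"b0={b0:.2f}  b1={b1:.2f}  b2={b2:.2f}  b3={b3:.2f}")
```

Because the disturbance term is random, the estimates only approximate the values used to generate the data. With real data the true parameters are unknown, which is why the economic, statistical, and econometric evaluation of the estimates is a separate stage of the research.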
[...] (c) The econometric criteria are used to determine if the assumptions of the econometric methods used are satisfied in the estimation of the demand function of Eq. (1.4). Only if these assumptions are satisfied will the estimated coefficients have the desirable properties of unbiasedness, consistency, efficiency, and so forth (see Sec. 6.4).

[...] One way to test the forecasting ability of the demand model given by Eq. (1.4) is to use the estimated function to predict the value of $D_X$ for a period not included in the sample and to check that this predicted value is "sufficiently close" to the actual observed value of $D_X$ for that period.

1.15 [...] stages of econometric research

[Flowchart. Stage 1: economic theory, mathematical model, econometric (stochastic) model. Stage 2: collection of appropriate data. Stage 3: estimation of the parameters of the model. Stage 4: evaluation of the model on the basis of economic, statistical, and econometric criteria. Outcomes: accept theory if compatible with data; reject theory if incompatible with data; revise theory if incompatible with data. Accepting the theory leads to prediction; revising it leads to confrontation of the revised theory with new data.]

Supplementary Problems

THE NATURE OF STATISTICS

1.16 (a) To which field of study is statistical analysis important? (b) What are the most important functions of descriptive statistics? (c) What is the most important function of inferential statistics?
Ans. (a) To economics, business, and other social and physical sciences (b) Summarizing and describing a body of data (c) Drawing inferences about the characteristics of a population from the corresponding characteristics of a sample drawn from the population

1.17 (a) Is statistical inference associated with deductive or inductive reasoning? (b) What are the conditions required in order for statistical inference to be valid?
Ans.
(a) Inductive reasoning (b) A representative sample and probability theory

STATISTICS AND ECONOMETRICS

1.18 Express in the form of an explicit linear equation the statement that the level of investment spending $I$ is inversely related to the rate of interest $R$.
Ans. $$I = b_0 + b_1 R \quad \text{with } b_1 \text{ postulated to be negative} \tag{1.5}$$

1.19 What is the answer to Prob. 1.18 an example of?
Ans. An economic theory expressed in (exact or deterministic) mathematical form

1.20 Express Eq. (1.5) in stochastic form.
Ans. $$I = b_0 + b_1 R + u \tag{1.6}$$

1.21 Why is a stochastic form required in econometric analysis?
Ans. Because the relationships among economic variables are inexact and somewhat erratic as opposed to the exact and deterministic relationships postulated by economic theory and mathematical economics

THE METHODOLOGY OF ECONOMETRICS

1.22 What are stages (a) one, (b) two, and (c) three in econometric research?
Ans. (a) Specification of the theory in stochastic equation form and indication of the expected signs and possible sizes of estimated parameters (b) Collection of data on the variables of the model and estimation of the coefficients of the function (c) Economic, statistical, and econometric evaluation of the estimated parameters

1.23 What is the first stage of econometric analysis for the investment theory in Prob. 1.18?
Ans. Stating the theory in the form of Eq. (1.6) and predicting $b_1 < 0$

1.24 What is the second stage in econometric analysis for the investment theory in Prob. 1.18?
Ans. Collection of time-series data on $I$ and $R$ and estimation of Eq. (1.6)

1.25 What is the third stage of econometric analysis for the investment theory in Prob. 1.18?
Ans. Determination that the estimated coefficient $b_1 < 0$, that an "adequate" proportion of the variation in $I$ over time is "explained" by changes in $R$, that $b_1$ is "statistically significant at customary levels," and that the econometric assumptions of the model are satisfied

CHAPTER 2
Descriptive Statistics

2.1 FREQUENCY DISTRIBUTIONS

It is often useful to organize or arrange a body of data into a frequency distribution. This breaks up the data into groups or classes and shows the number of observations in each class. The number of classes is usually between 5 and 15. A relative frequency distribution is obtained by dividing the number of observations in each class by the total number of observations in the data as a whole. The sum of the relative frequencies equals 1. A histogram is a bar graph of a frequency distribution, where classes are measured along the horizontal axis and frequencies along the vertical axis. A frequency polygon is a line graph of a frequency distribution resulting from joining the frequency of each class plotted at the class midpoint. A cumulative frequency distribution shows, for each class, the total number of observations in all classes up to and including that class. When plotted, this gives a distribution curve, or ogive.

EXAMPLE 1. A student received the following grades (measured from 0 to 10) on the 10 quizzes he took during a semester: 6, 7, 6, 8, 5, 7, 6, 9, 10, and 6. These grades can be arranged into frequency distributions as in Table 2.1 and shown graphically as in Fig. 2-1.

Table 2.1 Frequency Distributions of Grades

Grades    Absolute Frequency    Relative Frequency
5         1                     0.1
6         4                     0.4
7         2                     0.2
8         1                     0.1
9         1                     0.1
10        1                     0.1
Total     10                    1.0

Fig. 2-1

EXAMPLE 2. The cans in a sample of 20 cans of fruit contain net weights of fruit ranging from 19.3 to 20.9 oz, as given in Table 2.2. If we want to group these data into 6 classes, we get class intervals of 0.3 oz [(21.0 - 19.2)/6 = 0.3 oz]. The weights given in Table 2.2 can be arranged into the frequency distributions given in Table 2.3 and shown graphically in Fig.
2-2.

Table 2.2 Net Weight in Ounces of Fruit

19.7  19.9  20.2  19.9  20.0  20.6  19.3  20.4  19.9  20.3
20.1  19.5  20.9  20.3  20.8  19.9  20.0  20.6  19.9  19.8

Table 2.3 Frequency Distribution of Weights

Weight, oz    Absolute Frequency    Relative Frequency
19.2-19.4     1                     0.05
19.5-19.7     2                     0.10
19.8-20.0     8                     0.40
20.1-20.3     4                     0.20
20.4-20.6     3                     0.15
20.7-20.9     2                     0.10
Total         20                    1.00

Fig. 2-2 [panels include a histogram, a relative-frequency histogram, and a frequency polygon]

2.2 MEASURES OF CENTRAL TENDENCY

Central tendency refers to the location of a distribution. The most important measures of central tendency are (1) the mean, (2) the median, and (3) the mode. We will be measuring these for populations (i.e., the collection of all the elements that we are describing) and for samples drawn from populations, as well as for grouped and ungrouped data.

1. The arithmetic mean, or average, of a population is represented by $\mu$ (the Greek letter mu) and, for a sample, by $\bar{X}$ (read "X bar"). For ungrouped data, $\mu$ and $\bar{X}$ are calculated by the following formulas:

$$\mu = \frac{\sum X}{N} \quad \text{and} \quad \bar{X} = \frac{\sum X}{n} \tag{2.1a,b}$$

where $\sum X$ refers to the sum of all the observations, while $N$ and $n$ refer to the number of observations in the population and sample, respectively. For grouped data, $\mu$ and $\bar{X}$ are calculated by

$$\mu = \frac{\sum fX}{N} \quad \text{and} \quad \bar{X} = \frac{\sum fX}{n} \tag{2.2a,b}$$

where $\sum fX$ refers to the sum of the frequency of each class $f$ times the class midpoint $X$.

2. The median for ungrouped data is the value of the middle item when all the items are arranged in either ascending or descending order in terms of values:

$$\text{Median} = \text{the } \left(\frac{N+1}{2}\right)\text{th item in the data array} \tag{2.3}$$

where $N$ refers to the number of items in the population ($n$ for a sample). The median for grouped data is given by the formula

$$\text{Median} = L + \frac{n/2 - F}{f_m}\, c \tag{2.4}$$

where
$L$ = lower limit of the median class (i.e., the class that contains the middle item of the distribution)
$n$ = the number of observations in the data set
$F$ = sum of the frequencies up to but not including the median class
$f_m$ = frequency of the median class
$c$ = width of the class interval

3. The mode is the value that occurs most frequently in the data set. For grouped data, we obtain

$$\text{Mode} = L + \frac{d_1}{d_1 + d_2}\, c \tag{2.5}$$

where $L$
= lower limit of the modal class (i.e., the class with the greatest frequency)
$d_1$ = frequency of the modal class minus the frequency of the previous class
$d_2$ = frequency of the modal class minus the frequency of the following class
$c$ = width of the class interval

The mean is the most commonly used measure of central tendency. The mean, however, is affected by extreme values in the data set, while the median and the mode are not. Other measures of central tendency are the weighted mean, the geometric mean, and the harmonic mean (see Probs. 2.7 to 2.9).

EXAMPLE 3. The mean grade for the population on the 10 quizzes given in Example 1, using the formula for ungrouped data, is

$$\mu = \frac{\sum X}{N} = \frac{6+7+6+8+5+7+6+9+10+6}{10} = \frac{70}{10} = 7$$

To find the median for the ungrouped data, we first arrange the 10 grades in ascending order: 5, 6, 6, 6, 6, 7, 7, 8, 9, 10. Then we find the grade of the $(N+1)/2$ or $(10+1)/2 = 5.5$th item. Thus the median is the average of the 5th and 6th items in the array, or $(6+7)/2 = 6.5$. The mode for the ungrouped data is 6 (the value that occurs most frequently in the data set).

EXAMPLE 4. We can estimate the mean for the grouped data given in Table 2.3 with the aid of Table 2.4:

$$\bar{X} = \frac{\sum fX}{n} = \frac{401.6}{20} = 20.08 \text{ oz}$$

This calculation could be simplified by coding (see Prob. 2.6).

Table 2.4 Calculation of the Sample Mean for the Data in Table 2.3

Weight, oz    Class Midpoint X    Frequency f    fX
19.2-19.4     19.3                1              19.3
19.5-19.7     19.6                2              39.2
19.8-20.0     19.9                8              159.2
20.1-20.3     20.2                4              80.8
20.4-20.6     20.5                3              61.5
20.7-20.9     20.8                2              41.6
                                  n = 20         sum fX = 401.6

We can estimate the median for the same grouped data as follows:

$$\text{Med} = L + \frac{n/2 - F}{f_m}\, c = 19.8 + \frac{20/2 - 3}{8}(0.3) \approx 19.8 + 0.26 = 20.06 \text{ oz}$$

where
$L$ = 19.8 = lower limit of the median class (i.e., the 19.8-20.0 class, which contains the 10th and 11th observations)
$n$ = 20 = number of observations or items
$F$ = 3 = sum of frequencies up to but not including the median class
$f_m$ = 8 = frequency of the median class
$c$ = 0.3 = width of class interval

Similarly,

$$\text{Mode} = L + \frac{d_1}{d_1 + d_2}\, c = 19.8 + \frac{6}{6+4}(0.3) = 19.8 + 0.18 = 19.98 \text{ oz}$$

As noted in Prob.
2.4, the mean, median, and mode for grouped data are estimates used when only the grouped data are available or to reduce calculations with a large ungrouped data set.

2.3 MEASURES OF DISPERSION

Dispersion refers to the variability or spread in the data. The most important measures of dispersion are (1) the average deviation, (2) the variance, and (3) the standard deviation. We will measure these for populations and samples, as well as for grouped and ungrouped data.

1. Average deviation. The average deviation (AD), also called the mean absolute deviation (MAD), is given by

$$AD = \frac{\sum |X - \mu|}{N} \quad \text{for populations} \tag{2.6a}$$

$$AD = \frac{\sum |X - \bar{X}|}{n} \quad \text{for samples} \tag{2.6b}$$

where the two vertical bars indicate the absolute value, or the values omitting the sign, with the other symbols having the same meaning as in Sec. 2.2. For grouped data

$$AD = \frac{\sum f|X - \mu|}{N} \quad \text{for populations} \tag{2.7a}$$

$$AD = \frac{\sum f|X - \bar{X}|}{n} \quad \text{for samples} \tag{2.7b}$$

where $f$ refers to the frequency of each class and $X$ to the class midpoints.

2. Variance. The population variance $\sigma^2$ (the Greek letter sigma squared) and the sample variance $s^2$ for ungrouped data are given by

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \tag{2.8a,b}$$

For grouped data

$$\sigma^2 = \frac{\sum f(X - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} \tag{2.9a,b}$$

3. Standard deviation. The population standard deviation $\sigma$ and sample standard deviation $s$ are the positive square roots of their respective variances. For ungrouped data

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \tag{2.10a,b}$$

For grouped data

$$\sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} \tag{2.11a,b}$$

The most widely used measure of (absolute) dispersion is the standard deviation. Other measures (besides the variance and average deviation) are the range, the interquartile range, and the quartile deviation (see Probs. 2.11 and 2.12).

4. The coefficient of variation ($V$) measures relative dispersion:

$$V = \frac{\sigma}{\mu} \quad \text{for populations} \tag{2.12a}$$

$$V = \frac{s}{\bar{X}} \quad \text{for samples} \tag{2.12b}$$

EXAMPLE 5. The average deviation, variance, standard deviation, and coefficient of variation for the ungrouped data given in Example 1 can be found with the aid of Table 2.5 ($\mu = 7$; see Example 3).
Table 2.5 Calculations on the Data in Example 1

Grade X    mu    X - mu    |X - mu|    (X - mu)^2
6          7     -1        1           1
7          7      0        0           0
6          7     -1        1           1
8          7      1        1           1
5          7     -2        2           4
7          7      0        0           0
6          7     -1        1           1
9          7      2        2           4
10         7      3        3           9
6          7     -1        1           1
Sum               0        12          22

$$AD = \frac{\sum |X - \mu|}{N} = \frac{12}{10} = 1.2 \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{22}{10} = 2.2$$

$$\sigma = \sqrt{2.2} \approx 1.48 \qquad V = \frac{\sigma}{\mu} \approx \frac{1.48}{7} \approx 0.21, \text{ or } 21\%$$

EXAMPLE 6. The average deviation, variance, standard deviation, and coefficient of variation for the frequency distribution of weights (grouped data) given in Table 2.3 can be found with the aid of Table 2.6 ($\bar{X} = 20.08$ oz; see Example 4):

$$AD = \frac{\sum f|X - \bar{X}|}{n} = \frac{6.36}{20} = 0.318 \text{ oz} \qquad s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} = \frac{2.9520}{19} \approx 0.1554 \text{ oz}^2$$

$$s = \sqrt{0.1554} \approx 0.3942 \text{ oz} \qquad V = \frac{s}{\bar{X}} \approx \frac{0.3942}{20.08} \approx 0.0196, \text{ or } 1.96\%$$

Note that in the formulas for $s^2$ and $s$, $n - 1$ rather than $n$ is used in the denominator (see Prob. 2.16 for the reason). Simplified computational formulas may be used for a large body of data (see Probs. 2.17 to 2.19 for their derivation and application).

Table 2.6 Calculations on the Data in Table 2.3

Weight, oz    Midpoint X    f    X - Xbar    f|X - Xbar|    (X - Xbar)^2    f(X - Xbar)^2
19.2-19.4     19.3          1    -0.78       0.78           0.6084          0.6084
19.5-19.7     19.6          2    -0.48       0.96           0.2304          0.4608
19.8-20.0     19.9          8    -0.18       1.44           0.0324          0.2592
20.1-20.3     20.2          4     0.12       0.48           0.0144          0.0576
20.4-20.6     20.5          3     0.42       1.26           0.1764          0.5292
20.7-20.9     20.8          2     0.72       1.44           0.5184          1.0368
Sum                         20               6.36                           2.9520

2.4 SHAPE OF FREQUENCY DISTRIBUTIONS

The shape of a distribution refers to (1) its symmetry or lack of it (skewness) and (2) its peakedness (kurtosis).

1. Skewness. A distribution has zero skewness if it is symmetrical about its mean. For a symmetrical (unimodal) distribution, the mean, median, and mode are equal. A distribution is positively skewed if the right tail is longer. Then, mean > median > mode. A distribution is negatively skewed if the left tail is longer. Then, mode > median > mean (see Fig. 2-3).

Fig. 2-3 [Panel A: Symmetrical; Panel B: Positively skewed; Panel C: Negatively skewed]

Skewness can be measured by the Pearson coefficient of skewness:

$$Sk = \frac{3(\mu - \text{med})}{\sigma} \quad \text{for populations} \tag{2.13a}$$

$$Sk = \frac{3(\bar{X} - \text{med})}{s} \quad \text{for samples} \tag{2.13b}$$

Mean and variance are the first and second moments of a distribution, respectively. Skewness can also be measured by the third moment [the numerator of Eq.
(2.14a,b)] divided by the cube of the standard deviation:

$$Sk = \frac{m_3}{\sigma^3} \quad \text{for populations} \tag{2.14a}$$

$$Sk = \frac{m_3}{s^3} \quad \text{for samples} \tag{2.14b}$$

where $m_3$ denotes the third moment about the mean. For symmetric distributions, $Sk = 0$.

2. Kurtosis. A peaked curve is called leptokurtic, as opposed to a flat one (platykurtic), relative to one that is mesokurtic (see Fig. 2-4). Kurtosis can be measured by the fourth moment [the numerator of Eq. (2.15a,b)] divided by the standard deviation raised to the fourth power. The kurtosis for a mesokurtic curve is 3.

Fig. 2-4 [leptokurtic, mesokurtic, and platykurtic curves]

EXAMPLE 7. We can find the Pearson coefficient of skewness for the grades given in Example 1 by using $\mu = 7$, med $= 6.5$, and $\sigma \approx 1.48$:

$$Sk = \frac{3(\mu - \text{med})}{\sigma} = \frac{3(7 - 6.5)}{1.48} \approx 1.01 \quad \text{(see Fig. 2-1)}$$

Similarly, by using $\bar{X} = 20.08$ oz, med $= 20.06$ oz (see Example 4), and $s \approx 0.39$ oz (see Example 6), we can find the Pearson coefficient of skewness for the frequency distribution of weights in Table 2.3 as follows:

$$Sk = \frac{3(\bar{X} - \text{med})}{s} = \frac{3(20.08 - 20.06)}{0.39} \approx 0.15 \quad \text{(see Fig. 2-2)}$$

For kurtosis, see Prob. 2.23.

Solved Problems

FREQUENCY DISTRIBUTIONS

2.1 Table 2.7 gives the grades on a quiz for a class of 40 students. (a) Arrange these grades (raw data set) into an array from the lowest grade to the highest grade. (b) Construct a table showing class intervals and class midpoints and the absolute, relative, and cumulative frequencies for each grade. (c) Present the data in the form of a histogram, relative-frequency histogram, frequency polygon, and ogive.

Table 2.7 Grades on a Quiz for a Class of 40 Students

[...]

(a) See Table 2.8.

Table 2.8 Data Array of Grades

2  2  2  3  3  3  4  4  4  4
4  5  5  5  5  5  6  6  6  6
6  6  7  7  7  7  7  7  7  7
8  8  8  8  9  9  9  9  10  10

[...]

A sample of 25 workers in a plant receive the hourly wages given in Table 2.10. (a) Arrange these raw data into an array from the lowest to the highest wage. (b) Group the data into classes. (c) Present the data in the form of a histogram, relative-frequency histogram, frequency polygon, and ogive.
Table 2.10 Hourly Wages in Dollars

(a) See Table 2.11.

Table 2.11 Data Array of Wages in Dollars

3.55  3.60  3.65  3.75  3.78
3.80  3.85  3.85  3.88  3.90
3.95  3.95  3.95  3.96  4.00
4.05  4.05  4.05  4.06  4.08
4.10  4.15  4.18  4.25  4.26

(b) The hourly wages in Table 2.10 range from $3.55 to $4.26. This can be conveniently subdivided into 8 equal classes of $0.10 each. That is, ($4.30 − $3.50)/8 = $0.80/8 = $0.10. Note that the range was extended from $3.50 to $4.30 so that the lowest wage, $3.55, falls within the lowest class and the largest wage, $4.26, falls within the largest class. It is also convenient (and needed for plotting the frequency polygon) to find the class mark or midpoint of each class. These are shown in Table 2.12.

Table 2.12 Frequency Distribution of Wages

Hourly Wage, $   Class Midpoint, $   Absolute Frequency   Relative Frequency   Cumulative Frequency
3.50–3.59        3.55                 1                    0.04                  1
3.60–3.69        3.65                 2                    0.08                  3
3.70–3.79        3.75                 2                    0.08                  5
3.80–3.89        3.85                 4                    0.16                  9
3.90–3.99        3.95                 5                    0.20                 14
4.00–4.09        4.05                 6                    0.24                 20
4.10–4.19        4.15                 3                    0.12                 23
4.20–4.29        4.25                 2                    0.08                 25
Totals                               25                    1.00

(c) [Figure. Panel A: Histogram; Panel B: Relative-Frequency Histogram; Panel C: Frequency Polygon; Panel D: Ogive]

MEASURES OF CENTRAL TENDENCY

2.3 Find the mean, median, and mode (a) for the grades on the quiz for the class of 40 students given in Table 2.7 (the ungrouped data) and (b) for the grouped data of these grades given in Table 2.9.

(a) Since we are dealing with all grades, we want the population mean:

\[ \mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{240}{40} = 6 \text{ points} \]

That is, \(\mu\) is obtained by adding together all the 40 grades given in Table 2.7 and dividing by 40 [the three centered dots (ellipsis) were put in to avoid repeating the 40 values in Table 2.7]. The median is given by the value of the \([(N + 1)/2]\)th item in the data array in Table 2.8. Therefore, the median is the value of the (40 + 1)/2, or 20.5th, item, or the average of the 20th and 21st items.
Since they are both equal to 6, the median is 6. The mode is 7 (the value that occurs most frequently in the data set).

(b) We can find the population mean for the grouped data in Table 2.9 with the aid of Table 2.13:

\[ \mu = \frac{\sum fX}{N} = \frac{240}{40} = 6 \text{ points} \]

This is the same mean we found for the ungrouped data. Note that the sum of the frequencies, \(\sum f\), equals the number of observations in the population, \(N\), and \(\sum X = \sum fX\). The median for the grouped data of Table 2.13 is given by

\[ \text{Med} = L + \frac{N/2 - F}{f_m}\,c = 5.5 + \frac{40/2 - 16}{6}(1) = 5.5 + 0.67 = 6.17 \]

where \(L = 5.5\) = lower limit of the median class (i.e., the 5.5–6.4 class, which contains the 20th and 21st observations)
\(N = 40\) = number of observations
\(F = 16\) = sum of observations up to but not including the median class
\(f_m = 6\) = frequency of the median class
\(c = 1\) = width of class interval

The mode for the grouped data in Table 2.13 is given by

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\,c = 6.5 + \frac{2}{2 + 4}(1) = 6.5 + 0.33 = 6.83 \]

where \(L = 6.5\) = lower limit of the modal class (i.e., the 6.5–7.4 class with the highest frequency of 8)
\(d_1 = 2\) = frequency of the modal class, 8, minus the frequency of the previous class, 6
\(d_2 = 4\) = frequency of the modal class, 8, minus the frequency of the following class, 4
\(c = 1\) = width of the class interval

Note that while the mean calculated from the grouped data is in this case identical to the mean calculated for the ungrouped data, the median and the mode are only (good) approximations.

Table 2.13 Calculation of the Population Mean for the Grouped
Data in Table 2.9

Grade       Class Midpoint X   Frequency f   fX
1.5–2.4      2                  3              6
2.5–3.4      3                  3              9
3.5–4.4      4                  5             20
4.5–5.4      5                  5             25
5.5–6.4      6                  6             36
6.5–7.4      7                  8             56
7.5–8.4      8                  4             32
8.5–9.4      9                  4             36
9.5–10.4    10                  2             20
Totals: Σf = N = 40, ΣfX = 240

2.4 Find the mean, median, and mode (a) for the sample of hourly wages received by the 25 workers recorded in Table 2.10 (the ungrouped data) and (b) for the grouped data of these wages given in Table 2.12.

(a)
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$98.65}{25} = \$3.946, \text{ or } \$3.95 \]

Median = $3.95 (the value of the \((n + 1)/2 = (25 + 1)/2 = 13\)th item in the data array in Table 2.11). Mode = $3.95 and $4.05, since there are three of each of these wages. Thus the distribution is bimodal (i.e., it has two modes).

(b) We can find the sample mean for the grouped data in Table 2.12 with the aid of Table 2.14:

\[ \bar{X} = \frac{\sum fX}{n} = \frac{\$98.75}{25} = \$3.95 \]

Note that in this case \(\sum fX = \$98.75 \neq \sum X = \$98.65\) (found in part a) since the average of the observations in each class is not equal to the class midpoint for all classes [as in Prob. 2.3(b)]. Thus \(\bar{X}\) calculated from the grouped data is only a very good approximation for the true value of \(\bar{X}\) calculated for the ungrouped data. In the real world, we often have only the grouped data, or if we have a very large body of ungrouped data, it will save on calculations to estimate the mean by first grouping the data.

\[ \text{Med} = L + \frac{n/2 - F}{f_m}\,c = \$3.90 + \frac{12.5 - 9}{5}(\$0.10) = \$3.90 + \$0.07 = \$3.97 \]

as compared with the true median of $3.95 found from the ungrouped data (see part a).

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\,c = \$4.00 + \frac{1}{1 + 3}(\$0.10) = \$4.00 + \$0.025 = \$4.025, \text{ or } \$4.03 \]

as compared with the true modes of $3.95 and $4.05 found from the ungrouped data (see part a). Sometimes the mode is simply given as the midpoint of the modal class.

Table 2.14 Calculation of the Sample Mean for the Grouped Data in Table 2.12

Hourly Wage, $   Class Midpoint X, $   Frequency f   fX, $
3.50–3.59        3.55                   1              3.55
3.60–3.69        3.65                   2              7.30
3.70–3.79        3.75                   2              7.50
3.80–3.89        3.85                   4             15.40
3.90–3.99        3.95                   5             19.75
4.00–4.09        4.05                   6             24.30
4.10–4.19        4.15                   3             12.45
4.20–4.29        4.25                   2              8.50
Totals: Σf = n = 25, ΣfX = 98.75

2.5 Compare the advantages and disadvantages of (a) the mean, (b) the median, and (c) the mode as measures of central tendency.

(a) The advantages of the mean are (1) it is familiar and understood by virtually everyone, (2) all the observations in the data are taken into account, and (3) it is used in performing many other statistical procedures and tests.
The disadvantages of the mean are (1) it is affected by extreme values, (2) it is time-consuming to compute for a large body of ungrouped data, and (3) it cannot be calculated when the last class of grouped data is open-ended (i.e., it includes the lower limit of the last class "and over").

(b) The advantages of the median are (1) it is not affected by extreme values, (2) it is easily understood (i.e., half the data are smaller than the median and half are greater), and (3) it can be calculated even when the last class is open-ended and when the data are qualitative rather than quantitative. The disadvantages of the median are (1) it does not use much of the information available, and (2) it requires that observations be arranged into an array, which is time-consuming for a large body of ungrouped data.

(c) The advantages of the mode are the same as those for the median. The disadvantages of the mode are (1) as for the median, the mode does not use much of the information available, and (2) sometimes no value of the data is repeated more than once, so that there is no mode, while at other times there may be many modes. In general, the mean is the most frequently used measure of central tendency and the mode is the least used.

2.6 Find the mean for the grouped data in Table 2.12 by coding (i.e., by assigning the value of \(u = 0\) to the 4th or 5th class, \(u = -1, u = -2\), etc. to each lower class, and \(u = 1, u = 2\), etc. to each larger class, and then using the formula

\[ \bar{X} = X_0 + \frac{\sum fu}{n}\,c \qquad (2.16) \]

where \(X_0\) is the midpoint of the class assigned \(u = 0\) and \(c\) is the width of the class intervals). See Table 2.15.

Table 2.15 Calculation of the Sample Mean by Coding for the Grouped Data in Table 2.12

Hourly Wage, $   Class Midpoint X, $   Code u   Frequency f   fu
3.50–3.59        3.55                   −3        1             −3
3.60–3.69        3.65                   −2        2             −4
3.70–3.79        3.75                   −1        2             −2
3.80–3.89        3.85                    0        4              0
3.90–3.99        3.95                    1        5              5
4.00–4.09        4.05                    2        6             12
4.10–4.19        4.15                    3        3              9
4.20–4.29        4.25                    4        2              8
Totals: Σf = n = 25, Σfu = 25

\[ \bar{X} = X_0 + \frac{\sum fu}{n}\,c = \$3.85 + \frac{25}{25}(\$0.10) = \$3.95 \]

\(\bar{X}\) for the grouped data found by coding is identical to that found in Prob. 2.4(b) without coding.
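The coding formula of Prob. 2.6, mean = X0 + (sum of f·u / n)·c, can be checked numerically against the direct grouped mean. The following is a minimal sketch, not part of the original solution: the class midpoints and frequencies are those of Table 2.12, while the function names, the code symbol `u`, and the `zero_index` parameter (which class receives the code 0) are illustrative assumptions.

```python
# Grouped hourly-wage data from Table 2.12 as (class midpoint, frequency).
CLASSES = [(3.55, 1), (3.65, 2), (3.75, 2), (3.85, 4),
           (3.95, 5), (4.05, 6), (4.15, 3), (4.25, 2)]


def mean_direct(classes):
    """Grouped mean without coding: sum(f * X) / n."""
    n = sum(f for _, f in classes)
    return sum(x * f for x, f in classes) / n


def mean_by_coding(classes, zero_index, width):
    """Grouped mean by coding: X0 + (sum(f * u) / n) * c.

    The class at position zero_index gets code u = 0; lower classes get
    u = -1, -2, ... and higher classes get u = 1, 2, ...
    (zero_index is a hypothetical parameter name used for illustration.)
    """
    n = sum(f for _, f in classes)
    x0 = classes[zero_index][0]
    sum_fu = sum(f * (i - zero_index) for i, (_, f) in enumerate(classes))
    return x0 + (sum_fu / n) * width


direct = mean_direct(CLASSES)
coded_4th = mean_by_coding(CLASSES, zero_index=3, width=0.10)  # u = 0 on the 4th class
coded_5th = mean_by_coding(CLASSES, zero_index=4, width=0.10)  # u = 0 on the 5th class
print(f"{direct:.2f} {coded_4th:.2f} {coded_5th:.2f}")
```

Whichever class is assigned the code 0, the coded mean agrees with the direct one; only the size of the intermediate numbers changes.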
Coding eliminates the problem of having to deal with possibly large and inconvenient class midpoints; thus it may simplify the calculations.

2.7 A firm pays a wage of $4 per hour to its 25 unskilled workers, $6 to its 15 semiskilled workers, and $8 to its 10 skilled workers. What is the weighted average, or weighted mean, wage paid by this firm?

In finding the weighted mean, or weighted average, of a population, \(\mu_w\), or sample, \(\bar{X}_w\), the weights, \(w\), have the same function as the frequency in finding the mean for the grouped data. Thus

\[ \mu_w \text{ or } \bar{X}_w = \frac{\sum wX}{\sum w} \qquad (2.17) \]

For this problem, the weights are the number of workers employed at each wage, and \(\sum w\) equals the sum of all the workers:

\[ \mu_w = \frac{(\$4)(25) + (\$6)(15) + (\$8)(10)}{25 + 15 + 10} = \frac{\$100 + \$90 + \$80}{50} = \frac{\$270}{50} = \$5.40 \]

This weighted average compares with the simple average of $6 [($4 + $6 + $8)/3 = $6] and is a better measure of the average wages.

2.8 A nation faces a rate of inflation of 2% in one year, 5% in the second year, and 12.5% in the third year. Find the geometric mean of the inflation rates (the geometric mean, \(\mu_G\) or \(\bar{X}_G\), of a set of \(n\) positive numbers is the \(n\)th root of their product and is used mainly to average rates of change and index numbers):

\[ \mu_G \text{ or } \bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} \qquad (2.18) \]

where \(X_1, X_2, \ldots, X_n\) refer to the \(n\) (or \(N\)) observations.

\[ \mu_G = \sqrt[3]{(2)(5)(12.5)} = \sqrt[3]{125} = 5\% \]

This compares with \(\mu = (2 + 5 + 12.5)/3 = 19.5/3 = 6.5\%\). When all the numbers are equal, \(\mu_G\) equals \(\mu\); otherwise \(\mu_G\) is smaller than \(\mu\). In practice, \(\mu_G\) is calculated by logarithms:

\[ \log \mu_G = \frac{\sum \log X}{N} \]

The geometric mean is used primarily in the mathematics of finance and financial management.

2.9 A commuter drives 10 mi on the highway at 60 mi/h and 10 mi on local streets at 15 mi/h. Find the harmonic mean. The harmonic mean \(\mu_H\) is used primarily to average ratios:

\[ \mu_H = \frac{N}{\sum (1/X)} = \frac{2}{(1/60) + (1/15)} = \frac{2}{(1 + 4)/60} = \frac{120}{5} = 24 \text{ mi/h} \]

as compared with \(\mu = \sum X/N = (60 + 15)/2 = 75/2 = 37.5\) mi/h. Note that if the commuter had averaged 37.5 mi/h, it would have taken her (20 mi / 37.5 mi/h) × 60 min = 32 min to drive the 20 mi.
Instead she drives 10 min on the highway (10 mi at 60 mi/h) and 40 min on local streets (10 mi at 15 mi/h) for a total of 50 min, and this is the (correct) answer we get by using \(\mu_H = 24\) mi/h. That is, (20 mi / 24 mi/h) × 60 min = 50 min.

2.10 (a) For the ungrouped data in Table 2.7, find the first, second, and third quartiles and the third decile and sixtieth percentile. (b) Do the same for the grouped data in Table 2.12. (Quartiles divide the data into 4 parts, deciles into 10 parts, and percentiles into 100 parts.)

(a) \(Q_1\) (first quartile) = 4 (the average of the 10th and 11th values in Table 2.8)
\(Q_2\) (second quartile) = 6 = the value of the 20.5th item = the median
\(Q_3\) (third quartile) = 7.5 = the value of the 30.5th item
\(D_3\) (third decile) = 5 = the value of the 12.5th item
\(P_{60}\) (sixtieth percentile) = 7 = the value of the 24.5th item

(b)
\[ Q_1 = L + \frac{n/4 - F}{f}\,c = \$3.80 + \frac{6.25 - 5}{4}(\$0.10) = \$3.80 + \$0.03125 \approx \$3.83 \qquad (2.21) \]
\[ Q_2 = L + \frac{n/2 - F}{f}\,c = \$3.90 + \frac{12.5 - 9}{5}(\$0.10) = \$3.90 + \$0.07 = \$3.97 = \text{median} \qquad (2.22) \]
\[ Q_3 = L + \frac{3n/4 - F}{f}\,c = \$4.00 + \frac{18.75 - 14}{6}(\$0.10) = \$4.00 + \$0.0792 \approx \$4.08 \qquad (2.23) \]
\[ D_3 = L + \frac{3n/10 - F}{f}\,c = \$3.80 + \frac{7.5 - 5}{4}(\$0.10) = \$3.80 + \$0.0625 \approx \$3.86 \qquad (2.24) \]
\[ P_{60} = L + \frac{60n/100 - F}{f}\,c = \$4.00 + \frac{15 - 14}{6}(\$0.10) = \$4.00 + \$0.0167 \approx \$4.02 \qquad (2.25) \]

MEASURES OF DISPERSION

2.11 (a) Find the range for the ungrouped data in Table 2.7. (b) Find the range for the ungrouped data in Table 2.10 and for the grouped data in Table 2.12. (c) What are the advantages and disadvantages of the range?

(a) The range for ungrouped data is equal to the value of the largest observation minus the value of the smallest observation in the data set. The range for the ungrouped data in Table 2.7 is from 2 to 10, or 8 points.

(b) The range for the ungrouped data in Table 2.10 is from $3.55 to $4.26, or $0.71. For grouped data, the range extends from the lower limit of the smallest class to the upper limit of the largest class. For the grouped data in Table 2.12, the range extends from $3.50 to $4.29.

(c) The advantages of the range are that it is easy to find and understand. Its disadvantages are that it considers only the lowest and highest values of a distribution, it is greatly influenced by extreme values, and it cannot be found for open-ended distributions.
Because of these disadvantages, the range is of limited usefulness (except in quality control).

2.12 Find the interquartile range and the quartile deviation (a) for the ungrouped data in Table 2.7 and (b) for the grouped data in Table 2.12.

(a) The interquartile range is equal to the difference between the third and first quartiles:

\[ IR = Q_3 - Q_1 \qquad (2.26) \]

For the ungrouped data in Table 2.7, \(IR = 7.5 - 4 = 3.5\) points [utilizing the values of \(Q_3\) and \(Q_1\) found in Prob. 2.10(a)]. Note that the interquartile range is not affected by extreme values because it utilizes only the middle half of the data. It is thus better than the range, but it is not as widely used as the other measures of dispersion. For the quartile deviation,

\[ QD = \frac{Q_3 - Q_1}{2} \qquad (2.27) \]

Therefore, \(QD = (7.5 - 4)/2 = 3.5/2 = 1.75\) points. Quartile deviation measures the average range of one-fourth of the data.

(b) \(IR = Q_3 - Q_1 = \$4.08 - \$3.83 = \$0.25\) [utilizing the values of \(Q_3\) and \(Q_1\) found in Prob. 2.10(b)]

\[ QD = \frac{Q_3 - Q_1}{2} = \frac{\$4.08 - \$3.83}{2} = \$0.125 \]

2.13 Find the average deviation for (a) the ungrouped data in Table 2.7 and (b) the grouped data in Table 2.9.

(a) Since \(\mu = 6\) [see Prob. 2.3(a)],

\[ \sum |X - \mu| = |X_1 - 6| + |X_2 - 6| + \cdots + |X_{40} - 6| = 72 \]
\[ AD = \frac{\sum |X - \mu|}{N} = \frac{72}{40} = 1.8 \text{ points} \]

Note that the average deviation takes every observation into account. It measures the average of the absolute deviation of each observation from the mean. It takes the absolute value (indicated by the two vertical bars) because \(\sum (X - \mu) = 0\) (see Example 5).

(b) We can find the average deviation for the same grouped data with the aid of Table 2.16:

\[ AD = \frac{\sum f\,|X - \mu|}{N} = \frac{72}{40} = 1.8 \text{ points} \]

the same as we found for the ungrouped data.

Table 2.16 Calculations for the Average Deviation for the Grouped Data in Table 2.9

Grade       Class Midpoint X   Frequency f   Mean μ   X − μ   |X − μ|   f|X − μ|
1.5–2.4      2                  3             6        −4       4         12
2.5–3.4      3                  3             6        −3       3          9
3.5–4.4      4                  5             6        −2       2         10
4.5–5.4      5                  5             6        −1       1          5
5.5–6.4      6                  6             6         0       0          0
6.5–7.4      7                  8             6         1       1          8
7.5–8.4      8                  4             6         2       2          8
8.5–9.4      9                  4             6         3       3         12
9.5–10.4    10                  2             6         4       4          8
Totals: Σf = N = 40, Σf|X − μ| = 72

2.14 Find the average deviation for the grouped data in Table 2.12.

We can find the average deviation for the grouped data of hourly wages in Table 2.12 with the aid of Table 2.17 [\(\bar{X} = \$3.95\); see Prob. 2.4(b)]:

\[ AD = \frac{\sum f\,|X - \bar{X}|}{n} = \frac{\$3.60}{25} = \$0.144 \]

Note that the average deviation found for the grouped data is an estimate of the "true" average deviation that could be found for the ungrouped data. It differs slightly from the true average deviation because we use the estimate of the mean for the grouped data in our calculations [compare the values of \(\bar{X}\) found in Prob. 2.4(a) and (b)].

Table 2.17 Calculations for the Average Deviation for the Grouped Data in Table 2.12

Hourly Wage, $   Class Midpoint X, $   Frequency f   Mean X̄, $   X − X̄, $   |X − X̄|, $   f|X − X̄|, $
3.50–3.59        3.55                   1             3.95         −0.40       0.40          0.40
3.60–3.69        3.65                   2             3.95         −0.30       0.30          0.60
3.70–3.79        3.75                   2             3.95         −0.20       0.20          0.40
3.80–3.89        3.85                   4             3.95         −0.10       0.10          0.40
3.90–3.99        3.95                   5             3.95          0.00       0.00          0.00
4.00–4.09        4.05                   6             3.95          0.10       0.10          0.60
4.10–4.19        4.15                   3             3.95          0.20       0.20          0.60
4.20–4.29        4.25                   2             3.95          0.30       0.30          0.60
Totals: Σf = n = 25, Σf|X − X̄| = 3.60

2.15 Find the variance and the standard deviation for (a) the ungrouped data in Table 2.7 and (b) the grouped data in Table 2.9. (c) What is the advantage of the standard deviation over the variance?

(a) Since \(\mu = 6\) [see Prob. 2.3(a)],

\[ \sum (X - \mu)^2 = (X_1 - 6)^2 + (X_2 - 6)^2 + \cdots + (X_{40} - 6)^2 = 192 \]
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{192}{40} = 4.8 \text{ points squared} \]
\[ \sigma = \sqrt{\sigma^2} = \sqrt{4.8} \approx 2.19 \text{ points} \]

(b) We can find the variance and the standard deviation for the grouped data of grades with the aid of Table 2.18:

\[ \sigma^2 = \frac{\sum f\,(X - \mu)^2}{N} = \frac{192}{40} = 4.8 \text{ points squared} \]

and \(\sigma = \sqrt{\sigma^2} = \sqrt{4.8} \approx 2.19\) points, the same as we found for the ungrouped data.

Table 2.18 Calculations for the Variance and Standard Deviation for the Data in Table 2.9

Grade       Class Midpoint X   Frequency f   Mean μ   X − μ   (X − μ)²   f(X − μ)²
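1.5–2.4      2                  3             6        −4      16          48
2.5–3.4      3                  3             6        −3       9          27
3.5–4.4      4                  5             6        −2       4          20
4.5–5.4      5                  5             6        −1       1           5
5.5–6.4      6                  6             6         0       0           0
6.5–7.4      7                  8             6         1       1           8
7.5–8.4      8                  4             6         2       4          16
8.5–9.4      9                  4             6         3       9          36
9.5–10.4    10                  2             6         4      16          32
Totals: Σf = N = 40, Σf(X − μ)² = 192

(c) The advantage of the standard deviation over the variance is that the standard deviation is expressed in the same units as the data rather than in "the units squared," which is how the variance is expressed. The standard deviation is by far the most widely used measure of (absolute) dispersion.

2.16 Find the variance and the standard deviation for the grouped data in Table 2.12.

We can find the variance and the standard deviation for the grouped data of hourly wages with the aid of Table 2.19 [\(\bar{X} = \$3.95\); see Prob. 2.4(b)]:

\[ s^2 = \frac{\sum f\,(X - \bar{X})^2}{n - 1} = \frac{0.82}{24} \approx 0.0342 \text{ dollars squared} \]

and \(s = \sqrt{s^2} \approx \sqrt{0.0342} \approx \$0.18\).

Table 2.19 Calculations for the Variance and Standard Deviation for the Data in Table 2.12

Hourly Wage, $   Class Midpoint X, $   Frequency f   Mean X̄, $   X − X̄, $   (X − X̄)²   f(X − X̄)²
3.50–3.59        3.55                   1             3.95         −0.40       0.16        0.16
3.60–3.69        3.65                   2             3.95         −0.30       0.09        0.18
3.70–3.79        3.75                   2             3.95         −0.20       0.04        0.08
3.80–3.89        3.85                   4             3.95         −0.10       0.01        0.04
3.90–3.99        3.95                   5             3.95          0.00       0.00        0.00
4.00–4.09        4.05                   6             3.95          0.10       0.01        0.06
4.10–4.19        4.15                   3             3.95          0.20       0.04        0.12
4.20–4.29        4.25                   2             3.95          0.30       0.09        0.18
Totals: Σf = n = 25, Σf(X − X̄)² = 0.82

Note that in the formula for \(s^2\) and \(s\), \(n - 1\) rather than \(n\) is used in the denominator. The reason for this is that if we take many samples from a population, the average of the sample variances does not tend to equal the population variance, \(\sigma^2\), unless we use \(n - 1\) in the denominator of the formula for \(s^2\) (more will be said on this in Chap. 5). Furthermore, \(s^2\) and \(s\) for the grouped data are estimates for the true \(s^2\) and \(s\) that could be found for the ungrouped data because we use the estimate of \(\bar{X}\) from the grouped data in our calculations.

2.17 Starting with the formulas for \(\sigma^2\) and \(s^2\) given in Sec. 2.3, prove that

(a)
\[ \sigma^2 = \frac{\sum X^2 - N\mu^2}{N} \quad \text{and} \quad s^2 = \frac{\sum X^2 - n\bar{X}^2}{n - 1} \qquad (2.28a,b) \]

(b)
\[ \sigma^2 = \frac{\sum fX^2 - N\mu^2}{N} \quad \text{and} \quad s^2 = \frac{\sum fX^2 - n\bar{X}^2}{n - 1} \qquad (2.29a,b) \]

(a)
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum (X^2 - 2X\mu + \mu^2)}{N} = \frac{\sum X^2 - 2\mu \sum X + N\mu^2}{N} = \frac{\sum X^2}{N} - 2\mu^2 + \mu^2 = \frac{\sum X^2 - N\mu^2}{N} \]

We can get \(s^2\) by simply replacing \(\mu\) with \(\bar{X}\) and \(N\) with \(n\) in the numerator and \(N\) with \(n - 1\) in the denominator of the formula for \(\sigma^2\).

(b)
\[ \sigma^2 = \frac{\sum f\,(X - \mu)^2}{N} = \frac{\sum f\,(X^2 - 2X\mu + \mu^2)}{N} = \frac{\sum fX^2 - 2\mu \sum fX + \mu^2 \sum f}{N} = \frac{\sum fX^2}{N} - 2\mu^2 + \mu^2 = \frac{\sum fX^2 - N\mu^2}{N} \]

We can get \(s^2\) in the same way as we did in part a. The preceding formulas will simplify the calculations for \(\sigma^2\) and \(s^2\) for a large body of data. Coding also helps (see Prob. 2.6).

2.18 Find the variance and the standard deviation for (a) the ungrouped data in Table 2.7 and (b) the grouped data in Table 2.9,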
using the simpler computational formulas in Prob. 2.17.

Table 2.21 Calculations for the Variance and Standard Deviation for the Grouped Data in Table 2.12

Hourly Wage, $   Class Midpoint X, $   Frequency f   fX, $    X²        fX²
3.50–3.59        3.55                   1              3.55    12.6025   12.6025
3.60–3.69        3.65                   2              7.30    13.3225   26.6450
3.70–3.79        3.75                   2              7.50    14.0625   28.1250
3.80–3.89        3.85                   4             15.40    14.8225   59.2900
3.90–3.99        3.95                   5             19.75    15.6025   78.0125
4.00–4.09        4.05                   6             24.30    16.4025   98.4150
4.10–4.19        4.15                   3             12.45    17.2225   51.6675
4.20–4.29        4.25                   2              8.50    18.0625   36.1250
Totals: Σf = n = 25, ΣfX = 98.75, ΣfX² = 390.8825

2.20 Find the coefficient of variation \(V\) for the data in (a) Table 2.7 and (b) Table 2.12. (c) What is the usefulness of the coefficient of variation?

(a) With \(\mu = 6\) and \(\sigma = 2.19\) (see Prob. 2.15),

\[ V = \frac{\sigma}{\mu} = \frac{2.19 \text{ points}}{6 \text{ points}} \approx 0.365, \text{ or } 36.5\% \]

(b) With \(\bar{X} = \$3.95\) and \(s = \$0.18\) (see Prob. 2.16),

\[ V = \frac{s}{\bar{X}} = \frac{\$0.18}{\$3.95} \approx 0.046, \text{ or } 4.6\% \]

(c) The coefficient of variation measures the relative dispersion in the data and is expressed as a pure number without any units. This is to be contrasted with standard deviation and other measures of absolute dispersion, which are expressed in the units of the problem. Thus the coefficient of variation can be used to compare the relative dispersion of two or more distributions expressed in different units, as well as when the true mean values differ. For example, we can say that the dispersion of the data in Table 2.7 is greater than that in Table 2.12. The coefficient of variation also can be used to compare the relative dispersion of the same type of data over different time periods.

2.22 Using the formula for skewness based on the third moment, find the coefficient of skewness for the data in (a) Table 2.9 and (b) Table 2.12.

(a) We can find the coefficient of skewness for the data in Table 2.9 using the formula based on the third moment with the aid of Table 2.22:

\[ Sk = \frac{\sum f\,(X - \mu)^3}{\sigma^3} = \frac{-42}{(2.19)^3} \approx -4 \]

This indicates that this distribution is negatively skewed, but the degree of skewness is measured differently than in Prob. 2.21.

Table 2.22 Calculations for Skewness for the Data in Table 2.9

Grade       Class Midpoint X   Frequency f   Mean μ   X − μ   (X − μ)³   f(X − μ)³
1.5–2.4      2                  3             6        −4      −64        −192
2.5–3.4      3                  3             6        −3      −27         −81
3.5–4.4      4                  5             6        −2       −8         −40
4.5–5.4      5                  5             6        −1       −1          −5
5.5–6.4      6                  6             6         0        0           0
6.5–7.4      7                  8             6         1        1           8
7.5–8.4      8                  4             6         2        8          32
8.5–9.4      9                  4             6         3       27         108
9.5–10.4    10                  2             6         4       64         128
Totals: Σf = N = 40, Σf(X − μ)³ = −42

(b) See Table 2.23. [Note that regardless of the measure of skewness used, the

[Fourth-power calculations for the grouped data in Table 2.12]

Hourly Wage, $   Class Midpoint X, $   Frequency f   Mean X̄, $   X − X̄, $   (X − X̄)⁴   f(X − X̄)⁴
3.50–3.59        3.55                   1             3.95         −0.40       0.0256      0.0256
3.60–3.69        3.65                   2             3.95         −0.30       0.0081      0.0162
3.70–3.79        3.75                   2             3.95         −0.20       0.0016      0.0032
3.80–3.89        3.85                   4             3.95         −0.10       0.0001      0.0004
3.90–3.99        3.95                   5             3.95          0.00       0           0
4.00–4.09        4.05                   6             3.95          0.10       0.0001      0.0006
4.10–4.19        4.15                   3             3.95          0.20       0.0016      0.0048
4.20–4.29        4.25                   2             3.95          0.30       0.0081      0.0162
Totals: Σf = n = 25, Σf(X − X̄)⁴ = 0.0670

2.24 Find the covariance between hourly wage \(X\) and education \(Y\), measured in years of schooling, in the data in Table 2.26.

Table 2.26 Employee Hourly Wages and Years of Schooling (columns: Employee Number; Hourly Wage X, $; Years of Schooling Y)

From the calculations in Table 2.27,

\[ \text{cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n} = \frac{103.85}{10} = 10.385 \]

When \(X\) and \(Y\) are both above or below their means, covariance is increased. When \(X\) and \(Y\) move in opposite directions relative to their means (employee 5), covariance is decreased. Since in this case \(\text{cov}(X, Y) > 0\), \(X\) and \(Y\) move together relative to their means.

Table 2.27 (columns: Employee Number; Hourly Wage X, $; Years of Schooling Y; X − X̄; Y − Ȳ; (X − X̄)(Y − Ȳ); total Σ(X − X̄)(Y − Ȳ) = 103.85)

2.25 Compute the covariance from Table 2.26 using the alternate formula.

Computations are given in Table 2.28:

\[ \text{cov}(X, Y) = \frac{\sum XY}{n} - \bar{X}\bar{Y} = \frac{1728.8}{10} - (11.775)(13.8) = 172.88 - 162.495 = 10.385 \]

Table 2.28 Calculations for Covariance with Alternate Formula

Supplementary Problems

FREQUENCY DISTRIBUTIONS

2.26 Table 2.29 gives the frequency for gasoline prices at 48 stations in a town. Present the data in the form of a histogram, a relative-frequency histogram, a frequency polygon, and an ogive.

Table 2.29 Frequency Distribution of Gasoline Prices

Price, $     Frequency
1.00–1.04    [illegible]
1.05–1.09     6
1.10–1.14    [illegible]
1.15–1.19    15
1.20–1.24     5
1.25–1.29    [illegible]

2.27 Table 2.30 gives the frequency distribution of family incomes for a sample of 100 families in a city. Graph the data into a histogram, a relative-frequency histogram, a frequency polygon, and an ogive.

CHAP. 3] PROBABILITY AND PROBABILITY DISTRIBUTIONS

[Fig. 3-2]

3. Rule of multiplication for dependent events. Two events are dependent if the occurrence of one is connected in some way with the occurrence of the other. Then the joint probability of A and B is

\[ P(A \text{ and } B) = P(A) \cdot P(B/A) \qquad (3.6) \]

This reads: "The probability that both events A and B will take place equals the probability of event A times the probability of event B, given that event A has already occurred."

\[ P(B/A) = \text{conditional probability of } B, \text{ given that } A \text{ has already occurred} \qquad (3.7) \]

and

\[ P(A \text{ and } B) = P(B \text{ and } A) \qquad (3.8) \]

(see the Solved Problems).

4. Rule of multiplication for independent events. Two events, A and B, are independent if the occurrence of A is not connected in any way to the occurrence of B [\(P(B/A) = P(B)\)]. Then

\[ P(A \text{ and } B) = P(A) \cdot P(B) \qquad (3.9) \]

EXAMPLE 5. On a single toss of a die, we can get only one of six possible outcomes: 1, 2, 3, 4, 5, or 6. These are mutually exclusive events. If the die is fair, \(P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6\). The probability of getting a 2 or a 3 on a single toss of the die is

\[ P(2 \text{ or } 3) = P(2) + P(3) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \]

Similarly,

\[ P(2 \text{ or } 3 \text{ or } 4) = P(2) + P(3) + P(4) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2} \]

EXAMPLE 6. Picking at random a spade or a king on a single pick from a well-shuffled card deck does not constitute two mutually exclusive events because we could pick the king of spades. Thus

\[ P(S \text{ or } K) = P(S) + P(K) - P(S \text{ and } K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]

Using set theory, the preceding statement can be rewritten in an equivalent way as

\[ P(S \cup K) = P(S) + P(K) - P(S \cap K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]

where the symbol \(\cup\) (read "union") replaces or and \(\cap\) (read "intersection") replaces and.

EXAMPLE 7. The outcomes of two successive tosses of a balanced coin are independent events.
The outcome of the first toss in no way affects the outcome on the second toss. Thus

\[ P(H \text{ and } H) = P(H \cap H) = P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} \]

Similarly,

\[ P(H \text{ and } H \text{ and } H) = P(H \cap H \cap H) = P(H) \cdot P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8} \]

EXAMPLE 8. The probability that on the first pick from a deck we get the king of diamonds is 1/52.

Table 3.1 Probability Distribution of Heads in Two Tosses of a Balanced Coin

Number of Heads   Possible Outcomes   Probability
0                 TT                  0.25
1                 TH, HT              0.50
2                 HH                  0.25
Total                                 1.00

Fig. 3-3 Probability Distribution of Heads in Two Tosses of a Balanced Coin

EXAMPLE 10. Using the binomial distribution, we can find the probability of 4 heads in 6 flips of a balanced coin as follows:

\[ P(4) = \frac{6!}{4!\,(6 - 4)!} \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^2 = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(2 \cdot 1)} \left(\frac{1}{16}\right)\left(\frac{1}{4}\right) = 15\left(\frac{1}{64}\right) = \frac{15}{64} \approx 0.23 \]

When \(n\) and \(X\) are large numbers, lengthy calculations to find probabilities can be avoided by using App. 1. The expected number of heads in 6 flips \(= \mu = np = (6)(1/2) = 3\) heads. The standard deviation of the probability distribution of 6 flips is

\[ \sigma = \sqrt{np(1 - p)} = \sqrt{(6)(1/2)(1/2)} = \sqrt{6/4} = \sqrt{1.5} \approx 1.22 \text{ heads} \]

Because \(p = 0.5\), this probability distribution is symmetrical. If we were not dealing with a coin and the trials were not independent (as in sampling without replacement), we would have had to use the hypergeometric distribution (see Prob. 3.27).

3.4 THE POISSON DISTRIBUTION

The Poisson distribution is another discrete probability distribution. It is used to determine the probability of a designated number of successes per unit of time, when the events or successes are independent and the average number of successes per unit of time remains constant. Then

\[ P(X) = \frac{\lambda^X e^{-\lambda}}{X!} \qquad (3.13) \]

where \(X\) = designated number of successes
\(P(X)\) = probability of \(X\) number of successes
\(\lambda\) (Greek letter lambda) = average number of successes per unit of time
\(e\) = base of the natural logarithmic system, or 2.71828

Given the value of \(\lambda\) (the expected value or mean and variance of the Poisson distribution), we can find \(e^{-\lambda}\) from App. 2, substitute in Eq. (3.13), and find \(P(X)\).

3.5
CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

A continuous random variable \(X\) is one that can assume an infinite number of values within any given interval. The probability that \(X\) falls within any interval is given by the area under the probability distribution (or density function) within that interval. The total area (probability) under the curve is 1 (see Prob. 3.31).

The normal distribution is a continuous probability distribution and the most commonly used distribution in statistical analysis (see Prob. 3.32). The normal curve is bell-shaped and symmetrical about its mean. It extends indefinitely in both directions, but most of the area (probability) is clustered around the mean (see Fig. 3-4). 68.26% of the area (probability) under the normal curve is included within one standard deviation of the mean (i.e., within \(\mu \pm 1\sigma\)), 95.44% within \(\mu \pm 2\sigma\), and 99.74% within \(\mu \pm 3\sigma\).

[Fig. 3-4: normal curve (X scale) and standard normal curve (z scale)]

The standard normal distribution is a normal distribution with \(\mu = 0\) and \(\sigma = 1\). Any normal distribution (\(X\) scale in Fig. 3-4) can be converted into a standard normal distribution by letting \(\mu = 0\) and expressing deviations from \(\mu\) in standard deviation units (\(z\) scale).

To find probabilities (areas) for problems involving the normal distribution, we first convert the \(X\) value into its corresponding \(z\) value, as follows:

\[ z = \frac{X - \mu}{\sigma} \qquad (3.15) \]

Then we look up the \(z\) value in App. 3. This gives the proportion of the area (probability) included under the curve between the mean and that \(z\) value.

EXAMPLE 12. The area (probability) under the standard normal curve between \(z = 0\) and \(z = 1.96\) is obtained by looking up the value of 1.96 in App. 3. We move down the \(z\) column in the table to 1.9 and then across until we are below the column headed 0.06. The value that we get is 0.4750. This means that 47.50% of the total area (of 1, or 100%) under the curve lies between \(z = 0\) and \(z = 1.96\) (the shaded area in the figure above the table).
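The table lookup for z = 1.96 can be reproduced numerically. The sketch below is an illustration, not the book's method: it replaces the App. 3 table with the standard normal cumulative distribution written in terms of the error function from Python's `math` module, and the function names are assumptions chosen for this example.

```python
import math


def std_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_mean_to_z(z):
    """Area between the mean (z = 0) and z, the quantity a z table tabulates."""
    return abs(std_normal_cdf(z) - 0.5)


area_196 = area_mean_to_z(1.96)
area_100 = area_mean_to_z(1.00)
print(f"{area_196:.4f}")  # 0.4750
print(f"{area_100:.4f}")  # 0.3413
```

Doubling the z = 1 area gives roughly 0.6827; the 68.26% quoted in the text comes from doubling the rounded table entry 0.3413.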
Because of symmetry, the area between \(z = 0\) and \(z = -1.96\) (not given in the table) is also 0.4750, or 47.50%.

EXAMPLE 13. Suppose that \(X\) is a normally distributed random variable with \(\mu = 10\) and \(\sigma^2 = 4\) and we want to find the probability of \(X\) assuming a value between 8 and 12. We first calculate the \(z\) values corresponding to the \(X\) values of 8 and 12 and then look up these \(z\) values in App. 3:

\[ z_1 = \frac{X_1 - \mu}{\sigma} = \frac{8 - 10}{2} = -1 \quad \text{and} \quad z_2 = \frac{X_2 - \mu}{\sigma} = \frac{12 - 10}{2} = +1 \]

For \(z = 1\), we get 0.3413 from App. 3. Then, \(z = \pm 1\) equals 2(0.3413), or 0.6826. This means that the probability of \(X\) assuming a value between 8 and 12, or \(P(8 < X < 12)\), is 68.26% (see Fig. 3-4).

EXAMPLE 14. Suppose again that \(X\) is a normally distributed random variable with \(\mu = 10\) and \(\sigma^2 = 4\). The probability that \(X\) will assume a value between 7 and 14 can be found as follows:

\[ z_1 = \frac{7 - 10}{2} = -1.5 \quad \text{and} \quad z_2 = \frac{14 - 10}{2} = 2 \]

For \(z_1 = -1.5\), we look up 1.50 in App. 3 and get 0.4332. For \(z_2 = 2\), we get 0.4772. Therefore, \(P(7 < X < 14) = 0.4332 + 0.4772 = 0.9104\), or 91.04% (see Fig. 3-5). Therefore, the probability of \(X\) assuming a value smaller than 7 or larger than 14 (the unshaded tail areas in Fig. 3-5) is \(1 - 0.9104 = 0.0896\), or 8.96%.

[Fig. 3-5]

The normal distribution approximates the binomial distribution when \(n \geq 30\) and both \(np > 5\) and \(n(1 - p) > 5\), and it approximates the Poisson distribution when \(\lambda \geq 10\) (see Probs. 3.37 and 3.38). Another continuous probability distribution is the exponential distribution (see Prob. 3.39). Chebyshev's theorem, or inequality, states that regardless of the shape of a distribution, the proportion of the observations or area falling within \(K\) standard deviations of the mean is at least \(1 - 1/K^2\), for \(K \geq 1\) (see Probs. 3.40 and 3.72).

Solved Problems

PROBABILITY OF A SINGLE EVENT

3.1 (a) Distinguish among classical or a priori probability, relative frequency or empirical probability, and subjective or personalistic probability. (b) What is the disadvantage of each? (c) Why do we study probability theory?
(a) According to classical probability, the probability of an event A is given by

\[ P(A) = \frac{n_A}{N} \]

where \(P(A)\) = probability that event A will occur
\(n_A\) = number of ways event A can occur
\(N\) = total number of equally possible outcomes

By the classical approach, we can make probability statements about balanced coins, fair dice, and standard card decks a priori, or without tossing a coin, rolling a die, or drawing a card. Relative frequency or empirical probability is given by the ratio of the number of times an event occurs to the total number of actual outcomes or observations. As the number of experiments or trials (such as the tossing of a coin) increases, the relative frequency or empirical probability approaches the classical or a priori probability. Subjective or personalistic probability refers to the degree of belief of an individual that the event will occur, based on whatever evidence is available to the individual.

(b) The classical or a priori approach to probability can only be applied to games of chance (such as tossing a balanced coin, rolling a fair die, or picking cards from a standard deck of cards) where we can determine a priori, or without experimentation, the probability that an event will occur. In real-world problems of economics and business, we often cannot assign probabilities a priori and the classical approach cannot be used. The relative-frequency or empirical approach overcomes the disadvantages of the classical approach by using the relative frequencies of past occurrences as probabilities. The difficulty with the relative-frequency or empirical approach is that we get different probabilities (relative frequencies) for different numbers of trials or experiments. These probabilities stabilize, or approach a limit, as the number of trials or experiments increases. Because this may be expensive and time-consuming, people may end up using it without a "sufficient" number of trials or experiments.
The disadvantage of the subjective or personalistic approach to probability is that different people faced with the same situation may come up with completely different probabilities.

(c) Most of the decisions we face in economics, business, science, and everyday life involve risks and probabilities. These probabilities are easier to understand and illustrate for games of chance because objective probabilities can easily be assigned to various events. However, the primary reason for studying probability theory is to help us make intelligent decisions in economics, business, science, and everyday life when risk and uncertainty are involved.

3.2 What is the probability of (a) A head in one toss of a balanced coin? A tail? A head or a tail? (b) A 2 in one rolling of a fair die? Not a 2? A 2 or not a 2?

(a) Since a head H and a tail T are equally likely on a balanced coin, \(P(H) = 1/2\), \(P(T) = 1/2\), and \(P(H) + P(T) = 1/2 + 1/2 = 1\).

(b) Since each of the 6 sides of a fair die is equally likely to come up and a 2 is one of the possibilities,

\[ P(2) = \frac{1}{6} \]

The probability of not rolling a 2, that is, \(P(2')\), is given by

\[ P(2') = 1 - P(2) = 1 - \frac{1}{6} = \frac{5}{6} \]
\[ P(2) + P(2') = \frac{1}{6} + \frac{5}{6} = \frac{6}{6} = 1 \]

3.3 What is the probability that in picking a card from a well-shuffled standard deck we get (a) a king, (b) a spade, (c) the king of spades, (d) not the king of spades, or (e) the king of spades or not the king of spades?

(a) Since there are 4 kings K in the 52 cards of the standard deck, \(P(K) = 4/52 = 1/13\).
(b) Since there are 13 spades S in the 52 cards, \(P(S) = 13/52 = 1/4\).
(c) There is only one king of spades in the deck; therefore, \(P(K_S) = 1/52\).
(d) The probability of not picking the king of spades is \(P(K_S') = 1 - 1/52 = 51/52\).
(e) \(P(K_S) + P(K_S') = 1/52 + 51/52 = 52/52 = 1\), or certainty.

3.4 An urn (vase) contains 10 balls that are exactly alike except that 5 are red, 3 are blue, and 2 are green. What is the probability that, in picking up a single ball, the ball is (a) Red? (b) Blue? (c) Green? (d) Nonblue? (e) Nongreen? (f) Green or nongreen? (g) What are the odds of picking a blue ball? (h) What are the odds of not picking a blue ball?
(a) \(P(R) = n_R/N = 5/10 = 0.5\)
(b) \(P(B) = n_B/N = 3/10 = 0.3\)
(c) \(P(G) = n_G/N = 2/10 = 0.2\)
(d) \(P(B') = 1 - P(B) = 1 - 0.3 = 0.7\)
(e) \(P(G') = 1 - P(G) = 1 - 0.2 = 0.8\)
(f) \(P(G) + P(G') = 0.2 + 0.8 = 1\)
(g) The odds of picking a blue ball are given by the ratio of the number of ways of picking a blue ball to the number of ways of not picking a blue ball. Since there are 3 blue balls and 7 nonblue balls, the odds in favor of picking a blue ball are 3 to 7, or 3:7.
(h) The odds of not (against) picking a blue ball are 7 to 3, or 7:3.

3.5 Suppose that a 3 comes up 106 times in 600 tosses of a die. (a) What is the relative frequency of the 3? How does this differ from classical or a priori probability? (b) What would you expect to be the relative frequency or empirical probability if you increased the number of times the die is rolled?

(a) The relative frequency or empirical probability of the 3 is given by the ratio of the number of times 3 comes up (106) out of the total number of times the die is rolled (600). Thus the relative frequency or empirical probability of the 3 is 106/600 ≈ 0.177 in 600 rolls. According to the classical or a priori approach (and without rolling the die at all), \(P(3) = 1/6 \approx 0.167\). If the die is fair, we expect the 3 to come up 100 times in 600 rolls of the die as compared with the actual, observed, or empirical 106 times.

(b) If the number of times the same die is rolled is increased from 600, we expect the relative frequency or empirical probability to approach (i.e., to become less unequal with) the classical or a priori probability.

3.6 The production process results in 27 defective items for each 1000 items produced. (a) What is the relative frequency or empirical probability of a defective item? (b) How many defective items do you expect out of the 1600 items produced each day?
(a) The relative frequency or empirical probability of a defective item is 27/1000 = 0.027.
(b) By multiplying the number of items produced each day (1600) by the relative frequency or empirical probability of a defective item (0.027), we get the number of defective items we expect out of each day's output. This is (1600)(0.027) = 43.2, or 43 to the nearest item.

PROBABILITY OF MULTIPLE EVENTS

3.7 Define and give some examples of events that are (a) mutually exclusive, (b) not mutually exclusive, (c) independent, and (d) dependent.

(a) Two or more events are mutually exclusive, or disjoint, if the occurrence of one of them precludes or prevents the occurrence of the other(s). When one event takes place, the other(s) will not. For example, in a single flip of a coin, we get either a head or a tail, but not both. Heads and tails are therefore mutually exclusive events. In a single toss of a die, we get one and only one of six possible outcomes: 1, 2, 3, 4, 5, or 6. The outcomes are therefore mutually exclusive. A card picked at random can be of only one suit: diamonds, hearts, clubs, or spades. A child is born either a boy or a girl. An item produced on an assembly line is either good or defective.
(b) Two or more events are not mutually exclusive if they may occur at the same time. The occurrence of one does not preclude the occurrence of the other(s). For example, a card picked at random from a deck of cards can be both an ace and a club. Therefore, aces and clubs are not mutually exclusive events, since we could pick the ace of clubs. Because we can have inflation and recession at the same time, inflation and recession are not mutually exclusive events.
(c) Two or more events are independent if the occurrence of one of them in no way affects the occurrence of the other(s).
For example, in two successive flips of a balanced coin, the outcome of the second flip in no way depends on the outcome of the first flip. The same is true for two successive tosses of a pair of dice or picks of two cards from a deck with replacement.
(d) Two or more events are dependent if the occurrence of one of them affects the probability of the occurrence of the other(s). For example, if we pick a card from a deck and do not replace it, the probability of picking the same card on the second pick is 0. All other probabilities also are affected since there are now only 51 cards in the deck. Similarly, if the proportion of defective items is greater for the evening than for the morning shift, the probability that an item picked at random from the evening output is defective is greater than for the morning output.

3.8 Draw a Venn diagram for (a) mutually exclusive events and (b) not mutually exclusive events. (c) Are mutually exclusive events dependent or independent? Why?

(a) Figure 3-6 illustrates the Venn diagram for events A and B, which are mutually exclusive.
(b) Figure 3-7 illustrates the Venn diagram for events A and B, which are not mutually exclusive.

Fig. 3-6    Fig. 3-7

(c) Mutually exclusive events are dependent events. When one event occurs, the probability of the other occurring is 0. Thus the occurrence of the first affects (precludes) the occurrence of the other.

3.9 What is the probability of getting (a) Less than 3 on a single roll of a fair die? (b) Hearts or clubs on a single pick from a well-shuffled standard deck of cards? (c) A red or a blue ball from an urn containing 5 red balls, 3 blue balls, and 2 green balls? (d) More than 3 on a single roll of a fair die?

(a) Getting less than 3 on a single roll of a fair die means getting a 1 or a 2. These are mutually exclusive events. Applying the rule of addition for mutually exclusive events, we get $P(1 \text{ or } 2) = P(1) + P(2) = 1/6 + 1/6 = 2/6 = 1/3$. Using set theory, $P(1 \text{ or } 2)$ can be rewritten in an equivalent way as $P(1 \cup 2)$, where $\cup$ is read "union" and stands for "or."
(b) Getting a heart or a club on a single pick from a well-shuffled deck of cards also constitutes two mutually exclusive events. Applying the rule of addition, we get $P(H \text{ or } C) = P(H \cup C) = P(H) + P(C) = 13/52 + 13/52 = 26/52 = 1/2$
(c) $P(R \text{ or } B) = P(R \cup B) = P(R) + P(B) = 5/10 + 3/10 = 8/10 = 0.8$
(d) $P(4 \text{ or } 5 \text{ or } 6) = P(4 \cup 5 \cup 6) = P(4) + P(5) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2$

For example, in calculating P(A or C) in part a, the ace of clubs is counted twice, once as an ace and once as a club. For mutually exclusive events, the probability that both events will occur simultaneously is 0, and no double counting is involved. This is why the rule of addition for mutually exclusive events does not contain a negative term.

What is the probability of (a) Inflation I or recession R if the probability of inflation is 0.3, the probability of recession is 0.2, and the probability of inflation and recession is 0.06? (b) Drawing an ace, a club, or a diamond on a single pick from a deck?

(a) Since the probability of inflation and recession is not 0, inflation and recession are not mutually exclusive events. Applying the rule of addition, we get $P(I \text{ or } R) = P(I) + P(R) - P(I \text{ and } R)$, or $P(I \cup R) = P(I) + P(R) - P(I \cap R)$, and $P(I \text{ or } R) = 0.3 + 0.2 - 0.06 = 0.44$
(b) Getting an ace, a club, or a diamond does not constitute mutually exclusive events because we could get the ace of clubs or the ace of diamonds. Applying the rule of addition for events that are not mutually exclusive, we get $P(A \text{ or } C \text{ or } D) = P(A) + P(C) + P(D) - P(A \text{ and } C) - P(A \text{ and } D) = 4/52 + 13/52 + 13/52 - 1/52 - 1/52 = 28/52 = 7/13$

What is the probability of (a) Two 6s on 2 rolls of a die? (b) A 6 on each die in rolling 2 dice once? (c) Two blue balls in 2 successive picks with replacement from the urn in Prob. 3.4? (d) Three girls in a family with 3 children?

(a) Getting a 6 on each of 2 rolls of a die constitutes independent events. Applying the rule of multiplication for independent events, we get $P(6 \text{ and } 6) = P(6 \cap 6) = P(6) \cdot P(6) = (1/6)(1/6) = 1/36$
(b) Getting a 6 on each die in rolling 2 dice once also constitutes independent events. Therefore $P(6 \text{ and } 6) = P(6) \cdot P(6) = (1/6)(1/6) = 1/36$
(c) Since we replace the first ball picked, the probability of getting a blue ball on the second pick is the same as on the first pick. The events are independent.
Therefore $P(B \text{ and } B) = P(B \cap B) = P(B) \cdot P(B) = (3/10)(3/10) = 9/100 = 0.09$
(d) The probability of a girl, G, on each birth constitutes independent events, each with a probability of 0.5. Therefore $P(G \text{ and } G \text{ and } G) = P(G \cap G \cap G) = P(G) \cdot P(G) \cdot P(G) = (0.5)(0.5)(0.5) = 0.125$, or 1 chance in 8.

(a) List all possible outcomes in rolling 2 dice simultaneously. (b) What is the probability of getting a total of 5 in rolling 2 dice simultaneously? (c) What is the probability of getting a total of 4 or less in rolling 2 dice simultaneously? More than 4?

(a) Each die has 6 possible and equally likely outcomes, and the outcome on each die is independent. Since each of the 6 outcomes on the first die can be associated with each of the 6 outcomes on the second die, there are a total of 36 possible outcomes; that is, the sample space N is 36. (In Table 3.2, the first number refers to the outcome on the first die, and the second number refers to the second die. The dice can be distinguished by different colors.) The total of the 36 possible outcomes also can be shown by a tree (or sequential) diagram, as in Fig. 3-8.

Table 3.2 Outcomes in Rolling Two Dice Simultaneously

Fig. 3-8 Tree Diagram for Rolling Two Dice Simultaneously

(e) Since 2 balls, one of which was red, were already picked and not replaced, there remains a total of 8 balls, of which 4 are red, in the urn. The (conditional) probability of picking another red ball is $P(R/R \text{ and } R') = P(R/R' \text{ and } R) = 4/8 = 1/2$.

3.15 What is the probability of obtaining (a) Two red balls from the urn in Prob. 3.4 in 2 picks without replacement? (b) Two aces from a deck in 2 picks without replacement? (c) The ace of clubs and a spade in that order in 2 picks from a deck without replacement? (d) A spade and the ace of clubs in that order in 2 picks from a deck without replacement?
(e) Three red balls from the urn of Prob. 3.4 in 3 picks without replacement? (f) Three red balls from the same urn in 3 picks with replacement?

(a) Applying the rule of multiplication for dependent events, we get $P(R \text{ and } R) = P(R \cap R) = P(R) \cdot P(R/R) = (5/10)(4/9) = 20/90 = 2/9$
(b) $P(A \text{ and } A) = P(A) \cdot P(A/A) = (4/52)(3/51) = 12/2652 = 1/221$
(c) $P(A_C \text{ and } S) = P(A_C) \cdot P(S/A_C) = (1/52)(13/51) = 13/2652 = 1/204$
(d) $P(S \text{ and } A_C) = P(S) \cdot P(A_C/S) = (13/52)(1/51) = 13/2652 = 1/204$
(e) $P(R \text{ and } R \text{ and } R) = P(R) \cdot P(R/R) \cdot P(R/R \text{ and } R) = (5/10)(4/9)(3/8) = 60/720 = 1/12 \approx 0.083$
(f) With replacement, picking three balls from an urn constitutes three independent events. Therefore $P(R \text{ and } R \text{ and } R) = P(R) \cdot P(R) \cdot P(R) = (5/10)(5/10)(5/10) = 125/1000 = 0.125$

3.16 Past experience has shown that for every 100,000 items produced in a plant by the morning shift, 200 are defective, and for every 100,000 items produced by the evening shift, 500 are defective. During a 24-h period, 1000 items are produced by the morning shift and 600 by the evening shift. What is the probability that an item picked at random from the total of 1600 items produced during the 24-h period (a) Was produced by the morning shift and is defective? (b) Was produced by the evening shift and is defective? (c) Was produced by the evening shift and is not defective? (d) Is defective, whether produced by the morning or the evening shift?

(a) The probabilities of picking an item produced by the morning shift M and the evening shift E are $P(M) = 1000/1600 = 0.625$ and $P(E) = 600/1600 = 0.375$. The probabilities of picking a defective item D from the morning and evening outputs separately are $P(D/M) = 200/100{,}000 = 0.002$ and $P(D/E) = 500/100{,}000 = 0.005$. The probability that an item picked at random from the total of 1600 items produced during the 24-h period was produced by the morning shift and is defective is $P(M \text{ and } D) = P(M) \cdot P(D/M) = (0.625)(0.002) = 0.00125$
(b) $P(E \text{ and } D) = P(E) \cdot P(D/E) = (0.375)(0.005) = 0.001875$
(c) $P(E \text{ and } D') = P(E) \cdot P(D'/E) = (0.375)(0.995) = 0.373125$; equivalently, $P(E \text{ and } D')$
$= P(E) \cdot P(D'/E) = (0.375)(99{,}500/100{,}000) \approx 0.3731$
(d) The expected number of defective items from the morning shift is equal to the probability of a defective item from the morning output times the number of items produced by the morning shift; that is, (0.002)(1000) = 2. From the evening shift we expect (0.005)(600) = 3 defective items. Thus we expect 5 defective items from the 1600 items produced during the 24-h period. If there are indeed 5 defective items, the probability of picking at random any of the 5 defective items out of a total of 1600 items is 5/1600 = 1/320, or 0.003125.

3.17 (a) From the rule of multiplication for dependent events B and A, derive the formula for P(A/B) in terms of P(B/A), P(A), and P(B). This is known as Bayes' theorem and is used to revise probabilities when additional relevant information becomes available. (b) Using Bayes' theorem, find the probability that a defective item picked at random from the 24-h output of 1600 items in Prob. 3.16 was produced by the morning shift; by the evening shift.

(a) $P(B \text{ and } A) = P(B) \cdot P(A/B)$. By dividing both sides by P(B) and rearranging, we get $P(A/B) = \frac{P(B \text{ and } A)}{P(B)}$. However, $P(B \text{ and } A) = P(A \text{ and } B)$; see Prob. 3.15(c) and (d). Therefore

$P(A/B) = \frac{P(A) \cdot P(B/A)}{P(B)}$   (Bayes' theorem)

(b) Applying Bayes' theorem to the statement in Prob. 3.16, letting A signify the morning shift M and B signify defective D, and utilizing the results of Prob. 3.16, we get

$P(M/D) = \frac{P(M) \cdot P(D/M)}{P(D)} = \frac{(0.625)(0.002)}{0.003125} = \frac{0.00125}{0.003125} = 0.4$

That is, the probability that a defective item picked at random from the total 24-h output of 1600 items was produced by the morning shift is 40%. Similarly,

$P(E/D) = \frac{P(E) \cdot P(D/E)}{P(D)} = \frac{(0.375)(0.005)}{0.003125} = \frac{0.001875}{0.003125} = 0.6$, or 60%

Bayes' theorem can be generalized, for example, to find the probability that a defective item D picked at random was produced by any of n plants ($A_i$, i = 1, 2, ..., n), as follows:

$P(A_i/D) = \frac{P(A_i) \cdot P(D/A_i)}{\sum P(A_i) \cdot P(D/A_i)}$

where $\sum$ refers to the summation over the n plants (the only ones producing the output). Bayes' theorem is applied in business decision theory but is seldom used in the field of economics. (However, Bayesian econometrics is becoming increasingly important.)

3.18 A club has 8 members. (a) How many different committees of 3 members each can be formed from the club? (Two committees are different even when only one member is different.) (b) How many committees of 3 members each can be formed from the club if each committee is to have a president, a treasurer, and a secretary?

(a) We are interested here in finding the number of combinations of 8 people taken 3 at a time without concern for the order:

$_8C_3 = \frac{8!}{3!\,(8-3)!} = \frac{8 \cdot 7 \cdot 6}{3 \cdot 2 \cdot 1} = 56$

In general, the number of arrangements of n things taken X at a time without concern for the order is a combination given by

$_nC_X = \frac{n!}{X!\,(n-X)!}$

where n! (read "n factorial") $= n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1$ and 0! = 1 by definition.
(b) Since each committee of 3 has to have a president, a treasurer, and a secretary, we are now interested in finding the number of permutations of 8 people taken 3 at a time, when the order is important:

$_8P_3 = \frac{8!}{(8-3)!} = 8 \cdot 7 \cdot 6 = 336$

In general, the number of arrangements, in a definite order, of n things taken X at a time is a permutation given by

$_nP_X = \frac{n!}{(n-X)!}$

Permutations and combinations (often referred to as counting techniques) are helpful in counting the number of equally likely ways event A can occur in relation to the total of all possible and equally likely outcomes. Combinations and permutations were not used in previous problems because those problems were simple enough without them.

DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

3.19 Define what is meant by and give an example of (a) a random variable, (b) a discrete random variable, and (c) a discrete probability distribution. (d) What is the distinction between a probability distribution and a relative-frequency distribution?
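The counting formulas from the committee problem above can be checked directly from the factorial definitions. The sketch below is illustrative only: the function names are made up for this example, and the values n = 8 and X = 3 are chosen for the demonstration rather than asserted as the problem's data.

```python
import math


def n_combinations(n, x):
    """Ways to choose x of n things when order does not matter: n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))


def n_permutations(n, x):
    """Ways to arrange x of n things when order matters: n! / (n - x)!."""
    return math.factorial(n) // math.factorial(n - x)


n, x = 8, 3  # illustrative values
committees = n_combinations(n, x)   # unordered groups of 3
slates = n_permutations(n, x)       # president, treasurer, secretary

print(committees, slates)

# Each unordered committee of x members can be ordered in x! ways,
# so permutations = combinations * x!.
assert slates == committees * math.factorial(x)
```

The final assertion shows why the permutation count is always larger: assigning distinct offices multiplies each committee by the number of ways its members can be ordered.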
(a) A random variable is a variable whose values are associated with some probability of being observed. For example, on 1 roll of a fair die, we have 6 mutually exclusive outcomes (1, 2, 3, 4, 5, or 6), each associated with a probability of occurrence of 1/6. Thus the outcome from the roll of a die is a random variable.
(b) A discrete random variable is one that can assume only finite or distinct values. For example, the outcomes from rolling a die constitute a discrete random variable because they are limited to the values 1, 2, 3, 4, 5, and 6. This is to be contrasted with continuous variables, which can assume an infinite number of values within any given interval [see Prob. 3.31(a)].
(c) A discrete probability distribution refers to the set of all possible values of a (discrete) random variable and their associated probabilities. The set of the 6 outcomes in rolling a die and their associated probabilities is an example of a discrete probability distribution. The sum of the probabilities associated with all the values that the discrete random variable can assume always equals 1.
(d) A probability distribution refers to the classical or a priori probabilities associated with all the values that a random variable can assume. Because those probabilities are assigned a priori and without any experimentation, a probability distribution is often referred to as a theoretical (relative) frequency distribution. This differs from an empirical (relative) frequency distribution, which refers to the ratio of the number of times each outcome actually occurs to the total number of actual trials or observations. For example, in actually rolling a die a number of times, we are not likely to get each outcome exactly 1/6 of the times. However, as the number of rolls increases, the empirical (relative) frequency distribution stabilizes at the (uniform) probability or theoretical relative-frequency distribution of 1/6.

3.20 Derive the formula for (a) the mean μ or expected value E(X) and (b) the variance for a discrete probability distribution.
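Before the derivation, it may help to see both quantities computed for the fair-die distribution described above. The sketch below treats a discrete probability distribution as a mapping from outcomes to probabilities; the names `die`, `expected_value`, and `variance` are illustrative, and the formulas used are the standard probability-weighted sum and probability-weighted squared deviation.

```python
from fractions import Fraction

# Discrete probability distribution of one roll of a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(dist):
    """E(X): each outcome weighted by its probability."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var X: probability-weighted squared deviations from E(X)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


mu = expected_value(die)
var = variance(die)

# Shortcut form: sum of X^2 P(X) minus the square of E(X).
shortcut = sum(x**2 * p for x, p in die.items()) - mu**2

print(sum(die.values()), mu, var, shortcut)
```

The two variance expressions agree, which is a useful sanity check when the shortcut form is used in hand calculations.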
(a) The formula for the arithmetic mean for grouped population data [Eq. (2.2a)] is

$\mu = \frac{\sum fX}{N}$

where $\sum fX$ is the sum of the frequency of each class f times the class midpoint X and $N = \sum f$, which is the number of all observations or frequencies. In dealing with probability distributions, the mean μ is often referred to as the "expected value" E(X). The formula for μ or E(X) for a discrete probability distribution can be derived by starting with Eq. (2.2a) and letting f = P(X), which is the probability of each of the possible outcomes X. Then $\sum fX = \sum XP(X)$, which is the sum of the value of each outcome times its probability of occurrence, and $N = \sum f = \sum P(X)$, which is the sum of the probabilities of each outcome, which is 1. Thus

$E(X) = \mu = \frac{\sum XP(X)}{\sum P(X)} = \sum XP(X)$

(b) The formula for the variance of grouped population data (see Chap. 2) is

$\sigma^2 = \frac{\sum f(X - \mu)^2}{N}$

Once again letting f = P(X) = probability of each outcome and $N = \sum P(X) = 1$, we can get the formula for the variance of a discrete probability distribution:

$\text{Var } X = \sigma_X^2 = \sum [X - E(X)]^2 P(X) = \sum X^2 P(X) - \left[\sum XP(X)\right]^2$

3.21 Table 3.3 gives the number of job applications processed at a small employment agency during the past 100-day period.
Determine the expected number of applications processed and the variance and standard deviation.

Table 3.3 Number of Job Applications Processed during the Past 100-Day Period

Number of Applications    Number of Days
 7                        10
 8                        10
10                        20
11                        30
12                        20
14                        10

To the extent that we believe that the experience of the past 100 days is typical, we can find the relative-frequency distribution and equate it to the probability distribution. This and the other calculations to find E(X) and Var X are shown in Table 3.4.

$E(X) = \mu = \sum XP(X) = 10.6$ applications
$\text{Var } X = \sigma_X^2 = \sum X^2 P(X) - \left[\sum XP(X)\right]^2 = 116 - (10.6)^2 = 116 - 112.36 = 3.64$ applications squared
$\text{SD } X = \sigma_X = \sqrt{\sigma_X^2} = \sqrt{3.64} \approx 1.91$ applications

Table 3.4 Calculations to Find the Expected Value and Variance

Number X    Days    P(X)    XP(X)    X²     X²P(X)
 7           10     0.1      0.7      49      4.9
 8           10     0.1      0.8      64      6.4
10           20     0.2      2.0     100     20.0
11           30     0.3      3.3     121     36.3
12           20     0.2      2.4     144     28.8
14           10     0.1      1.4     196     19.6
Total       100     1.0     10.6            116.0

3.22 (a) State the conditions required to apply the binomial distribution. (b) What is the probability of 3 heads in 5 flips of a balanced coin? (c) What is the probability of less than 3 heads in 5 flips of a balanced coin?

(a) The binomial distribution is used to find the probability of X number of occurrences or successes of an event, P(X), in n trials of the same experiment when (1) there are only 2 mutually exclusive outcomes, (2) the n trials are independent, and (3) the probability of occurrence or success, p, remains constant in each trial.
(b) $P(X) = {}_nC_X\, p^X (1-p)^{n-X} = \frac{n!}{X!\,(n-X)!}\, p^X (1-p)^{n-X}$

See Eqs. (3.10) and (3.17). In some books, 1 − p (the probability of failure) is defined as q. Here n = 5, X = 3, p = 1/2, and 1 − p = 1/2. Substituting these values into the preceding equation, we get

$P(3) = \frac{5!}{3!\,2!} (1/2)^3 (1/2)^2 = 10(1/32) = 0.3125$

(c) $P(X < 3) = P(0) + P(1) + P(2)$
$P(0) = \frac{5!}{0!\,5!} (1/2)^0 (1/2)^5 = 1/32 = 0.03125$
$P(1) = \frac{5!}{1!\,4!} (1/2)^1 (1/2)^4 = 5/32 = 0.15625$
$P(2) = \frac{5!}{2!\,3!} (1/2)^2 (1/2)^3 = 10/32 = 0.3125$
Thus $P(X < 3) = P(0) + P(1) + P(2) = 0.03125 + 0.15625 + 0.3125 = 0.5$

3.23 (a) Suppose that the probability of parents having a child with blond hair is 1/4. If there are 6 children in the family, what is the probability that half of them will have blond hair?
(b) If the probability of hitting a target on a single shot is 0.3, what is the probability that in 4 shots the target will be hit at least 3 times?

(a) Here n = 6, X = 3, p = 1/4, and 1 − p = 3/4. Substituting these values into the binomial formula, we get

$P(3) = \frac{6!}{3!\,3!} (1/4)^3 (3/4)^3 = 20(1/64)(27/64) = 540/4096 \approx 0.13$

(b) Here n = 4, X ≥ 3, p = 0.3, and 1 − p = 0.7:

$P(X \ge 3) = P(3) + P(4)$
$P(3) = \frac{4!}{3!\,1!} (0.3)^3 (0.7)^1 = 4(0.027)(0.7) = 0.0756$
$P(4) = \frac{4!}{4!\,0!} (0.3)^4 (0.7)^0 = 0.0081$
Thus $P(X \ge 3) = 0.0756 + 0.0081 = 0.0837$

3.24 (a) A quality inspector picks a sample of 10 tubes at random from a very large shipment of tubes known to contain 20% defective tubes. What is the probability that no more than 2 of the tubes picked are defective? (b) An inspection engineer picks a sample of 15 items at random from a manufacturing process known to produce 85% acceptable items. What is the probability that 10 of the items picked are acceptable?

(a) Here n = 10, X ≤ 2, p = 0.2, and 1 − p = 0.8:

$P(X \le 2) = P(0) + P(1) + P(2)$
$P(0) = \frac{10!}{0!\,10!} (0.2)^0 (0.8)^{10} = 0.1074$ (looking up n = 10, X = 0, and p = 0.2 in App. 1)
$P(1) = 0.2684$ (looking up n = 10, X = 1, and p = 0.2 in App. 1)
$P(2) = 0.3020$ (looking up n = 10, X = 2, and p = 0.2 in App. 1)
Thus $P(X \le 2) = P(0) + P(1) + P(2) = 0.1074 + 0.2684 + 0.3020 = 0.6778$

(b) Here n = 15, X = 10, p = 0.85, and 1 − p = 0.15. Since App. 1 only gives binomial probabilities for p up to 0.5, we should transform the problem. The probability of X = 10 acceptable items with p = 0.85 equals the probability of X = 5 defective items with p = 0.15. Using n = 15, X = 5 defective, and p (of defective) = 0.15, we get 0.0449 (from App. 1).

3.25 (a) If 4 balanced coins are tossed simultaneously (or 1 balanced coin is tossed 4 times), compute the entire probability distribution and plot it. (b) Compute and plot the probability distribution for a sample of 5 items taken at random from a production process known to produce 30% defective items.

(a) Using n = 4; X = 0H, 1H, 2H, 3H, or 4H; p = 1/2; and App.
1, we get P(0H) = 0.0625, P(1H) = 0.2500, P(2H) = 0.3750, P(3H) = 0.2500, and P(4H) = 0.0625. Thus P(0H) + P(1H) + P(2H) + P(3H) + P(4H) = 0.0625 + 0.2500 + 0.3750 + 0.2500 + 0.0625 = 1. See Fig. 3-9. Note that p = 0.5 and the probability distribution in Fig. 3-9 is symmetrical.

Fig. 3-9 Probability Distribution of Heads in Tossing Four Balanced Coins

Fig. 3-10 Probability Distribution of Defective Items

(b) Using n = 5; X = 0, 1, 2, 3, 4, or 5 defective; and p = 0.3, we get P(0) = 0.1681, P(1) = 0.3602, P(2) = 0.3087, P(3) = 0.1323, P(4) = 0.0284, and P(5) = 0.0024. Therefore P(0) + P(1) + P(2) + P(3) + P(4) + P(5) = 0.1681 + 0.3602 + 0.3087 + 0.1323 + 0.0284 + 0.0024 ≈ 1. See Fig. 3-10. Note that p < 0.5 and the probability distribution in Fig. 3-10 is skewed to the right.

3.26 Calculate the expected value and standard deviation and determine the symmetry or asymmetry of the probability distribution of (a) Prob. 3.23(a), (b) Prob. 3.23(b), (c) Prob. 3.24(a), and (d) Prob. 3.24(b).

(a) $E(X) = \mu = np = (6)(1/4) = 3/2 = 1.5$ blond children
$\text{SD } X = \sqrt{np(1-p)} = \sqrt{(6)(1/4)(3/4)} = \sqrt{18/16} = \sqrt{1.125} \approx 1.06$ blond children
Because p < 0.5, the probability distribution of blond children is skewed to the right.
(b) $E(X) = \mu = np = (4)(0.3) = 1.2$ hits
$\text{SD } X = \sqrt{np(1-p)} = \sqrt{(4)(0.3)(0.7)} = \sqrt{0.84} \approx 0.92$ hits
Because p < 0.5, the probability distribution is skewed to the right.
(c) $E(X) = \mu = np = (10)(0.2) = 2$ defective tubes
$\text{SD } X = \sqrt{np(1-p)} = \sqrt{(10)(0.2)(0.8)} = \sqrt{1.6} \approx 1.26$ defective tubes
Because p < 0.5, the probability distribution is skewed to the right.
(d) $E(X) = \mu = np = (15)(0.85) = 12.75$ acceptable items
$\text{SD } X = \sqrt{np(1-p)} = \sqrt{(15)(0.85)(0.15)} = \sqrt{1.9125} \approx 1.38$ acceptable items
Because p > 0.5, the probability distribution is skewed to the left.

3.27 When sampling is done from a finite population without replacement, the binomial distribution cannot be used because the events are not independent. Then the hypergeometric distribution is used.
This is given by

$P_H = \frac{\binom{N - X_t}{n - X}\binom{X_t}{X}}{\binom{N}{n}}$   (hypergeometric distribution)

It measures the number of successes X in a sample of size n taken at random and without replacement from a population of size N, of which $X_t$ items have the characteristic denoting success. (a) Using the formula, determine the probability of picking 2 men in a sample of 6 selected at random without replacement from a group of 10 people, 5 of which are men. (b) What would the result have been if we had (incorrectly) used the binomial distribution?

(a) Here X = 2 men, n = 6, N = 10, and $X_t$ = 5:

$P_H = \frac{\binom{5}{4}\binom{5}{2}}{\binom{10}{6}} = \frac{(5)(10)}{210} \approx 0.24$

(b) $P(2) = \frac{6!}{2!\,4!} (1/2)^2 (1/2)^4 = 15/64 \approx 0.23$

It should be noted that when the sample is very small in relation to the population (say, less than 5% of the population), sampling without replacement has little effect on the probability of success in each trial and the binomial distribution (which is easier to use) is a good approximation for the hypergeometric distribution. This is the reason the binomial distribution was used in Prob. 3.24(a).

THE POISSON DISTRIBUTION

3.28 (a) What is the difference between the binomial and the Poisson distributions? (b) Give some examples of when we can apply the Poisson distribution. (c) Give the formula for the Poisson distribution and the meaning of the various symbols. (d) Under what conditions can the Poisson distribution be used as an approximation to the binomial distribution? Why can this be useful?
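Part (d) concerns the Poisson approximation to the binomial. As a rough numerical illustration, the sketch below compares the two distributions for a case with many trials and a small success probability. The values n = 100 and p = 0.02 are hypothetical, chosen only for this example, and the function names are illustrative; the formulas are the standard binomial and Poisson probability functions.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Probability of exactly x successes when the mean number is lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)


n, p = 100, 0.02   # hypothetical: many trials, rare event
lam = n * p        # Poisson mean matched to the binomial mean np

largest_gap = 0.0
for x in range(10):
    b = binomial_pmf(x, n, p)
    q = poisson_pmf(x, lam)
    largest_gap = max(largest_gap, abs(b - q))
    print(x, round(b, 4), round(q, 4))

print("largest gap:", round(largest_gap, 4))
```

With n large and np small, the two columns stay close, which is why the Poisson formula can stand in when binomial tables for very small p are not at hand.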
(a) Whereas the binomial distribution can be used to find the probability of a designated number of successes in n trials, the Poisson distribution is used to find the probability of a designated number of successes per unit of time. The other conditions required to apply the binomial distribution also are required to apply the Poisson distribution; that is, (1) there must be only two mutually exclusive outcomes, (2) the events must be independent, and (3) the average number of successes per unit of time must remain constant.
(b) The Poisson distribution is often used in operations research in solving management problems. Some examples are the number of telephone calls to the police per hour, the number of customers arriving at a gasoline pump per hour, and the number of traffic accidents at an intersection per week.
(c) The probability of a designated number of successes per unit of time, P(X), can be found by

$P(X) = \frac{\lambda^X e^{-\lambda}}{X!}$

where X = designated number of successes
λ = the average number of successes over a specific time period
e = the base of the natural logarithm system, or 2.71828

Given the value of λ, we can find $e^{-\lambda}$ from App. 2, substitute it into the formula, and find P(X). Note that λ is the mean and variance of the Poisson distribution.
(d) We can use the Poisson distribution as an approximation to the binomial distribution when n, the number of trials, is large and p or 1 − p is small (rare events). A good rule of thumb is to use the Poisson distribution when n ≥ 30 and np or n(1 − p) < 5. When n is large, it can be very time-consuming to use the binomial distribution, and tables for binomial probabilities for very small values of p may not be available. If n(1 − p) < 5, success and failure should be redefined so that np < 5 to make the approximation accurate.

3.29 Past experience indicates that an average number of 6 customers per hour stop for gasoline at a gasoline pump. (a) What is the probability of 3 customers stopping in any hour? (b) What is the probability of 3 customers or less in any hour? (c) What is the expected value, or mean, and standard deviation for this distribution?
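The gasoline-pump question just stated can be worked numerically from the Poisson formula with λ = 6. The sketch below is illustrative: the function names are made up for this example, and `math.exp` is used in place of a table lookup, so the last digits may differ slightly from results based on a rounded table value of $e^{-\lambda}$.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x): sum of the individual probabilities from 0 through x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 6  # average customers per hour

p_three = poisson_pmf(3, lam)           # exactly 3 customers in an hour
p_at_most_three = poisson_cdf(3, lam)   # 3 customers or fewer in an hour
mean = lam                              # the Poisson mean equals lam
sd = math.sqrt(lam)                     # the variance is also lam

print(round(p_three, 4), round(p_at_most_three, 4), mean, round(sd, 2))
```

Building the cumulative probability as a sum of individual terms mirrors the hand calculation, where P(0) through P(3) are computed one at a time and added.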
(a) $P(3) = \frac{6^3 e^{-6}}{3!} = \frac{(216)(0.00248)}{6} = \frac{0.53568}{6} = 0.08928$
(b) $P(X \le 3) = P(0) + P(1) + P(2) + P(3)$
$P(0) = \frac{6^0 e^{-6}}{0!} = 0.00248$
$P(1) = \frac{6^1 e^{-6}}{1!} = (6)(0.00248) = 0.01488$
$P(2) = \frac{6^2 e^{-6}}{2!} = \frac{(36)(0.00248)}{2} = 0.04464$
$P(3) = 0.08928$ (from part a)
Thus $P(X \le 3) = 0.00248 + 0.01488 + 0.04464 + 0.08928 = 0.15128$
(c) The expected value, or mean, of this Poisson distribution is λ = 6 customers, and the standard deviation is $\sqrt{\lambda} = \sqrt{6} \approx 2.45$ customers.

3.30 Past experience shows that 1% of the lightbulbs produced in a plant are defective. Find the probability that more than 1 bulb is defective in a random sample of 30 bulbs, using (a) the binomial distribution and (b) the Poisson distribution.

(a) Here n = 30, p = 0.01, and we are asked to find P(X > 1). Using App. 1, we get P(2) + P(3) + P(4) + ··· = 0.0328 + 0.0031 + 0.0002 = 0.0361, or 3.61%.
(b) Since n = 30 and np = (30)(0.01) = 0.3, we can use the Poisson approximation of the binomial distribution. Letting λ = np = 0.3, we have to find P(X > 1) = 1 − P(X ≤ 1), where X is the number of defective bulbs. Using Eq. (3.13), we get
$P(1) = \frac{(0.3)^1 e^{-0.3}}{1!} = (0.3)(0.74082) = 0.222246$
$P(0) = \frac{(0.3)^0 e^{-0.3}}{0!} = 0.74082$
$P(X \le 1) = P(1) + P(0) = 0.222246 + 0.74082 = 0.963066$
Thus $P(X > 1) = 1 - 0.963066 = 0.036934$, or 3.69%.
As n becomes larger, the approximation becomes even closer.

CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

The normal distribution is also used to approximate other distributions, such as the binomial and the Poisson distributions (see Probs. 3.37 and 3.38). Distributions of sample means and proportions are often normal, regardless of the distribution of the parent population (see Sec. 4.2).
(c) The standard normal distribution is a normal distribution with μ = 0 and σ² = 1. Any normal distribution (defined by a particular value for μ and σ²) can be transformed into a standard normal distribution by letting μ = 0 and expressing deviations from μ in standard deviation units. We often can find areas (probabilities) by converting X values into corresponding z values [that is, z = (X − μ)/σ] and looking up these z values in App.
3.

Find the area under the standard normal curve (a) between z = ±1, z = ±2, and z = ±3; (b) from z = 0 to z = 0.88; (c) from z = −1.60 to z = 2.55; (d) to the left of z = −1.60; (e) to the right of z = 2.55; (f) to the left of z = −1.60 and to the right of z = 2.55.

(a) The area (probability) included under the standard normal curve between z = 0 and z = 1 is obtained by looking up the value of 1.0 in App. 3. This is accomplished by moving down the z column in the table to 1.0 and then across until we are below the column headed .00. The value that we get is 0.3413. This means that 34.13% of the total area (of 1 or 100%) under the curve lies between z = 0 and z = 1.00. Because of symmetry, the area between z = 0 and z = −1 is also 0.3413, or 34.13%. Therefore, the area between z = −1 and z = 1 is 68.26% (see Fig. 3-4). Similarly, the area between z = 0 and z = 2 is 0.4772, or 47.72% (by looking up z = 2.00 in the table), so that the area between z = −2 and z = 2 is 95.44% (see Fig. 3-4). The area between z = −3 and z = 3 is 99.74% (see Fig. 3-4). Note that the table only gives detailed values for z up to 2.99 because the area under the curve outside z = ±3 is negligible.
(b) The area between z = 0 and z = 0.88 is obtained by looking up 0.88 in the table. This is 0.3106.
(c) The area between z = 0 and z = −1.60 is obtained by looking up z = 1.60 in the table. This is 0.4452. The area between z = 0 and z = 2.55 is obtained by looking up z = 2.55 in the table. This is 0.4946. Thus the area under the standard normal curve from z = −1.60 to z = 2.55 equals 0.4452 plus 0.4946. This is 0.9398, or 93.98% (see Fig. 3-11). In problems of this nature it is helpful to sketch a figure.
(d) We know that the total area under the normal curve is equal to 1. Because of symmetry, 0.5 of the area is on either side of μ = 0. Since 0.4452 extends from z = 0 to z = −1.60, 0.5 − 0.4452 = 0.0548, or 5.48%, is the area in the left tail, to the left of −1.60 (see Fig. 3-11).
(e) 0.5 − 0.4946 = 0.0054, or 0.54%, is the area in the right tail, to the right of z = 2.55 (see Fig. 3-11).
(f) The area to the left of z = −1.60 and to the right of z = 2.55 is equal to 1 − 0.9398 (see part c). This is 0.0602, or 6.02% of the total.

... urn of Prob. 3.45? (d) A green and a white ball in that order in 2 picks without replacement from the same urn? (e) Three green balls in 3 picks without replacement from the urn?
Ans. (a) 13/2652 or 1/204 (b) 6/132 or 1/22 (c) 1/22 (d) ... (e) 6/1320 or 1/220

3.54 Suppose that the probability of rain on a given day is 0.1 and the probability of my having a car accident is 0.005 on any day and 0.012 on a rainy day. (a) What rule should I use to calculate the probability that on a given day it will rain and I will have a car accident? (b) State the rule asked for in part a, letting A signify accident and R signify rain. (c) Calculate the probability asked for in part a.
Ans. (a) The rule of multiplication for dependent events (b) P(R and A) = P(R) · P(A/R) (c) 0.0012

3.55 (a) What rule or theorem should I use to calculate, for the statement in Prob. 3.54, the probability that it was raining when I had a car accident? (b) State the rule or theorem applicable to part a. (c) Answer the question in part a.
Ans. (a) Bayes' theorem (b) P(R/A) = P(R) · P(A/R)/P(A) (c) 0.24

3.56 In how many different ways can 6 qualified individuals be assigned to (a) Three trainee positions available if the positions are identical? (b) Three trainee positions eventually if the positions differ? (c) Six trainee positions available if the positions differ?
Ans. (a) 20 (b) 120 (c) 720

DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

3.57 The probability distribution of lunch customers at a restaurant is given in Table 3.5.
Calculate (a) the expected number of lunch customers, (b) the variance, and (c) the standard deviation.

Table 3.5 Probability Distribution of Lunch Customers at a Restaurant

Number of Customers    Probability
100                    0.2
110                    0.3
118                    0.2
120                    0.2
125                    0.1

Ans. (a) 113.1 customers (b) 65.69 customers squared (c) 8.10 customers

3.58 What is the probability of (a) Getting exactly 4 heads and 2 tails in 6 tosses of a balanced coin? (b) Getting 3 sixes in 4 rolls of a fair die?
Ans. (a) 0.23 (b) 0.0154321

3.59 (a) If 20% of the students entering college drop out before receiving their diplomas, find the probability that out of 20 students picked at random from the very large number of students entering college, less than 3 drop out. (b) If 90% of the bulbs produced in a plant are acceptable, what is the probability that out of 10 bulbs picked at random from the very large output of the plant, 8 are acceptable?
Ans. (a) 0.206 (b) 0.1937

3.60 Calculate the expected value and standard deviation and determine the symmetry or asymmetry of the probability distribution of (a) Prob. 3.58(a), (b) Prob. 3.59(a), and (c) Prob. 3.59(b).
Ans. (a) E(X) = 3 heads, SD X = 1.22 heads, and the distribution is symmetrical (b) E(X) = 4 students, SD X = 1.79 students, and the distribution is skewed to the right (c) E(X) = 9 bulbs, SD X = 0.95 bulbs, and the distribution is skewed to the left

3.61 What is the probability of picking (a) Two women in a sample of 5 drawn at random and without replacement from a group of 9 people, 4 of whom are women? (b) Eight men in a sample of 10 drawn at random and without replacement from a population of 1000, half of which are men?
Ans. (a) About 0.476 (using the hypergeometric distribution) (b) About 0.0439 (using the binomial approximation to the hypergeometric probability)

THE POISSON DISTRIBUTION

3.62 Past experience shows that there are two traffic accidents at an intersection per week. What is the probability of: (a) Four accidents during a randomly selected week?
(b) No accidents? (c) What is the expected value and standard deviation of the distribution?
Ans. (a) About 0.09 (b) About 0.14 (c) \(E(X) = \lambda = 2\) accidents, and SD \(X = \sqrt{\lambda} \approx 1.41\) accidents

3.63 Past experience shows that 0.3% of the national labor force get seriously ill during a year. If 1000 persons are randomly selected from the national labor force: (a) What is the expected number of workers that will get sick during a year? (b) What is the probability that 5 workers will get sick during the year?
Ans. (a) 3 workers (b) About 0.1 (using the Poisson approximation to the binomial distribution)

CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

3.64 Give the formula for: (a) the probability that continuous variable X falls between \(X_1\) and \(X_2\), (b) the normal distribution, (c) the expected value and variance of the normal distribution, and (d) the standard normal distribution. (e) What is the mean and variance of the standard normal distribution?
Ans. (a) [...]

[...] without the finite correction factor instead of \(\sigma_{\bar{X}} =\) [...]

EXAMPLE 4. The probability that the mean of a random sample \(\bar{X}\) of 36 elements from the population in Example 3 falls between 18 and 24 units is computed as follows:

\[ z_1 = \frac{18 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \qquad \text{and} \qquad z_2 = \frac{24 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \]

Looking up \(z_1\) and \(z_2\) in App. 3, we get

\[ P(18 < \bar{X} < 24) = 0.3413 + 0.4772 = 0.8185, \text{ or } 81.85\% \]

See Fig. 4-2.

CHAP. 4] STATISTICAL INFERENCE: ESTIMATION

[Fig. 4-2]

4.3 ESTIMATION USING THE NORMAL DISTRIBUTION

We can get a point or an interval estimate of a population parameter. A point estimate is a single number. Such a point estimate is unbiased if in repeated random samplings from the population, the expected or mean value of the corresponding statistic is equal to the population parameter. For example, \(\bar{X}\) is an unbiased (point) estimate of \(\mu\) because \(\mu_{\bar{X}} = \mu\), where \(\mu_{\bar{X}}\) is the expected value of \(\bar{X}\). The sample standard deviation \(s\) [as defined in Eqs. (2.10b) and (2.11b)] is an unbiased estimate of \(\sigma\) [see Prob. 4.13(b)],
and the sample proportion \(\bar{p}\) is an unbiased estimate of \(p\) (the proportion of the population with a given characteristic).

An interval estimate refers to a range of values together with the probability, or confidence level, that the interval includes the unknown population parameter. Given the population standard deviation or its estimate, and given that the population is normal or that a random sample is equal to or larger than 30, we can find the 95% confidence interval for the unknown population mean as

\[ P(\bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}}) = 0.95 \qquad (4.4) \]

This states that in repeated random sampling, we expect that 95 out of 100 intervals such as Eq. (4.4) include the unknown population mean and that our confidence interval (based on a single random sample) is one of these.

A confidence interval can be constructed similarly for the population proportion (see Example 7), where

\[ \mu_{\bar{p}} = p \quad \text{(the proportion of successes in the population)} \qquad (4.5) \]

\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \quad \text{(the standard error of the proportion)} \qquad (4.6) \]

EXAMPLE 5. A random sample of 144 with a mean of 100 and a standard deviation of 60 is taken from a population of 1000. The 95% confidence interval for the unknown population mean is

\[ \mu = \bar{X} \pm 1.96\,\sigma_{\bar{X}} \qquad \text{since } n > 30 \]
\[ = \bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \qquad \text{since } n > 0.05N \]
\[ = 100 \pm 1.96\,\frac{60}{\sqrt{144}}\sqrt{\frac{1000-144}{1000-1}} \qquad \text{using } s \text{ as an estimate of } \sigma \]
\[ = 100 \pm 1.96(5)(0.93) = 100 \pm 9.11 \]

Thus \(\mu\) is between 90.89 and 109.11 with a 95% degree of confidence. Other frequently used confidence intervals are the 90 and 99% levels, corresponding to the z values of 1.64 and 2.58, respectively (see App.
3).

EXAMPLE 6. A manager wishes to estimate the mean number of minutes that workers take to complete a particular manufacturing process within ±3 min and with 90% confidence. From past experience, the manager knows that the standard deviation \(\sigma\) is 15 min. The minimum required sample size (\(n > 30\)) is found as follows:

\[ z\,\sigma_{\bar{X}} = 3 \qquad \text{since the confidence interval is to be } \bar{X} \pm 3 \text{ min} \]
\[ 1.64\,\frac{\sigma}{\sqrt{n}} = 3 \qquad \text{assuming } n < 0.05N \]
\[ \sqrt{n} = \frac{1.64(15)}{3} = 8.2 \]
\[ n = 67.24, \text{ or } 68 \text{ (rounded to the next higher integer)} \]

EXAMPLE 7. A state education department finds that in a random sample of 100 persons who attended college, 40 received a college degree. To find the 99% confidence interval for the proportion of college graduates out of all the persons who attended college, we proceed as follows. First, we note that this problem involves the binomial distribution (see Sec. 3.3). Since \(n > 30\) and both \(np > 5\) and \(n(1 - p) > 5\), the binomial distribution approaches the normal distribution (which is simpler to use; see Sec. 3.5). Then

\[ p = \bar{p} \pm z\,\sigma_{\bar{p}} = \bar{p} \pm z\sqrt{\frac{p(1-p)}{n}} \qquad \text{assuming } n < 0.05N \]
\[ = 0.4 \pm 2.58\sqrt{\frac{(0.4)(0.6)}{100}} \qquad \text{using } \bar{p} \text{ as an estimate of } p \]
\[ = 0.4 \pm 2.58(0.05) = 0.4 \pm 0.13 \]

Thus \(p\) is between 0.27 and 0.53 with a 99% level of confidence.

4.4 CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

When the population is normally distributed but \(\sigma\) is not known and \(n < 30\), we cannot use the normal distribution for determining confidence intervals for the unknown population mean, but we can use the t distribution. This is symmetrical about its zero mean but is flatter than the standard normal distribution, so that more of its area falls within the tails. While there is a single standard normal distribution, there is a different t distribution for each sample size, \(n\). However, as \(n\) becomes larger, the t distribution approaches the standard normal distribution (see Fig. 4-3) until, when \(n \geq 30\), they are approximately equal. Appendix 5 gives the values of t to the right of which we find 10, 5,
2.5, 1, and 0.5% of the total area under the curve for various degrees of freedom. Degrees of freedom (df) are defined in this case as \(n - 1\) (or the sample size minus 1 for the single parameter \(\mu\) we wish to estimate).

[Fig. 4-3: standard normal distribution and t distribution]

The 95% confidence interval for the unknown population mean when the t distribution is used is given by

\[ P\left(\bar{X} - t\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t\,\frac{s}{\sqrt{n}}\right) = 0.95 \]

[...]

Note that we took all the possible different samples of size 2 that we could take from our finite population of 5 numbers. Sampling from an infinite parent population (or from a finite parent population with replacement) would have required taking an infinite number of random samples of size \(n\) from the parent population (an obviously impossible task). By taking only a limited number of random samples, theorem 1 would hold only approximately (i.e., \(\mu_{\bar{X}} \approx \mu\) and \(\sigma_{\bar{X}} \approx \sigma/\sqrt{n}\)), with the approximation becoming better as the number of random samples taken is increased. In this case, the sampling distribution of the sample mean generated is referred to as the empirical sampling distribution of the mean.

4.8 A population of 12,000 elements has a mean of 100 and a standard deviation of 60. Find the mean and standard error of the sampling distribution of the mean for sample sizes of (a) 100 and (b) 900.

(a) \[ \mu_{\bar{X}} = \mu = 100 \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{60}{\sqrt{100}} = 6 \]

(b) \(\mu_{\bar{X}} = \mu = 100\). Since a sample of 900 is more than 5% of the population size, the finite correction factor must be used in the formula for the standard error:

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{60}{\sqrt{900}}\sqrt{\frac{12{,}000 - 900}{12{,}000 - 1}} = \frac{60}{30}\sqrt{0.925} = 2(0.962) = 1.92 \]

Without the correction factor, \(\sigma_{\bar{X}}\) would have been equal to 2 instead of 1.92.

4.9 (a) What is the shape of the theoretical sampling distribution of the mean if the parent population is normal? If the parent population is not normal? (b) What is the importance of the answer to part a?

(a) If the parent population is normally distributed, the theoretical sampling distributions of the mean are also normally distributed, regardless of sample size.
According to the central-limit theorem, even if the parent population is not normal, the theoretical sampling distributions of the sample mean approach normality as sample size increases (i.e., as \(n \to \infty\)). This approximation is sufficiently good for samples of at least 30.

(b) The central-limit theorem is perhaps the most important theorem in all of statistical inference. It allows us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the parent population. This will be done in this chapter and in Chap. 5.

4.10 (a) How can we calculate the probability that a random sample has a mean that falls within a given interval if the theoretical sampling distribution of the mean is normal or approximately normal? How is this different from the process of finding the probability that a normally distributed random variable assumes a value within a given interval? (b) Draw a normal curve in the \(\bar{X}\) and z scales and show the percentage of the area under the curve within 1, 2, and 3 standard deviation units of its mean.

(a) If the theoretical sampling distribution of the mean is normal or approximately normal, we can find the probability that a random sample has a mean \(\bar{X}\) that falls within a given interval by calculating the corresponding z values and looking them up in App. 3. This is analogous to what was done in Sec. 3.5, where the normal and the standard normal curves were introduced. The only difference is that now we are dealing with the distribution of the \(\bar{X}\)'s rather than with the distribution of the \(X\)'s. In addition, before \(z = (X - \mu)/\sigma\), while now \(z = (\bar{X} - \mu_{\bar{X}})/\sigma_{\bar{X}} = (\bar{X} - \mu)/\sigma_{\bar{X}}\), since \(\mu_{\bar{X}} = \mu\).

(b) In Fig. 4-5, we have a normal curve in the \(\bar{X}\) scale and a standard normal curve in the z scale. The area under the curve within 1, 2, and 3 standard deviation units of the mean is 68.26, 95.44, and 99.74%, respectively.

[Fig. 4-5]

4.11 Find the probability that the mean of a random sample of 25 elements from a normally distributed population with a mean of 90 and a standard deviation of 60 is larger than 100.

Since the parent population is normally distributed, the theoretical sampling distribution of the mean is also normally distributed and \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) because \(n < 0.05N\). For \(\bar{X} = 100\),

\[ z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{100 - 90}{60/\sqrt{25}} = \frac{10}{12} = 0.83 \]

Looking up this value in App. 3, we get

\[ P(\bar{X} > 100) = 1 - (0.5000 + 0.2967) = 1 - 0.7967 = 0.2033, \text{ or } 20.33\% \]

See Fig. 4-6.

4.12 A small local bank has 1450 individual savings accounts with an average balance of $3000 and a standard deviation of $1200. If the bank takes a random sample of 100 accounts, what is the probability that the average savings for these 100 accounts will be below $2800?

Since \(n = 100\), the theoretical sampling distribution of the mean is approximately normal, but since \(n > 0.05N\), the finite correction factor must be used to find \(\sigma_{\bar{X}}\). For \(\bar{X} = 2800\) (in dollars),

\[ z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}} = \frac{2800 - 3000}{\dfrac{1200}{\sqrt{100}}\sqrt{\dfrac{1450 - 100}{1450 - 1}}} = -1.73 \]

Looking up \(z = 1.73\) in App. 3, we get

\[ P(\bar{X} < 2800) = 1 - (0.5000 + 0.4582) = 0.0418, \text{ or } 4.18\% \]

See Fig. 4-7.

ESTIMATION USING THE NORMAL DISTRIBUTION

4.13 What is meant by (a) A point estimate? (b) Unbiased estimator? (c) An interval estimate?

(a) Because of cost, time, and feasibility, population parameters are frequently estimated from sample statistics. A sample statistic used to estimate a population parameter is called an estimator, and a specific observed value is called an estimate. When the estimate of an unknown population parameter is given by a single number, it is called a point estimate. For example, the sample mean \(\bar{X}\) is an estimator of the population mean \(\mu\), and a single value of \(\bar{X}\) is a point estimate of \(\mu\). Similarly, the sample standard deviation \(s\) can be used as an estimator of the population standard deviation \(\sigma\), and a single value of \(s\) is a point estimate of \(\sigma\).
The sample proportion \(\bar{p}\) can be used as an estimator for the population proportion \(p\), and a single value of \(\bar{p}\) is a point estimate of \(p\) (i.e., the proportion of the population with a given characteristic).

(b) An estimator is unbiased if in repeated random sampling from the population the corresponding statistic from the theoretical sampling distribution is equal to the population parameter. Another way of stating this is that an estimator is unbiased if its expected value (see Probs. 3.20 and 3.31) is equal to the population parameter being estimated. For example, \(\bar{X}\), \(s\) [as defined in Eqs. (2.10b) and (2.11b)], and \(\bar{p}\) are unbiased estimators of \(\mu\), \(\sigma\), and \(p\), respectively. Other important criteria for a good estimator are discussed in Sec. 6

(c) An interval estimate refers to the range of values used to estimate an unknown population parameter together with the probability, or confidence level, that the interval does include the unknown population parameter. This is known as a confidence interval and is usually centered around the unbiased point estimate. For example, the 95% confidence interval for \(\mu\) is given by

\[ P(\bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}}) = 0.95 \]

The two numbers defining a confidence interval are called confidence limits. Because an interval estimate also expresses the degree of accuracy or confidence we have in the estimate, it is superior to a point estimate.

4.14 A random sample of 64 with a mean of 50 and a standard deviation of 20 is taken from a population of 800. (a) Find an interval estimate for the population mean such that we are 95% confident that the interval includes the population mean. (b) What does the result of part a tell us?
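A problem of this kind can also be checked numerically. The following is a minimal Python sketch, not part of the original text: the function name `mean_confidence_interval` and its parameters are illustrative assumptions. It uses the z-based interval and applies the finite correction factor whenever the sample exceeds 5% of the population, the rule used throughout this chapter.

```python
import math


def mean_confidence_interval(mean, sd, n, z=1.96, population_size=None):
    """Return (lower, upper) limits of a z-based interval for the mean.

    Hypothetical helper for illustration. `sd` is the population standard
    deviation or the sample estimate s; `z` defaults to 1.96 (95% level).
    """
    standard_error = sd / math.sqrt(n)
    # Finite correction factor: applied only when n > 0.05N.
    if population_size is not None and n > 0.05 * population_size:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    half_width = z * standard_error
    return mean - half_width, mean + half_width


# Data of Prob. 4.14: n = 64, sample mean 50, s = 20, N = 800.
lower, upper = mean_confidence_interval(mean=50, sd=20, n=64, population_size=800)
print(f"{lower:.1f} to {upper:.1f}")
```

Passing `z=1.64` or `z=2.58` instead gives the 90 or 99% interval; leaving out `population_size` gives the interval without the correction factor.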
(a) Since \(n > 30\), we can use the z value of 1.96 from the standard normal distribution to construct the 95% confidence interval for the unknown population mean, and we can use \(s\) as an estimate for the unknown \(\sigma\). Since \(n > 0.05N\), the finite correction factor is used:

\[ \mu = \bar{X} \pm z\,\sigma_{\bar{X}} = \bar{X} \pm z\,\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 50 \pm 1.96\,\frac{20}{\sqrt{64}}\sqrt{\frac{800 - 64}{800 - 1}} \cong 50 \pm 1.96(2.4) \cong 50 \pm 4.70 \]

Thus \(\mu\) is between the lower confidence limit of 45.3 and the upper confidence limit of 54.7 with a 95% level of confidence.

(b) The result of part a tells us that if we take from the population repeated random samples, each of size \(n = 64\), and construct the 95% confidence interval for each of the sample means, 95% of these confidence intervals will contain the true unknown population mean. By assuming that our confidence interval (based on the single random sample that we have actually taken) is one of these 95% confidence intervals that include \(\mu\), we take the calculated risk of being wrong 5% of the time.

4.15 A random sample of 25 with a mean of [illegible] is taken from a population of 1000 that is normally distributed with a standard deviation of [illegible]. Find (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals for the unknown population mean. (d) What does the difference in the results to parts a, b, and c indicate?

(a) \[ \mu = \bar{X} \pm 1.64\,\sigma_{\bar{X}} \qquad \text{since the population is normally distributed} \]
\[ = \bar{X} \pm 1.64\,\frac{\sigma}{\sqrt{n}} \qquad \text{since } n < 0.05N \]
[...]
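The z values 1.64, 1.96, and 2.58 quoted for the 90, 95, and 99% confidence levels are read from App. 3 in the text. As a rough cross-check, they can also be recovered from the standard normal distribution directly. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the helper names `z_for_confidence` and `half_width` are illustrative, not from the text.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical z value for a confidence level such as 0.95."""
    tail_area = (1 - level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)


def half_width(level, sd, n):
    """Half-width z * sd / sqrt(n) of the interval (no finite correction)."""
    return z_for_confidence(level) * sd / n ** 0.5


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.2f}")
```

Because z grows with the confidence level, `half_width` grows too for a fixed `sd` and `n`: a higher level of confidence is bought with a wider, less precise interval.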
