SCHAUM'S outlines, Second Edition

Updated examples with the most current U.S. and world data
Two complete self examinations
New chapter on Time Series Econometrics
Perfect for pre-test review

Use with these courses: Statistics and Econometrics; Statistical Methods in Economics; Quantitative Methods in Economics; Mathematical Economics; Micro-Economics; Macro-Economics; Math for Economists; Math for Social Sciences

Theory and Problems of
STATISTICS AND ECONOMETRICS
SECOND EDITION

DOMINICK SALVATORE, Ph.D.
Professor and Chairperson, Department of Economics, Fordham University

DERRICK REAGLE, Ph.D.
Assistant Professor of Economics, Fordham University

Schaum's Outline Series
McGraw-Hill

Copyright © 2002 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-139568-7

The material in this eBook also appears in the print version of this title: 0-07-134852-2.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. ("McGraw-Hill") and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill's prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED "AS IS". McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071395687

This book presents a clear and concise introduction to statistics and econometrics. A course in statistics or econometrics is often one of the most useful but also one of the most difficult of the required courses in colleges and universities. The purpose of this book is to help overcome this difficulty by using a problem-solving approach. Each chapter begins with a statement of theory, principles, or background information, fully illustrated with examples. This is followed by numerous theoretical and practical problems with detailed, step-by-step solutions.
While primarily intended as a supplement to all current standard textbooks of statistics and/or econometrics, the book can also be used as an independent text, as well as to supplement class lectures.

The book is aimed at college students in economics, business administration, and the social sciences taking a one-semester or a one-year course in statistics and/or econometrics. It also provides a very useful source of reference for M.A. and M.B.A. students and for all those who use (or would like to use) statistics and econometrics in their work. No prior statistical background is assumed.

The book is completely self-contained in that it covers the statistics (Chaps. 1 to 5) required for econometrics (Chaps. 6 to 11). It is applied in nature, and all proofs appear in the problems section rather than in the text itself. Real-world socioeconomic and business data are used, whenever possible, to demonstrate the more advanced econometric techniques and models. Several sources of online data are used, and Web addresses are given for the student's and researcher's further use (App. 12). Topics frequently encountered in econometrics, such as multicollinearity and autocorrelation, are clearly and concisely discussed as to the problems they create, the methods to test for their presence, and possible correction techniques.

In this second edition, we have expanded the computer applications to provide a general introduction to data handling, and specific programming instruction to perform all estimations in this book by computer (Chap. 12) using Microsoft Excel, Eviews, or SAS statistical packages. We have also added sections on nonparametric testing, matrix notation, binary choice models, and an entire chapter on time series analysis (Chap. 11), a field of econometrics which has expanded a great deal of late. A sample statistics and econometrics examination is also included.

The methodology of this book and much of its content has been tested in undergraduate and graduate classes in statistics and econometrics at Fordham University.
Students found the approach and content of the book extremely useful and made many valuable suggestions for improvement. We have also received very useful advice from Professors Mary Beth Combs, Edward Dowling, and Damodar Gujarati. The following students carefully read through the entire manuscript and made many useful comments: Luca Bonardi, Kevin Coughlin, Sean Hennessy, and James Santangelo. To all of them we are deeply grateful. We owe a great intellectual debt to our former professors of statistics and econometrics: J. S. Butler, Jack Johnston, Lawrence Klein, and Bernard Okun.

We are indebted to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., to Dr. Frank Yates, F.R.S., and the Longman Group Ltd., London, for permission to adapt and reprint Tables III and IV from their book, Statistical Tables for Biological, Agricultural and Medical Research.

In addition to Statistics and Econometrics, the Schaum's Outline Series in Economics includes Microeconomic Theory, Macroeconomic Theory, International Economics, Mathematics for Economists, and Principles of Economics.

DOMINICK SALVATORE
DERRICK REAGLE
New York, 2001

CONTENTS

CHAPTER 1 Introduction
  1.1 The Nature of Statistics
  1.2 Statistics and Econometrics
  1.3 The Methodology of Econometrics

CHAPTER 2 Descriptive Statistics
  2.1 Frequency Distributions
  2.2 Measures of Central Tendency
  2.3 Measures of Dispersion
  2.4 Shape of Frequency Distributions

CHAPTER 3 Probability and Probability Distributions
  3.1 Probability of a Single Event
  3.2 Probability of Multiple Events
  3.3 Discrete Probability Distributions: The Binomial Distribution
  3.4 The Poisson Distribution
  3.5 Continuous Probability Distributions: The Normal Distribution

CHAPTER 4 Statistical Inference: Estimation
  4.1 Sampling
  4.2 Sampling Distribution of the Mean
  4.3 Estimation Using the Normal Distribution
  4.4 Confidence Intervals for the Mean Using the t Distribution

CHAPTER 5 Statistical Inference: Testing Hypotheses
  5.1 Testing Hypotheses
  5.2 Testing Hypotheses about the Population Mean and Proportion
  5.3 Testing Hypotheses for Differences between Two Means or Proportions
  5.4 Chi-Square Test of Goodness of Fit and Independence
  5.5 Analysis of Variance
  5.6 Nonparametric Testing

STATISTICS EXAMINATION

CHAPTER 6 Simple Regression Analysis
  6.1 The Two-Variable Linear Model
  6.2 The Ordinary Least-Squares Method
  6.3 Tests of Significance of Parameter Estimates
  6.4 Test of Goodness of Fit and Correlation
  6.5 Properties of Ordinary Least-Squares Estimators

CHAPTER 7 Multiple Regression Analysis
  7.1 The Three-Variable Linear Model
  7.2 Tests of Significance of Parameter Estimates
  7.3 The Coefficient of Multiple Determination
  7.4 Test of the Overall Significance of the Regression
  7.5 Partial-Correlation Coefficients
  7.6 Matrix Notation

CHAPTER 8 Further Techniques and Applications in Regression Analysis
  8.1 Functional Form
  8.2 Dummy Variables
  8.3 Distributed Lag Models
  8.4 Forecasting
  8.5 Binary Choice Models
  8.6 Interpretation of Binary Choice Models

CHAPTER 9 Problems in Regression Analysis
  9.1 Multicollinearity
  9.2 Heteroscedasticity
  9.3 Autocorrelation
  9.4 Errors in Variables

CHAPTER 10 Simultaneous-Equations Methods
  10.1 Simultaneous-Equations Models
  10.2 Identification
  10.3 Estimation: Indirect Least Squares
  10.4 Estimation: Two-Stage Least Squares

CHAPTER 11 Time-Series Methods
  11.4 Testing for Unit Root
  11.5 Cointegration and Error Correction
  11.6 Causality

CHAPTER 12 Computer Applications in Econometrics
  12.1 Data Formats
  12.2 Microsoft Excel
  12.3 Eviews
  12.4 SAS

ECONOMETRICS EXAMINATION

Appendix 1 Binomial Distribution
Appendix 2 Poisson Distribution
Appendix 3 Standard Normal Distribution
Appendix 4 Table of Random Numbers
Appendix 5 Student's t Distribution
Appendix 6 Chi-Square Distribution
Appendix 7 F Distribution
Appendix 8 Durbin-Watson Statistic
Appendix 9 Wilcoxon
Appendix 10 Kolmogorov-Smirnov Critical Values
Appendix 11 ADF Critical Values
Appendix 12 Data Sources on the Web

INDEX

CHAPTER 1

Introduction

1.1 THE NATURE OF STATISTICS

Statistics refers to the collection, presentation, analysis, and utilization of numerical data to make inferences and reach decisions in the face of uncertainty in economics, business, and other social and physical sciences.

Statistics is subdivided into descriptive and inferential. Descriptive statistics is concerned with summarizing and describing a body of data. Inferential statistics is the process of reaching generalizations about the whole (called the population) by examining a portion (called the sample). In order for this to be valid, the sample must be representative of the population and the probability of error also must be specified.

Descriptive statistics is discussed in detail in Chap. 2. This is followed by (the more crucial) statistical inference; Chap. 3 deals with probability, Chap. 4 with estimation, and Chap. 5 with hypothesis testing.

EXAMPLE 1. Suppose that we have data on the incomes of 1000 U.S. families. This body of data can be summarized by finding the average family income and the spread of these family incomes above and below the average. The data also can be described by constructing a table, chart, or graph of the number or proportion of families in each income class. This is descriptive statistics. If these 1000 families are representative of all U.S. families, we can then estimate and test hypotheses about the average family income in the United States as a whole. Since these conclusions are subject to error, we also would have to indicate the probability of error. This is statistical inference.

1.2 STATISTICS AND ECONOMETRICS

Econometrics refers to the application of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses and estimating and forecasting economic phenomena. Econometrics has become strongly identified with regression analysis. This relates a dependent variable to one or more independent or explanatory variables. Since relationships among economic variables are generally inexact, a disturbance or error term (with well-defined probabilistic properties) must be included (see Prob. 1.8).

Chapters 6 and 7 deal with regression analysis; Chap. 8 extends the basic regression model; Chap. 9 deals with methods of testing and correcting for violations in the assumptions of the basic regression model; and Chaps. 10 and 11 deal with two specific areas of econometrics, specifically simultaneous-equations and time-series methods. Thus Chaps. 1 to 5 deal with the statistics required for econometrics (Chaps. 6 to 11). Chapter 12 is concerned with using the computer to aid in the calculations involved in the previous chapters.

EXAMPLE 2. Consumption theory tells us that, in general, people increase their consumption expenditure $C$ as their disposable (after-tax) income $Y_d$ increases, but not by as much as the increase in their disposable income.
This can be stated in explicit linear equation form as

$$C = b_0 + b_1 Y_d \tag{1.1}$$

where $b_0$ and $b_1$ are unknown constants called parameters. The parameter $b_1$ is the slope coefficient representing the marginal propensity to consume (MPC). Since even people with identical disposable income are likely to have somewhat different consumption expenditures, the theoretically exact and deterministic relationship represented by Eq. (1.1) must be modified to include a random disturbance or error term, $u$, making it stochastic:

$$C = b_0 + b_1 Y_d + u \tag{1.2}$$

1.3 THE METHODOLOGY OF ECONOMETRICS

Econometric research, in general, involves the following three stages:

1. Specification of the model or maintained hypothesis in explicit stochastic equation form, together with the a priori theoretical expectations about the sign and size of the parameters of the function.
2. Collection of data on the variables of the model and estimation of the coefficients of the function with appropriate econometric techniques (presented in Chaps. 6 to 8).
3. Evaluation of the estimated coefficients of the function on the basis of economic, statistical, and econometric criteria.

EXAMPLE 3. The first stage in econometric research on consumption theory is to state the theory in explicit stochastic equation form, as in Eq. (1.2), with the expectation that $b_0 > 0$ (i.e., at $Y_d = 0$, $C > 0$ as people dissave and/or borrow) and $0 < b_1 < 1$. The second stage involves the collection of data on consumption expenditure and disposable income and estimation of Eq. (1.2). The third stage in econometric research involves (1) checking to see if the estimated value of $b_0 > 0$ and $0 < b_1 < 1$; (2) determining if a "satisfactory" proportion of the variation in $C$ is "explained" by changes in $Y_d$ and if $b_0$ and $b_1$ are "statistically significant at acceptable levels" [see Prob. 1.13(c) and Sec. 5.2]; and (3) testing to see if the assumptions of the basic regression model are satisfied or, if not, how to correct for violations. If the estimated relationship does not pass these tests, the hypothesized relationship must be modified and reestimated until a satisfactory estimated consumption relationship is achieved.

Solved Problems

THE NATURE OF STATISTICS

1.1 What is the purpose and function of (a) The field of study of statistics? (b) Descriptive statistics? (c) Inferential statistics?

(a) Statistics is the body of procedures and techniques used to collect, present, and analyze data on which to base decisions in the face of uncertainty or incomplete information. Statistical analysis is used today in practically every profession. The economist uses it to test the efficiency of alternative production techniques; the businessperson may use it to test the product design or package that maximizes sales; the sociologist to analyze the result of a drug rehabilitation program; the industrial psychologist to examine workers' responses to plant environment; the political scientist to forecast voting patterns; the physician to test the effectiveness of a new drug; the chemist to produce cheaper fertilizers; and so on.

(b) Descriptive statistics summarizes a body of data with one or two pieces of information that characterize the whole data. It also refers to the presentation of a body of data in the form of tables, charts, graphs, and other forms of graphic display.

(c) Inferential statistics (both estimation and hypothesis testing) refers to the drawing of generalizations about the properties of the whole (called a population) from the specific or a sample drawn from the population. Inferential statistics thus involves inductive reasoning. (This is to be contrasted with deductive reasoning, which ascribes properties to the specific starting with the whole.)

1.2 (a) Are descriptive or inferential statistics more important today? (b) What is the importance of a representative sample in statistical inference? (c) Why is probability theory required?

(a) Statistics started as a purely descriptive science, but it grew into a powerful tool of decision making as its inferential branch was developed.
Modern statistical analysis refers primarily to inferential or inductive statistics. However, deductive and inductive statistics are complementary. We must study how to generate samples from populations before we can learn to generalize from samples to populations.

(b) In order for statistical inference to be valid, it must be based on a sample that fully reflects the characteristics and properties of the population from which it is drawn. A representative sample is ensured by random sampling, whereby each element of the population has an equal chance of being included in the sample (see Sec. 4.1).

(c) Because of the possibility of error in statistical inference, estimates or tests of a population property or characteristic are given together with the chance or probability of being wrong. Thus probability theory is an essential element in statistical inference.

1.3 How can the manager of a firm producing lightbulbs summarize and describe to a board meeting the results of testing the life of a sample of 100 lightbulbs produced by the firm?

Providing the (raw) data on the life of each in the sample of 100 lightbulbs produced by the firm would be very inconvenient and time-consuming for the board members to evaluate. Instead, the manager might summarize the data by indicating that the average life of the bulbs tested is 360 h and that 95% of the bulbs tested lasted between 320 and 400 h. By doing this, the manager is providing two pieces of information (the average life and the spread in the average life) that characterize the life of the 100 bulbs tested. The manager also might want to describe the data with a table or chart indicating the number or proportion of bulbs tested that lasted within each 10-h classification. Such a tabular or graphic representation of the data is also very useful for gaining a quick overview of the data. By summarizing and describing the data in the ways indicated, the manager is engaging in descriptive statistics.
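The kind of summary the manager gives in Prob. 1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 100 lifetimes are simulated (they are not data from the text), and the function name `summarize` and the normal distribution used to generate the sample are hypothetical choices made here.

```python
import random
import statistics

# Hypothetical data: 100 simulated bulb lifetimes in hours.
# The values are NOT from the text; they are drawn from a normal
# distribution centered on 360 h purely for illustration.
random.seed(0)
lifetimes = [random.gauss(360, 20) for _ in range(100)]


def summarize(hours, low=320, high=400):
    """Return the mean, the sample standard deviation, and the share
    of observations that fall between low and high (inclusive)."""
    mean = statistics.mean(hours)
    spread = statistics.stdev(hours)
    share = sum(low <= h <= high for h in hours) / len(hours)
    return mean, spread, share


mean, spread, share = summarize(lifetimes)
print(f"average life: {mean:.1f} h")
print(f"standard deviation: {spread:.1f} h")
print(f"share lasting 320 to 400 h: {share:.0%}")
```

The two numbers a board would care about, the average and a measure of spread, come from the standard library alone; the share within 320 to 400 h plays the role of the "95% of the bulbs" statement in the answer above.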
It should be noted that descriptive statistics can be used to summarize and describe any body of data, whether it is a sample (as above) or a population (when all the elements of the population are known and its characteristics can be calculated).

1.4 (a) Why may the manager in Prob. 1.3 want to engage in statistical inference? (b) What would this involve and require?

(a) Quality control requires that the manager have a fairly good idea about the average life and the spread in the life of the lightbulbs produced by the firm. However, testing all the lightbulbs produced would destroy the entire output of the firm. Even when testing does not destroy the product, testing the entire output is usually prohibitively expensive and time-consuming. The usual procedure is to take a sample of the output and infer the properties and characteristics of the entire output (population) from the corresponding characteristics of a sample drawn from the population.

(b) Statistical inference requires first of all that the sample be representative of the population being sampled. If the firm produces lightbulbs in different plants, with more than one workshift, and with raw materials from different suppliers, these must be represented in the sample in the proportion in which they contribute to the total output of the firm. From the average life and spread in the life of the bulbs in the sample, the firm manager might estimate, with 95% probability of being correct and 5% probability of being wrong, the average life of all the lightbulbs produced by the firm to be between 320 and 400 h (see Sec. 4.3). Instead, the manager may use the sample information to test, with 95% probability of being correct and 5% probability of being wrong, that the average life of the population of all the bulbs produced by the firm is greater than 320 h (see Sec. 5.2). In estimating or testing the average for a population from sample information, the manager is engaging in statistical inference.

STATISTICS AND ECONOMETRICS
1.5 What is meant by (a) Econometrics? (b) Regression analysis? (c) Disturbance or error term? (d) Simultaneous-equations models?

(a) Econometrics is the integration of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses about economic phenomena, estimating coefficients of economic relationships, and forecasting or predicting future values of economic variables or phenomena. Econometrics is subdivided into theoretical and applied econometrics. Theoretical econometrics refers to the methods for measurement of economic relationships in general. Applied econometrics examines the problems encountered and the findings in particular fields of economics, such as demand theory, production, investment, consumption, and other fields of applied economic research. In any case, econometrics is partly an art and partly a science, because often the intuition and good judgment of the econometrician plays a crucial role.

(b) Regression analysis studies the causal relationship between one economic variable to be explained (the dependent variable) and one or more independent or explanatory variables. When there is only one independent or explanatory variable, we have simple regression. In the more usual case of more than one independent or explanatory variable, we have multiple regression.

(c) A (random) disturbance or error must be included in the exact relationships postulated by economic theory and mathematical economics in order to make them stochastic (i.e., in order to reflect the fact that in the real world, economic relationships among economic variables are inexact and somewhat erratic).

(d) Simultaneous-equations models refer to relationships among economic variables expressed with more than one equation and such that the economic variables in the various equations interact. Simultaneous-equations models are the most complex aspect of econometrics and are discussed in Chap. 10.

1.6 (a) What are the functions of econometrics?
(b) What aspects of econometrics (and other social sciences) make it basically different from most physical sciences?

(a) Econometrics has basically three closely interrelated functions. The first is to test economic theories or hypotheses. For example, is consumption directly related to income? Is the quantity demanded of a commodity inversely related to its price? The second function of econometrics is to provide numerical estimates of the coefficients of economic relationships. These are essential in decision making. For example, a government policymaker needs to have an accurate estimate of the coefficient of the relationship between consumption and income in order to determine the stimulating (i.e., the multiplier) effect of a proposed tax reduction. A manager needs to know if a price reduction increases or reduces the total sales revenues of the firm and, if so, by how much. The third function of econometrics is the forecasting of events. This, too, is necessary in order for policymakers to take appropriate corrective action if the rate of unemployment or inflation is predicted to rise in the future.

(b) There are two basic differences between econometrics (and other social sciences) on one hand, and most physical sciences (such as physics) on the other. One is that (as pointed out earlier) relationships among economic variables are inexact and somewhat erratic. The second is that most economic phenomena occur contemporaneously, so that laboratory experiments cannot be conducted. These differences require special methods of analysis (such as the inclusion of a disturbance or error term with the exact relationships postulated by economic theory) and multivariate analysis (such as multiple regression analysis). The latter isolates the effect of each independent or explanatory variable on the dependent variable in the face of contemporaneous change in all explanatory variables.

1.7 In what way and for what purpose are (a) economic theory, (b) mathematics, and (c) statistical analysis combined to form the field of study of econometrics?

(a) Econometrics presupposes the existence of a body of economic theories or hypotheses requiring testing. If the variables suggested by economic theory do not provide a satisfactory explanation, the researcher may experiment with alternative formulations and variables suggested by previous tests or opposing theories. In this way, econometric research can lead to the acceptance, rejection, and reformulation of economic theories.

(b) Mathematics is used to express the verbal statements of economic theories in mathematical form, expressing an exact or deterministic functional relationship between the dependent and one or more independent or explanatory variables.

(c) Statistical analysis applies appropriate techniques to estimate the inexact and nonexperimental relationships among economic variables by utilizing relevant economic data and evaluating the results.

1.8 What justifies the inclusion of a disturbance or error term in regression analysis?

The inclusion of a (random) disturbance or error term (with well-defined probabilistic properties) is required in regression analysis for three important reasons. First, since the purpose of theory is to generalize and simplify, economic relationships usually include only the most important forces at work. This means that numerous other variables with slight and irregular effects are not included. The error term can be viewed as representing the net effect of this large number of small and irregular forces at work.
Second, the inclusion of the error term can be justified in order to take into consideration the net effect of possible errors in measuring the dependent variable, or variable being explained. Finally, since human behavior usually differs in a random way under identical circumstances, the disturbance or error term can be used to capture this inherently random human behavior. This error term thus allows for individual random deviations from the exact and deterministic relationships postulated by economic theory and mathematical economics.

1.9 Consumer demand theory states that the quantity demanded of a commodity $D_X$ is a function of, or depends on, its price $P_X$, consumer's income $Y$, and the price of other (related) commodities, say, commodity $Z$ (i.e., $P_Z$). Assuming that consumers' tastes remain constant during the period of analysis, state the preceding theory in (a) specific or explicit linear form or equation and (b) in stochastic form. (c) Which are the coefficients to be estimated? What are they called?

(a) $$D_X = b_0 + b_1 P_X + b_2 Y + b_3 P_Z \tag{1.3}$$

(b) $$D_X = b_0 + b_1 P_X + b_2 Y + b_3 P_Z + u \tag{1.4}$$

(c) The coefficients to be estimated are $b_0$, $b_1$, $b_2$, and $b_3$. They are called parameters.

THE METHODOLOGY OF ECONOMETRICS

1.10 With reference to the consumer demand theory in Prob. 1.9, indicate (a) what the first step is in econometric research and (b) what the a priori theoretical expectations are of the sign and possible size of the parameters of the demand function given by Eq. (1.4).

(a) The first step in econometric analysis is to express the theory of consumer demand in stochastic equation form, as in Eq. (1.4), and indicate the a priori theoretical expectations about the sign and possibly the size of the parameters of the function.

(b) Consumer demand theory postulates that in Eq. (1.4), $b_1 < 0$ (indicating that price and quantity are inversely related), $b_2 > 0$ if the commodity is a normal good (indicating that consumers purchase more of the commodity at higher incomes), $b_3 > 0$ if $X$ and $Z$ are substitutes, and $b_3 < 0$ if $X$ and $Z$ are complements.

1.11 Indicate the second stage in econometric research (a) in general and (b) with reference to the demand function specified by Eq. (1.4).

(a) The second stage in econometric research involves the collection of data on the dependent variable and on each of the independent or explanatory variables of the model and utilizing these data for the empirical estimation of the parameters of the model. This is usually done with multiple regression analysis (discussed in Chap. 7).

(b) In order to estimate the demand function given by Eq. (1.4), data must be collected on (1) the quantity demanded of commodity $X$ by consumers, (2) the price of $X$, (3) consumer's incomes, and (4) the price of commodity $Z$ per unit of time (i.e., per day, month, or year) and over a number of days, months, or years. Data on $P_X$, $Y$, and $P_Z$ are then regressed against data on $D_X$ and estimates of parameters $b_0$, $b_1$, $b_2$, and $b_3$ obtained.

1.12 How does the type of data required to estimate the demand function specified by Eq. (1.4) differ from the type of data that would be required to estimate the consumption function for a group of families at one point in time?

In order to estimate the demand function given by Eq. (1.4), numerical values of the variables are required over a period of time. For example, if we want to estimate the demand function for coffee, we need the numerical value of the quantity of coffee demanded, say, per year, over a number of years, say, from 1960 to 1980. Similarly, we need data on the average price of coffee, consumers' income, and the price of, say, tea (a substitute for coffee) per year from 1960 to 1980. Data that give numerical values for the variables of a function from period to period are called time-series data.
However, to estimate the consumption function for a group of families at one point in time, we need cross-sectional data (i.e., numerical values for the consumption expenditures and disposable incomes of each family in the group at a particular point in time, say, in 1982).

1.13 What is meant by (a) The third stage in econometric analysis? (b) A priori theoretical criteria? (c) Statistical criteria? (d) Econometric criteria? (e) The forecasting ability of the model?

(a) The third stage in econometric analysis is the evaluation of the estimated model on the basis of the a priori theoretical, statistical, and econometric criteria and the forecasting ability of the model.

(b) The a priori economic criteria refer to the sign and size of the parameters of the model postulated by economic theory. If the estimated coefficients do not conform to those postulated, the model must be revised or rejected.

(c) The statistical criteria refer to (1) the proportion of variation in the dependent variable "explained" by changes in the independent or explanatory variables and (2) verification that the dispersion or spread of each estimated coefficient around the true parameter is sufficiently narrow to give us "confidence" in the estimates.

(d) The econometric criteria refer to tests that the assumptions of the basic regression model, and particularly those about the disturbance or error term [...]

1.14 (a) [...] $b_2 > 0$ (if $X$ is a normal good), and $b_3 > 0$ (if $Z$ is a substitute for $X$), as postulated by demand theory.

(b) The statistical criteria are satisfied only if a "high" proportion of the variation in $D_X$ over time is "explained" by changes in $P_X$, $Y$, and $P_Z$, and if the dispersion of the estimated $b_1$, $b_2$, and $b_3$ around the true parameters is "sufficiently narrow." There is no generally accepted answer as to what is a "high" proportion of the variation in $D_X$ "explained" by $P_X$, $Y$, and $P_Z$. However, because of common trends in time-series data, we would expect more than 50 to 70% of the variation in the dependent variable to be explained by the independent or explanatory variables for the model to be judged satisfactory.
Similarly, in order for each estimated coefficient to be "statistically significant," we would expect the dispersion of each estimated coefficient about the true parameter (measured by its standard deviation; see Sec. 2.3) to be generally less than half the estimated value of the coefficient.

(c) The econometric criteria are used to determine if the assumptions of the econometric methods used are satisfied in the estimation of the demand function of Eq. (1.4). Only if these assumptions are satisfied will the estimated coefficients have the desirable properties of unbiasedness, consistency, efficiency, and so forth (see Sec. 6.5).

(d) One way to test the forecasting ability of the demand model given by Eq. (1.4) is to use the estimated function to predict the value of $D_X$ for a period not included in the sample and check that this predicted value is "sufficiently close" to the actual observed value of $D_X$ for that period.

1.15 Stages of econometric research (schematic):

Stage 1: Mathematical model, then econometric (stochastic) model
Stage 2: Collection of appropriate data, then estimation of the parameters of the model
Stage 3: Evaluation of the model on the basis of economic, statistical, and econometric criteria
Outcomes: Accept theory if compatible with data; reject theory if incompatible with data; revise theory if incompatible with data
Follow-up: Prediction; confrontation of revised theory with new data

Supplementary Problems

THE NATURE OF STATISTICS

1.16 (a) To which field of study is statistical analysis important? (b) What are the most important functions of descriptive statistics? (c) What is the most important function of inferential statistics?
Ans. (a) To economics, business, and other social and physical sciences (b) Summarizing and describing a body of data (c) Drawing inferences about the characteristics of a population from the corresponding characteristics of a sample drawn from the population

1.17 (a) Is statistical inference associated with deductive or inductive reasoning?
(b) What are the conditions required in order for statistical inference to be valid?
Ans. (a) Inductive reasoning. (b) A representative sample and probability theory.

STATISTICS AND ECONOMETRICS

1.18 Express in the form of an explicit linear equation the statement that the level of investment spending I is inversely related to the rate of interest R.
Ans. \(I = b_0 + b_1 R\), with \(b_1\) postulated to be negative (1.5)

1.19 What is the answer to Prob. 1.18 an example of?
Ans. An economic theory expressed in (exact or deterministic) mathematical form.

1.20 Express Eq. (1.5) in stochastic form.
Ans. \(I = b_0 + b_1 R + u\) (1.6)

1.21 Why is a stochastic form required in econometric analysis?
Ans. Because the relationships among economic variables are inexact and somewhat erratic, as opposed to the exact and deterministic relationships postulated by economic theory and mathematical economics.

THE METHODOLOGY OF ECONOMETRICS

1.22 What are stages (a) one, (b) two, and (c) three in econometric research?
Ans. (a) Specification of the theory in stochastic equation form and indication of the expected signs and possible sizes of estimated parameters. (b) Collection of data on the variables of the model and estimation of the coefficients of the function. (c) Economic, statistical, and econometric evaluation of the estimated parameters.

1.23 What is the first stage of econometric analysis for the investment theory in Prob. 1.18?
Ans. Stating the theory in the form of Eq. (1.6) and predicting \(b_1 < 0\).

1.24 What is the second stage in econometric analysis for the investment theory in Prob. 1.18?
Ans. Collection of time-series data on I and R and estimation of Eq. (1.6).

1.25 What is the third stage of econometric analysis for the investment theory in Prob. 1.18?
Ans. Determination that the estimated coefficient \(b_1 < 0\), that an "adequate" proportion of the variation in I over time is "explained" by changes in R, that \(b_1\) is "statistically significant at customary levels," and that the econometric assumptions of the model are satisfied.

CHAPTER 2
Descriptive Statistics

2.1 FREQUENCY DISTRIBUTIONS

It is often useful to organize or arrange a body of data into a frequency distribution. This breaks up the data into groups or classes and shows the number of observations in each class. The number of classes is usually between 5 and 15. A relative frequency distribution is obtained by dividing the number of observations in each class by the total number of observations. The sum of the relative frequencies equals 1. A histogram is a bar graph of a frequency distribution, where classes are measured along the horizontal axis and frequencies along the vertical axis. A frequency polygon is a line graph of a frequency distribution resulting from joining the frequency of each class plotted at the class midpoint. A cumulative frequency distribution shows, for each class, the total number of observations in all classes up to and including that class. When plotted, this gives a distribution curve, or ogive.

EXAMPLE 1. A student received the following grades (measured from 0 to 10) on the 10 quizzes he took during a semester: 6, 7, 6, 8, 5, 7, 6, 9, 10, and 6. These grades can be arranged into frequency distributions as in Table 2.1 and shown graphically as in Fig. 2-1.

Table 2.1 Frequency Distributions of Grades

Grades    Absolute Frequency    Relative Frequency
  5               1                    0.1
  6               4                    0.4
  7               2                    0.2
  8               1                    0.1
  9               1                    0.1
 10               1                    0.1
Total            10                    1.0

[Fig. 2-1]

EXAMPLE 2. The cans in a sample of 20 cans of fruit contain net weights of fruit ranging from 19.3 to 20.9 oz, as given in Table 2.2. If we want to group these data into 6 classes, we get class intervals of 0.3 oz [(21.0 - 19.2)/6 = 0.3 oz]. The weights given in Table 2.2 can be arranged into the frequency distributions given in Table 2.3 and shown graphically in Fig.
2-2.

Table 2.2 Net Weight in Ounces of Fruit

19.7  19.9  20.2  19.9  20.0  20.6  19.3  20.4  19.9  20.3
20.1  19.5  20.9  20.3  20.8  19.9  20.0  20.6  19.9  19.8

Table 2.3 Frequency Distribution of Weights

Weight, oz    Absolute Frequency    Relative Frequency
19.2-19.4             1                   0.05
19.5-19.7             2                   0.10
19.8-20.0             8                   0.40
20.1-20.3             4                   0.20
20.4-20.6             3                   0.15
20.7-20.9             2                   0.10
Total                20                   1.00

[Fig. 2-2: histogram, frequency polygon, and ogive of the weights in Table 2.3]

2.2 MEASURES OF CENTRAL TENDENCY

Central tendency refers to the location of a distribution. The most important measures of central tendency are (1) the mean, (2) the median, and (3) the mode. We will be measuring these for populations (i.e., the collection of all the elements that we are describing) and for samples drawn from populations, as well as for grouped and ungrouped data.

1. The arithmetic mean, or average, of a population is represented by \(\mu\) (the Greek letter mu) and, for a sample, by \(\bar{X}\) (read "X bar"). For ungrouped data, \(\mu\) and \(\bar{X}\) are calculated by the following formulas:

\[ \mu = \frac{\sum X}{N} \quad\text{and}\quad \bar{X} = \frac{\sum X}{n} \]  (2.1a,b)

where \(\sum X\) refers to the sum of all the observations, while N and n refer to the number of observations in the population and sample, respectively. For grouped data, \(\mu\) and \(\bar{X}\) are calculated by

\[ \mu = \frac{\sum fX}{N} \quad\text{and}\quad \bar{X} = \frac{\sum fX}{n} \]  (2.2a,b)

where \(\sum fX\) refers to the sum of the frequency of each class f times the class midpoint X.

2. The median for ungrouped data is the value of the middle item when all the items are arranged in either ascending or descending order in terms of values:

\[ \text{Median} = \text{the } \left(\frac{N+1}{2}\right)\text{th item in the data array} \]  (2.3)

where N refers to the number of items in the population (n for a sample). The median for grouped data is given by the formula

\[ \text{Median} = L + \frac{n/2 - F}{f_m}\,c \]  (2.4)

where L = lower limit of the median class (i.e., the class that contains the middle item of the distribution)
n = the number of observations in the data set
F = sum of the frequencies up to but not including the median class
\(f_m\) = frequency of the median class
c = width of the class interval

3. The mode is the value that occurs most frequently in the data set. For grouped data, we obtain

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\,c \]  (2.5)

where L
= lower limit of the modal class (i.e., the class with the greatest frequency)
\(d_1\) = frequency of the modal class minus the frequency of the previous class
\(d_2\) = frequency of the modal class minus the frequency of the following class
c = width of the class interval

The mean is the most commonly used measure of central tendency. The mean, however, is affected by extreme values in the data set, while the median and the mode are not. Other measures of central tendency are the weighted mean, the geometric mean, and the harmonic mean (see Probs. 2.7 to 2.9).

EXAMPLE 3. The mean grade for the population on the 10 quizzes given in Example 1, using the formula for ungrouped data, is

\[ \mu = \frac{\sum X}{N} = \frac{6+7+6+8+5+7+6+9+10+6}{10} = \frac{70}{10} = 7 \]

To find the median for the ungrouped data, we first arrange the 10 grades in ascending order: 5, 6, 6, 6, 6, 7, 7, 8, 9, 10. Then we find the grade of the (N + 1)/2 or (10 + 1)/2 = 5.5th item. Thus the median is the average of the 5th and 6th items in the array, or (6 + 7)/2 = 6.5. The mode for the ungrouped data is 6 (the value that occurs most frequently in the data set).

EXAMPLE 4. We can estimate the mean for the grouped data given in Table 2.3 with the aid of Table 2.4:

\[ \bar{X} = \frac{\sum fX}{n} = \frac{401.6}{20} = 20.08 \text{ oz} \]

This calculation could be simplified by coding (see Prob. 2.6).

Table 2.4 Calculation of the Sample Mean for the Data in Table 2.3

Weight, oz    Class Midpoint X    Frequency f      fX
19.2-19.4          19.3                1          19.3
19.5-19.7          19.6                2          39.2
19.8-20.0          19.9                8         159.2
20.1-20.3          20.2                4          80.8
20.4-20.6          20.5                3          61.5
20.7-20.9          20.8                2          41.6
                                  n = 20     sum fX = 401.6

\[ \text{Med} = L + \frac{n/2 - F}{f_m}\,c = 19.8 + \frac{20/2 - 3}{8}(0.3) = 19.8 + 0.2625 \approx 20.06 \text{ oz} \]

where L = 19.8 = lower limit of the median class (i.e., the 19.8-20.0 class, which contains the 10th and 11th observations)
n = 20 = number of observations or items
F = 3 = sum of frequencies up to but not including the median class
\(f_m\) = 8 = frequency of the median class
c = 0.3 = width of class interval

Similarly,

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\,c = 19.8 + \frac{6}{6+4}(0.3) = 19.8 + 0.18 = 19.98 \text{ oz} \]

As noted in Prob.
2.4, the mean, median, and mode for grouped data are estimates used when only the grouped data are available or to reduce calculations with a large ungrouped data set.

2.3 MEASURES OF DISPERSION

Dispersion refers to the variability or spread in the data. The most important measures of dispersion are (1) the average deviation, (2) the variance, and (3) the standard deviation. We will measure these for populations and samples, as well as for grouped and ungrouped data.

1. Average deviation. The average deviation (AD), also called the mean absolute deviation (MAD), is given by

\[ AD = \frac{\sum |X - \mu|}{N} \quad\text{for populations} \]  (2.6a)

\[ AD = \frac{\sum |X - \bar{X}|}{n} \quad\text{for samples} \]  (2.6b)

where the two vertical bars indicate the absolute value, or the values omitting the sign, with the other symbols having the same meaning as in Sec. 2.2. For grouped data

\[ AD = \frac{\sum f|X - \mu|}{N} \quad\text{for populations} \]  (2.7a)

\[ AD = \frac{\sum f|X - \bar{X}|}{n} \quad\text{for samples} \]  (2.7b)

where f refers to the frequency of each class and X to the class midpoints.

2. Variance. The population variance \(\sigma^2\) (the Greek letter sigma squared) and the sample variance \(s^2\) for ungrouped data are given by

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad\text{and}\quad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]  (2.8a,b)

For grouped data

\[ \sigma^2 = \frac{\sum f(X - \mu)^2}{N} \quad\text{and}\quad s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} \]  (2.9a,b)

3. Standard deviation. The population standard deviation \(\sigma\) and sample standard deviation s are the positive square roots of their respective variances. For ungrouped data

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \quad\text{and}\quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]  (2.10a,b)

For grouped data

\[ \sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}} \quad\text{and}\quad s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} \]  (2.11a,b)

The most widely used measure of (absolute) dispersion is the standard deviation. Other measures (besides the variance and average deviation) are the range, the interquartile range, and the quartile deviation (see Probs. 2.11 and 2.12).

4. The coefficient of variation (V) measures relative dispersion:

\[ V = \frac{\sigma}{\mu} \quad\text{for populations} \]  (2.12a)

\[ V = \frac{s}{\bar{X}} \quad\text{for samples} \]  (2.12b)

EXAMPLE 5. The average deviation, variance, standard deviation, and coefficient of variation for the ungrouped data given in Example 1 can be found with the aid of Table 2.5 (\(\mu = 7\); see Example 3):
Table 2.5 Calculations on the Data in Example 1

Grade X    Mean     X - mean    |X - mean|    (X - mean)^2
   6         7         -1           1              1
   7         7          0           0              0
   6         7         -1           1              1
   8         7          1           1              1
   5         7         -2           2              4
   7         7          0           0              0
   6         7         -1           1              1
   9         7          2           2              4
  10         7          3           3              9
   6         7         -1           1              1
Sums                    0          12             22

\[ AD = \frac{\sum |X - \mu|}{N} = \frac{12}{10} = 1.2 \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{22}{10} = 2.2 \]

\[ \sigma = \sqrt{2.2} \approx 1.48 \qquad V = \frac{\sigma}{\mu} = \frac{1.48}{7} \approx 0.21, \text{ or } 21\% \]

EXAMPLE 6. The average deviation, variance, standard deviation, and coefficient of variation for the frequency distribution of weights (grouped data) given in Table 2.3 can be found with the aid of Table 2.6 (\(\bar{X} = 20.08\) oz; see Example 4):

\[ AD = \frac{\sum f|X - \bar{X}|}{n} = \frac{6.36}{20} = 0.318 \text{ oz} \qquad s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} = \frac{2.952}{19} \approx 0.1554 \text{ oz}^2 \]

\[ s = \sqrt{0.1554} \approx 0.3942 \text{ oz} \qquad V = \frac{s}{\bar{X}} = \frac{0.3942}{20.08} \approx 0.0196, \text{ or } 1.96\% \]

Note that in the formula for \(s^2\) and s, n - 1 rather than n is used in the denominator (see Prob. 2.16 for the reason). The formulas for \(\sigma^2\) and \(s^2\) can also be rewritten in a form that simplifies the calculations for a large body of data (see Probs. 2.17 to 2.19 for their derivation and application).

Table 2.6 Calculations on the Data in Table 2.4

Weight, oz    Midpoint X    f    X - mean    f|X - mean|    (X - mean)^2    f(X - mean)^2
19.2-19.4       19.3        1     -0.78         0.78           0.6084           0.6084
19.5-19.7       19.6        2     -0.48         0.96           0.2304           0.4608
19.8-20.0       19.9        8     -0.18         1.44           0.0324           0.2592
20.1-20.3       20.2        4      0.12         0.48           0.0144           0.0576
20.4-20.6       20.5        3      0.42         1.26           0.1764           0.5292
20.7-20.9       20.8        2      0.72         1.44           0.5184           1.0368
Sums                       20                   6.36                            2.9520

2.4 SHAPE OF FREQUENCY DISTRIBUTIONS

The shape of a distribution refers to (1) its symmetry or lack of it (skewness) and (2) its peakedness (kurtosis).

1. Skewness. A distribution has zero skewness if it is symmetrical about its mean. For a symmetrical (unimodal) distribution, the mean, median, and mode are equal. A distribution is positively skewed if the right tail is longer. Then, mean > median > mode. A distribution is negatively skewed if the left tail is longer. Then, mode > median > mean (see Fig. 2-3).

[Fig. 2-3: Panel A: symmetrical; Panel B: positively skewed; Panel C: negatively skewed]

Skewness can be measured by the Pearson coefficient of skewness:

\[ Sk = \frac{3(\mu - \text{med})}{\sigma} \quad\text{for populations} \]  (2.13a)

\[ Sk = \frac{3(\bar{X} - \text{med})}{s} \quad\text{for samples} \]  (2.13b)

Mean and variance are the first and second moments of a distribution, respectively. Skewness can also be measured by the third moment [the numerator of Eq.
(2.14a,b)] divided by the cube of the standard deviation: Sk equals the third moment divided by \(\sigma^3\) for populations (2.14a) and the third moment divided by \(s^3\) for samples (2.14b). For symmetric distributions, Sk = 0.

2. Kurtosis. A peaked curve is called leptokurtic, as opposed to a flat one (platykurtic), relative to one that is mesokurtic (see Fig. 2-4). Kurtosis can be measured by the fourth moment [the numerator of Eq. (2.15a,b)] divided by the standard deviation raised to the fourth power: the fourth moment divided by \(\sigma^4\) for populations (2.15a) and the fourth moment divided by \(s^4\) for samples (2.15b). The kurtosis for a mesokurtic curve is 3.

[Fig. 2-4: leptokurtic, mesokurtic, and platykurtic curves]

3. Joint moment. The comovement of two separate distributions can be measured by covariance:

\[ \operatorname{cov}(X, Y) = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N} = \frac{\sum XY}{N} - \mu_X \mu_Y \quad\text{for populations} \]

\[ \operatorname{cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n} = \frac{\sum XY}{n} - \bar{X}\bar{Y} \quad\text{for samples} \]

A positive covariance indicates that X and Y move together in relation to their means. A negative covariance indicates that they move in opposite directions.

EXAMPLE 7. We can find the Pearson coefficient of skewness for the grades given in Example 1 by using \(\mu = 7\), med = 6.5 (see Example 3), and \(\sigma = 1.48\) (see Example 5):

\[ Sk = \frac{3(\mu - \text{med})}{\sigma} = \frac{3(7 - 6.5)}{1.48} \approx 1.01 \]

Similarly, by using \(\bar{X} = 20.08\) oz, med = 20.06 oz (see Example 4), and s = 0.39 oz (see Example 6), we can find the Pearson coefficient of skewness for the frequency distribution of weights in Table 2.3 as follows:

\[ Sk = \frac{3(\bar{X} - \text{med})}{s} = \frac{3(20.08 - 20.06)}{0.39} \approx 0.15 \quad\text{(see Fig. 2-2)} \]

For kurtosis, see Prob. 2.23.

Solved Problems

FREQUENCY DISTRIBUTIONS

2.1 Table 2.7 gives the grades on a quiz for a class of 40 students. (a) Arrange these grades (raw data set) into an array from the lowest grade to the highest grade. (b) Construct a table showing class intervals and class midpoints and the absolute, relative, and cumulative frequencies for each grade. (c) Present the data in the form of a histogram, relative-frequency histogram, frequency polygon, and ogive.

Table 2.7 Grades on a Quiz for a Class of 40 Students

(a) See Table 2.8.

Table 2.8 Data Array of Grades

2  2  2  3  3  3  4  4  4  4
4  5  5  5  5  5  6  6  6  6
6  6  7  7  7  7  7  7  7  7
8  8  8  8  9  9  9  9  10  10
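The step from a data array to a frequency distribution can be sketched in a few lines of Python. This is only an illustration, not part of the original solution: the function names are hypothetical, and the grade counts are assumed to be those tabulated in Table 2.9 (3, 3, 5, 5, 6, 8, 4, 4, 2 for grades 2 through 10).

```python
from collections import Counter

# Assumption: grade counts as tabulated in Table 2.9, used to rebuild the array.
GRADE_COUNTS = {2: 3, 3: 3, 4: 5, 5: 5, 6: 6, 7: 8, 8: 4, 9: 4, 10: 2}


def data_array(counts):
    """Expand {value: count} into a sorted list of observations."""
    return sorted(value for value, count in counts.items() for _ in range(count))


def frequency_table(observations):
    """Return rows of (value, absolute, relative, cumulative) frequencies."""
    counts = Counter(observations)
    n = len(observations)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]          # cumulative frequency up to this value
        rows.append((value, counts[value], counts[value] / n, running))
    return rows


grades = data_array(GRADE_COUNTS)
table = frequency_table(grades)
for value, absolute, relative, cumulative in table:
    print(f"{value:>5} {absolute:>9} {relative:>9.3f} {cumulative:>11}")
```

The last cumulative frequency should equal the number of observations, and the relative frequencies should sum to 1, which gives a quick check on any hand-built table.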
(b) See Table 2.9. Note that since we are dealing here with discrete data (i.e., data expressed in whole numbers), we used the actual grades as the class midpoints.

Table 2.9 Frequency Distribution of Grades

Grade       Class Midpoint    Absolute Frequency    Relative Frequency    Cumulative Frequency
1.5-2.4           2                   3                   0.075                    3
2.5-3.4           3                   3                   0.075                    6
3.5-4.4           4                   5                   0.125                   11
4.5-5.4           5                   5                   0.125                   16
5.5-6.4           6                   6                   0.150                   22
6.5-7.4           7                   8                   0.200                   30
7.5-8.4           8                   4                   0.100                   34
8.5-9.4           9                   4                   0.100                   38
9.5-10.4         10                   2                   0.050                   40
Total                                40                   1.000

(c) See Fig. 2-5.

[Fig. 2-5: Panel A: histogram; Panel B: relative-frequency distribution; Panel C: frequency polygon; Panel D: ogive, for the grades in Table 2.9]

2.2 A sample of 25 workers in a plant receive the hourly wages given in Table 2.10. (a) Arrange these raw data into an array from the lowest to the highest wage. (b) Group the data into classes. (c) Present the data in the form of a histogram, relative-frequency histogram, frequency polygon, and ogive.

Table 2.10 Hourly Wages in Dollars

TAS M7e O8F 998 400 410 435 RSS ORE nme Sad 390 426 378 39S gOS ame 41S 380 ans 388 393 40d 4a dos

(a) See Table 2.11.

Table 2.11 Data Array of Wages in Dollars

300 68 7S 378 380 SRS BAS ORAS 395 398 198 3.96 400 405 ans 405 406 48 40 413 48 42S 4.26

(b) The hourly wages in Table 2.10 range from $3.55 to $4.26. This can be conveniently subdivided into 8 equal classes of $0.10 each. That is, ($4.30 - $3.50)/8 = $0.80/8 = $0.10. Note that the range was extended from $3.50 to $4.30 so that the lowest wage, $3.55, falls within the lowest class and the largest wage, $4.26, falls within the largest class.
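The class construction just described can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and wages are expressed in whole cents so that the stated class limits ($3.50-$3.59, $3.60-$3.69, and so on) come out exactly rather than being subject to floating-point rounding.

```python
def class_limits(lower, upper, k):
    """Split the range [lower, upper) into k equal classes.

    All amounts are in cents. Returns the class width and a list of
    (lower limit, upper limit) pairs, where each upper limit is one cent
    below the next class's lower limit.
    """
    width, remainder = divmod(upper - lower, k)
    if remainder:
        raise ValueError("range is not evenly divisible into k classes")
    limits = [(lower + i * width, lower + (i + 1) * width - 1) for i in range(k)]
    return width, limits


def class_index(wage, lower, width):
    """Zero-based index of the class that contains the given wage (in cents)."""
    return (wage - lower) // width


# Extended range of $3.50 to $4.30 divided into 8 classes, as in the text.
width, limits = class_limits(350, 430, 8)
for lo, hi in limits:
    print(f"${lo / 100:.2f}-${hi / 100:.2f}")
```

With these inputs the lowest wage, $3.55, lands in the first class and the largest wage, $4.26, in the eighth.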
It is also convenient (and needed for plotting the frequency polygon) to find the class mark or midpoint of each class. These are shown in Table 2.12.

Table 2.12 Frequency Distribution of Wages

Hourly Wage, $    Class Midpoint, $    Absolute Frequency    Relative Frequency    Cumulative Frequency
3.50-3.59              3.55                    1                   0.04                    1
3.60-3.69              3.65                    2                   0.08                    3
3.70-3.79              3.75                    2                   0.08                    5
3.80-3.89              3.85                    4                   0.16                    9
3.90-3.99              3.95                    5                   0.20                   14
4.00-4.09              4.05                    6                   0.24                   20
4.10-4.19              4.15                    3                   0.12                   23
4.20-4.29              4.25                    2                   0.08                   25
Total                                         25                   1.00

(c) See Fig. 2-6. In the ogive, the cumulative frequencies are plotted up to $3.595, 3.695, 3.795, and so on (so as to include the upper limit of each class). The values $3.595, 3.695, 3.795, etc., are often referred to as the class boundaries or exact limits. Note that the class midpoints are obtained by adding together the lower and upper class boundaries and dividing by 2. For example, the second class midpoint is given by (3.595 + 3.695)/2 = 7.29/2 = 3.645, or about 3.65 (see Table 2.12).

[Fig. 2-6: Panel A: histogram; Panel B: relative-frequency distribution; Panel C: frequency polygon; Panel D: ogive, for the wages in Table 2.12]

MEASURES OF CENTRAL TENDENCY

2.3 Find the mean, median, and mode (a) for the grades on the quiz for the class of 40 students given in Table 2.7 (the ungrouped data) and (b) for the grouped data of these grades given in Table 2.9.

(a) Since we are dealing with all grades, we want the population mean:

\[ \mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{240}{40} = 6 \text{ points} \]

That is, \(\mu\) is obtained by adding together all the 40 grades given in Table 2.7 and dividing by 40 [the three centered dots (ellipsis) were put in to avoid repeating the 40 values in Table 2.7]. The median is given by the value of the [(N + 1)/2]th item in the data array in Table 2.8. Therefore, the median is the value of the (40 + 1)/2 or 20.5th item, or the average of the 20th and 21st items. Since they are both equal to 6, the median is 6. The mode is 7 (the value that occurs most frequently in the data set).

(b) We can find the population mean for the grouped data in Table 2.9 with the aid of Table 2.13:

\[ \mu = \frac{\sum fX}{N} = \frac{240}{40} = 6 \text{ points} \]

This is the same mean we found for the ungrouped data. Note that the sum of the frequencies, \(\sum f\),
equals the number of observations in the population, N, and \(\sum X = \sum fX\). The median for the grouped data of Table 2.13 is given by

\[ \text{Med} = L + \frac{n/2 - F}{f_m}\,c = 5.5 + \frac{40/2 - 16}{6}(1) = 5.5 + 0.67 = 6.17 \]

where L = 5.5 = lower limit of the median class (i.e., the 5.5-6.4 class, which contains the 20th and 21st observations)
n = 40 = number of observations
F = 16 = sum of observations up to but not including the median class
\(f_m\) = 6 = frequency of the median class
c = 1 = width of class interval

The mode for the grouped data in Table 2.13 is given by

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\,c = 6.5 + \frac{2}{2 + 4}(1) = 6.5 + 0.33 = 6.83 \]

where L = 6.5 = lower limit of the modal class (i.e., the 6.5-7.4 class with the highest frequency of 8)
\(d_1\) = 2 = frequency of the modal class, 8, minus the frequency of the previous class, 6
\(d_2\) = 4 = frequency of the modal class, 8, minus the frequency of the following class, 4
c = 1 = width of the class interval

Note that while the mean calculated from the grouped data is in this case identical to the mean calculated for the ungrouped data, the median and the mode are only (good) approximations.

Table 2.13 Calculation of the Population Mean for the Grouped
Data in Table 2.9

Grade       Class Midpoint X    Frequency f     fX
1.5-2.4            2                 3           6
2.5-3.4            3                 3           9
3.5-4.4            4                 5          20
4.5-5.4            5                 5          25
5.5-6.4            6                 6          36
6.5-7.4            7                 8          56
7.5-8.4            8                 4          32
8.5-9.4            9                 4          36
9.5-10.4          10                 2          20
                                 N = 40    sum fX = 240

2.4 Find the mean, median, and mode (a) for the sample of hourly wages received by the 25 workers recorded in Table 2.10 (the ungrouped data) and (b) for the grouped data of these wages given in Table 2.12.

(a)
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$98.65}{25} = \$3.946, \text{ or } \$3.95 \]

Median = $3.95 [the value of the (n + 1)/2 = (25 + 1)/2 = 13th item in the data array in Table 2.11]. Mode = $3.95 and $4.05, since there are three of each of these wages. Thus the distribution is bimodal (i.e., it has two modes).

(b) We can find the sample mean for the grouped data in Table 2.12 with the aid of Table 2.14:

\[ \bar{X} = \frac{\sum fX}{n} = \frac{\$98.75}{25} = \$3.95 \]

Note that in this case \(\sum fX = 98.75 \neq \sum X = 98.65\) (found in part a), since the average of the observations in each class is not equal to the class midpoint for all classes [as it was in Prob. 2.3(b)]. Thus \(\bar{X}\) calculated from the grouped data is only a very good approximation for the true value of \(\bar{X}\) calculated for the ungrouped data. In the real world, we often have only the grouped data, or if we have a very large body of ungrouped data, it will save on calculations to estimate the mean by first grouping the data.

\[ \text{Med} = L + \frac{n/2 - F}{f_m}\,c = \$3.90 + \frac{12.5 - 9}{5}(\$0.10) = \$3.90 + \$0.07 = \$3.97 \]

as compared with the true median of $3.95 found from the ungrouped data (see part a).

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\,c = \$4.00 + \frac{1}{1 + 3}(\$0.10) = \$4.00 + \$0.025 = \$4.025, \text{ or } \$4.03 \]

as compared with the true modes of $3.95 and $4.05 found from the ungrouped data (see part a). Sometimes the mode is simply given as the midpoint of the modal class.

Table 2.14 Calculation of the Sample Mean for the Grouped Data in Table 2.12

Hourly Wage, $    Class Midpoint X, $    Frequency f     fX
3.50-3.59               3.55                  1          3.55
3.60-3.69               3.65                  2          7.30
3.70-3.79               3.75                  2          7.50
3.80-3.89               3.85                  4         15.40
3.90-3.99               3.95                  5         19.75
4.00-4.09               4.05                  6         24.30
4.10-4.19               4.15                  3         12.45
4.20-4.29               4.25                  2          8.50
                                          n = 25    sum fX = 98.75

2.5 Compare the advantages and disadvantages of (a) the mean, (b) the median, and (c) the mode as measures of central tendency.

(a) The advantages of the mean are (1) it is familiar and understood by virtually everyone, (2) all the observations in the data are taken into account, and (3) it is used in performing many other statistical procedures and tests.
The disadvantages of the mean are (1) it is affected by extreme values, (2) it is time-consuming to compute for a large body of ungrouped data, and (3) it cannot be calculated when the last class of grouped data is open-ended (i.e., it includes the lower limit of the last class "and over").

(b) The advantages of the median are (1) it is not affected by extreme values, (2) it is easily understood (i.e., half the data are smaller than the median and half are greater), and (3) it can be calculated even when the last class is open-ended and when the data are qualitative rather than quantitative. The disadvantages of the median are (1) it does not use much of the information available, and (2) it requires that observations be arranged into an array, which is time-consuming for a large body of ungrouped data.

(c) The advantages of the mode are the same as those for the median. The disadvantages of the mode are (1) as for the median, the mode does not use much of the information available, and (2) sometimes no value of the data is repeated more than once, so that there is no mode, while at other times there may be many modes. In general, the mean is the most frequently used measure of central tendency and the mode is the least used.

2.6 Find the mean for the grouped data in Table 2.12 by coding (i.e., by assigning the value of u = 0 to the 4th or 5th class and u = -1, u = -2, etc., to each lower class and u = 1, u = 2, etc., to each larger class and then using the formula

\[ \bar{X} = X_0 + \frac{\sum fu}{n}\,c \]  (2.16)

where \(X_0\) is the midpoint of the class assigned u = 0 and c is the width of the class intervals).

See Table 2.15.

Table 2.15 Calculation of the Sample Mean by Coding for the Grouped Data in Table 2.12

Hourly Wage, $    Class Midpoint X, $    Code u    Frequency f     fu
3.50-3.59               3.55               -3           1          -3
3.60-3.69               3.65               -2           2          -4
3.70-3.79               3.75               -1           2          -2
3.80-3.89               3.85                0           4           0
3.90-3.99               3.95                1           5           5
4.00-4.09               4.05                2           6          12
4.10-4.19               4.15                3           3           9
4.20-4.29               4.25                4           2           8
                                                    n = 25    sum fu = 25

\[ \bar{X} = X_0 + \frac{\sum fu}{n}\,c = \$3.85 + \frac{25}{25}(\$0.10) = \$3.95 \]

\(\bar{X}\) for the grouped data found by coding is identical to that found in Prob. 2.4(b) without coding.
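The agreement between the coded and uncoded means can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the midpoints and frequencies are assumed to be those of Table 2.12. It also shows that the result does not depend on which class is assigned the code 0.

```python
# Assumption: class midpoints ($) and frequencies as in Table 2.12.
MIDPOINTS = [3.55, 3.65, 3.75, 3.85, 3.95, 4.05, 4.15, 4.25]
FREQS = [1, 2, 2, 4, 5, 6, 3, 2]


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of f*X divided by the number of observations."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def coded_mean(midpoints, freqs, origin_index, width):
    """Mean by coding: X0 + (sum of f*u / n) * c.

    The class at origin_index gets code 0, lower classes get -1, -2, ...,
    and higher classes get 1, 2, ...
    """
    codes = [i - origin_index for i in range(len(midpoints))]
    n = sum(freqs)
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    return midpoints[origin_index] + (sum_fu / n) * width


direct = grouped_mean(MIDPOINTS, FREQS)
coded_4th = coded_mean(MIDPOINTS, FREQS, origin_index=3, width=0.10)
coded_5th = coded_mean(MIDPOINTS, FREQS, origin_index=4, width=0.10)
print(round(direct, 2), round(coded_4th, 2), round(coded_5th, 2))
```

Choosing the 5th class as the origin makes the sum of f times u equal to zero here, so the mean is simply that class's midpoint.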
Coding eliminates the problem of having to deal with possibly large and inconvenient class midpoints; thus it may simplify the calculations.

2.7 A firm pays a wage of $4 per hour to its 25 unskilled workers, $6 to its 15 semiskilled workers, and $8 to its 10 skilled workers. What is the weighted average, or weighted mean, wage paid by this firm?

In finding the weighted mean, or weighted average, of a population, \(\mu_w\), or sample, \(\bar{X}_w\), the weights, w, have the same function as the frequency in finding the mean for the grouped data. Thus

\[ \mu_w \text{ or } \bar{X}_w = \frac{\sum wX}{\sum w} \]  (2.17)

For this problem, the weights are the number of workers employed at each wage, and \(\sum w\) equals the sum of all the workers:

\[ \mu_w = \frac{(\$4)(25) + (\$6)(15) + (\$8)(10)}{25 + 15 + 10} = \frac{\$100 + \$90 + \$80}{50} = \frac{\$270}{50} = \$5.40 \]

This weighted average compares with the simple average of $6 [($4 + $6 + $8)/3 = $6] and is a better measure of the average wages.

2.8 A nation faces a rate of inflation of 2% in one year, 5% in the second year, and 12.5% in the third year. Find the geometric mean of the inflation rates (the geometric mean, \(\mu_G\) or \(\bar{X}_G\), of a set of n positive numbers is the nth root of their product and is used mainly to average rates of change and index numbers):

\[ \mu_G \text{ or } \bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} \]  (2.18)

where \(X_1, X_2, \ldots, X_n\) refer to the n (or N) observations.

\[ \mu_G = \sqrt[3]{(2)(5)(12.5)} = \sqrt[3]{125} = 5\% \]

This compares with \(\mu\) = (2 + 5 + 12.5)/3 = 19.5/3 = 6.5%. When all the numbers are equal, \(\mu_G\) equals \(\mu\); otherwise \(\mu_G\) is smaller than \(\mu\). In practice, \(\mu_G\) is calculated by logarithms:

\[ \log \mu_G = \frac{\sum \log X}{N} \]  (2.19)

The geometric mean is used primarily in the mathematics of finance and financial management.

2.9 A commuter drives 10 mi on the highway at 60 mi/h and 10 mi on local streets at 15 mi/h. Find the harmonic mean. The harmonic mean \(\mu_H\) is used primarily to average ratios:

\[ \mu_H = \frac{N}{\sum (1/X)} \]  (2.20)

\[ \mu_H = \frac{2}{(1/60) + (1/15)} = \frac{2}{(1 + 4)/60} = \frac{120}{5} = 24 \text{ mi/h} \]

as compared with \(\mu = \sum X/N\) = (60 + 15)/2 = 75/2 = 37.5 mi/h. Note that if the commuter had averaged 37.5 mi/h, it would have taken her (20 mi/37.5 mi/h) x 60 min = 32 min to drive the 20 mi.
Instead she drives 10 min on the highway (10 mi at 60 mi/h) and 40 min on local streets (10 mi at 15 mi/h) for a total of 50 min, and this is the (correct) answer we get by using \(\mu_H\) = 24 mi/h. That is, (20 mi/24 mi/h) x 60 min = 50 min.

2.10 (a) For the ungrouped data in Table 2.7, find the first, second, and third quartiles and the third decile and sixtieth percentile. (b) Do the same for the grouped data in Table 2.12. (Quartiles divide the data into 4 parts, deciles into 10 parts, and percentiles into 100 parts.)

(a) \(Q_1\) (first quartile) = 4 (the average of the 10th and 11th values in Table 2.8)
\(Q_2\) (second quartile) = 6 = the value of the 20.5th item = the median
\(Q_3\) (third quartile) = 7.5 = the value of the 30.5th item
\(D_3\) (third decile) = 5 = the value of the 12.5th item
\(P_{60}\) (sixtieth percentile) = 7 = the value of the 24.5th item

(b)
\[ Q_1 = L + \frac{n/4 - F}{f}\,c = \$3.80 + \frac{6.25 - 5}{4}(\$0.10) = \$3.80 + \$0.0313 \approx \$3.83 \]  (2.21)

\[ Q_2 = L + \frac{n/2 - F}{f}\,c = \$3.90 + \frac{12.5 - 9}{5}(\$0.10) = \$3.90 + \$0.07 = \$3.97 = \text{median} \]  (2.22)

\[ Q_3 = L + \frac{3n/4 - F}{f}\,c = \$4.00 + \frac{18.75 - 14}{6}(\$0.10) = \$4.00 + \$0.0792 \approx \$4.08 \]  (2.23)

\[ D_3 = L + \frac{3n/10 - F}{f}\,c = \$3.80 + \frac{7.5 - 5}{4}(\$0.10) = \$3.80 + \$0.0625 \approx \$3.86 \]  (2.24)

\[ P_{60} = L + \frac{60n/100 - F}{f}\,c = \$4.00 + \frac{15 - 14}{6}(\$0.10) = \$4.00 + \$0.0167 \approx \$4.02 \]  (2.25)

MEASURES OF DISPERSION

2.11 (a) Find the range for the ungrouped data in Table 2.7. (b) Find the range for the ungrouped data in Table 2.10 and for the grouped data in Table 2.12. (c) What are the advantages and disadvantages of the range?

(a) The range for ungrouped data is equal to the value of the largest observation minus the value of the smallest observation in the data set. The range for the ungrouped data in Table 2.7 is from 2 to 10, or 8 points.

(b) The range for the ungrouped data in Table 2.10 is from $3.55 to $4.26, or $0.71. For grouped data, the range extends from the lower limit of the smallest class to the upper limit of the largest class. For the grouped data in Table 2.12, the range extends from $3.50 to $4.29.

(c) The advantages of the range are that it is easy to find and understand. Its disadvantages are that it considers only the lowest and highest values of a distribution, it is greatly influenced by extreme values, and it cannot be found for open-ended distributions.
Because of these disadvantages, the range is of limited usefulness (except in quality control).

2.12 Find the interquartile range and the quartile deviation (a) for the ungrouped data in Table 2.7 and (b) for the grouped data in Table 2.12.

(a) The interquartile range is equal to the difference between the third and first quartiles:

\[ IR = Q_3 - Q_1 \]  (2.26)

For the ungrouped data in Table 2.7, IR = 7.5 - 4 = 3.5 points [utilizing the values of \(Q_3\) and \(Q_1\) found in Prob. 2.10(a)]. Note that the interquartile range is not affected by extreme values because it utilizes only the middle half of the data. It is thus better than the range, but it is not as widely used as the other measures of dispersion. For the quartile deviation

\[ QD = \frac{Q_3 - Q_1}{2} \]  (2.27)

Therefore, QD = (7.5 - 4)/2 = 3.5/2 = 1.75 points. Quartile deviation measures the average range of one-fourth of the data.

(b) \(IR = Q_3 - Q_1\) = $4.08 - $3.83 = $0.25 [utilizing the values of \(Q_3\) and \(Q_1\) found in Prob. 2.10(b)]

\[ QD = \frac{Q_3 - Q_1}{2} = \frac{\$4.08 - \$3.83}{2} = \$0.125 \]

2.13 Find the average deviation for (a) the ungrouped data in Table 2.7 and (b) for the grouped data in Table 2.9.

(a) Since \(\mu = 6\) [see Prob. 2.3(a)], the sum of the 40 absolute deviations is \(\sum |X - \mu| = 72\), and

\[ AD = \frac{\sum |X - \mu|}{N} = \frac{72}{40} = 1.8 \text{ points} \]

Note that the average deviation takes every observation into account. It measures the average of the absolute deviation of each observation from the mean. It takes the absolute value (indicated by the two vertical bars) because \(\sum (X - \mu) = 0\) (see Example 5).

(b) We can find the average deviation for the same grouped data with the aid of Table 2.16:

\[ AD = \frac{\sum f|X - \mu|}{N} = \frac{72}{40} = 1.8 \text{ points} \]

the same as we found for the ungrouped data.

Table 2.16 Calculations for the Average Deviation for the Grouped Data in Table 2.9

Class Midpoint X    Frequency f    Mean    X - mean    |X - mean|    f|X - mean|
       2                 3           6        -4            4            12
       3                 3           6        -3            3             9
       4                 5           6        -2            2            10
       5                 5           6        -1            1             5
       6                 6           6         0            0             0
       7                 8           6         1            1             8
       8                 4           6         2            2             8
       9                 4           6         3            3            12
      10                 2           6         4            4             8
                     N = 40                                        sum = 72

2.14 Find the average deviation for the grouped data in Table 2.12.

We can find the average deviation for the grouped data of hourly wages in Table 2.12 with the aid of Table 2.17 [\(\bar{X}\) = $3.95; see Prob. 2.4(b)]:

\[ AD = \frac{\sum f|X - \bar{X}|}{n} = \frac{\$3.60}{25} = \$0.144 \]

Note that the average deviation found for the grouped data is an estimate of the "true" average deviation that could be found for the ungrouped data. It usually differs somewhat from the true average deviation because we use the estimate of the mean for the grouped data in our calculations [compare the values of \(\bar{X}\) found in Prob. 2.4(a) and (b)].

Table 2.17 Calculations for the Average Deviation for the Grouped Data in Table 2.12

Hourly Wage, $    Class Midpoint X, $    Frequency f    Mean, $    X - mean    |X - mean|    f|X - mean|
3.50-3.59               3.55                  1           3.95       -0.40        0.40          0.40
3.60-3.69               3.65                  2           3.95       -0.30        0.30          0.60
3.70-3.79               3.75                  2           3.95       -0.20        0.20          0.40
3.80-3.89               3.85                  4           3.95       -0.10        0.10          0.40
3.90-3.99               3.95                  5           3.95        0.00        0.00          0.00
4.00-4.09               4.05                  6           3.95        0.10        0.10          0.60
4.10-4.19               4.15                  3           3.95        0.20        0.20          0.60
4.20-4.29               4.25                  2           3.95        0.30        0.30          0.60
                                          n = 25                                          sum = 3.60

2.15 Find the variance and the standard deviation for (a) the ungrouped data in Table 2.7 and (b) the grouped data in Table 2.9. (c) What is the advantage of the standard deviation over the variance?

(a) N = 40 and \(\mu = 6\) (see Prob. 2.3), and the sum of the 40 squared deviations is \(\sum (X - \mu)^2 = 192\):

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{192}{40} = 4.8 \text{ points squared} \qquad \sigma = \sqrt{4.8} \approx 2.19 \text{ points} \]

(b) We can find the variance and the standard deviation for the grouped data of grades with the aid of Table 2.18:

\[ \sigma^2 = \frac{\sum f(X - \mu)^2}{N} = \frac{192}{40} = 4.8 \text{ points squared} \quad\text{and}\quad \sigma = \sqrt{4.8} \approx 2.19 \text{ points} \]

the same as we found for the ungrouped data.

Table 2.18 Calculations for the Variance and Standard Deviation for the Data in Table 2.9

Class Midpoint X    Frequency f    Mean    X - mean    (X - mean)^2    f(X - mean)^2
       2                 3           6        -4            16               48
       3                 3           6        -3             9               27
       4                 5           6        -2             4               20
       5                 5           6        -1             1                5
       6                 6           6         0             0                0
       7                 8           6         1             1                8
       8                 4           6         2             4               16
       9                 4           6         3             9               36
      10                 2           6         4            16               32
                     N = 40                                           sum = 192

(c) The advantage of the standard deviation over the variance is that the standard deviation is expressed in the same units as the data rather than in "the units squared," which is how the variance is expressed. The standard deviation is by far the most widely used measure of (absolute) dispersion.

2.16 Find the variance and the standard deviation for the grouped data in Table 2.12.

We can find the variance and the standard deviation for the grouped data on hourly wages with the aid of Table 2.19 [\(\bar{X}\) = $3.95; see Prob. 2.4(b)]:

\[ s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} = \frac{0.82}{24} \approx 0.0342 \text{ dollars squared} \quad\text{and}\quad s = \sqrt{0.0342} \approx \$0.18 \]

Table 2.19 Calculations for the Variance and Standard Deviation for the Data in Table 2.12

Hourly Wage, $    Class Midpoint X, $    Frequency f    Mean, $    X - mean    (X - mean)^2    f(X - mean)^2
3.50-3.59               3.55                  1           3.95       -0.40         0.16            0.16
3.60-3.69               3.65                  2           3.95       -0.30         0.09            0.18
3.70-3.79               3.75                  2           3.95       -0.20         0.04            0.08
3.80-3.89               3.85                  4           3.95       -0.10         0.01            0.04
3.90-3.99               3.95                  5           3.95        0.00         0.00            0.00
4.00-4.09               4.05                  6           3.95        0.10         0.01            0.06
4.10-4.19               4.15                  3           3.95        0.20         0.04            0.12
4.20-4.29               4.25                  2           3.95        0.30         0.09            0.18
                                          n = 25                                             sum = 0.82

Note that in the formula for \(s^2\) and s, n - 1 rather than n is used in the denominator. The reason for this is that if we take many samples from a population, the average of the sample variances does not tend to equal the population variance, \(\sigma^2\), unless we use n - 1 in the denominator of the formula for \(s^2\) (more will be said on this in Chap. 5). Furthermore, \(s^2\) and s for the grouped data are estimates for the true \(s^2\) and s that could be found for the ungrouped data because we use the estimate of \(\bar{X}\) from the grouped data in our calculations.

2.17 Starting with the formula for \(\sigma^2\) and \(s^2\) given in Sec. 2.3, prove that

(a) \[ \sigma^2 = \frac{\sum X^2 - N\mu^2}{N} \quad\text{and}\quad s^2 = \frac{\sum X^2 - n\bar{X}^2}{n - 1} \]  (2.28a,b)

(b) \[ \sigma^2 = \frac{\sum fX^2 - N\mu^2}{N} \quad\text{and}\quad s^2 = \frac{\sum fX^2 - n\bar{X}^2}{n - 1} \]  (2.29a,b)

(a)
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum (X^2 - 2X\mu + \mu^2)}{N} = \frac{\sum X^2 - 2\mu \sum X + N\mu^2}{N} = \frac{\sum X^2}{N} - 2\mu^2 + \mu^2 = \frac{\sum X^2 - N\mu^2}{N} \]

using \(\sum X = N\mu\). We can get \(s^2\) by simply replacing \(\mu\) with \(\bar{X}\) and N with n in the numerator and N with n - 1 in the denominator of the formula for \(\sigma^2\).

(b)
\[ \sigma^2 = \frac{\sum f(X - \mu)^2}{N} = \frac{\sum fX^2 - 2\mu \sum fX + \mu^2 \sum f}{N} = \frac{\sum fX^2 - N\mu^2}{N} \]

using \(\sum fX = N\mu\) and \(\sum f = N\). We can get \(s^2\) in the same way as we did in part a.

The preceding formulas will simplify the calculations for \(\sigma^2\) and \(s^2\) for a large body of data. Coding also helps (see Prob. 2.6).

2.18 Find the variance and the standard deviation for (a) the ungrouped data in Table 2.7 and (b) the grouped data in Table 2.9, using the simpler computational formulas in Prob. 2.17.
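The equivalence of the definitional and computational formulas in Prob. 2.17 can also be checked numerically before working through the arithmetic by hand. The sketch below is an illustration, not the book's method: the function names are hypothetical, and the midpoints and frequencies are assumed to be those of Table 2.9.

```python
import math

# Assumption: class midpoints and frequencies of the grades as in Table 2.9.
MIDPOINTS = [2, 3, 4, 5, 6, 7, 8, 9, 10]
FREQS = [3, 3, 5, 5, 6, 8, 4, 4, 2]


def population_mean(xs, fs):
    """Grouped population mean: sum of f*X over N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def variance_definitional(xs, fs):
    """Population variance as the mean of squared deviations from the mean."""
    n_obs = sum(fs)
    mu = population_mean(xs, fs)
    return sum(f * (x - mu) ** 2 for x, f in zip(xs, fs)) / n_obs


def variance_computational(xs, fs):
    """Population variance as (sum of f*X^2 - N*mu^2) / N."""
    n_obs = sum(fs)
    mu = population_mean(xs, fs)
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return (sum_fx2 - n_obs * mu ** 2) / n_obs


var_def = variance_definitional(MIDPOINTS, FREQS)
var_comp = variance_computational(MIDPOINTS, FREQS)
sigma = math.sqrt(var_comp)
print(var_def, var_comp, round(sigma, 2))
```

Both routes should give the same variance; the computational form needs only the running sums of f, fX, and fX squared, which is why it is convenient for a large body of data.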
wine rhe style canpuarianal fowmnulas in Prob 217 28 DESCRIPTIVE STATISTICS lomar. 2 ) SENT Hh 28 Nh dy 6b 8D 4 36 I Se BT 106 15 4 254 25 4164 36-449 4 18444494 254 B64 ADF RY Hd 16 RTE 164 16494 Od 4 Op SbF DY BT LOD 9 2S = 162 1.637 (any(36) L482 —1aan _ 197 = 4.8 points squared the samme a2 in Prob. 2.1548), (6) We can dnd o” and o-for the grouped data in Table 2-9 with the aid of Table 220 11.832 — (409136) _ 1,682 — 1.840 192 ae Vee = VER 2 19 points 48 polats squared the same as in part @ and Prob 2.15 Table 2.20, Calcutations for the Vartance and Standard Deviation for tn Tabte 2.9 Gente [Cae Mutpaior Y | Feeney re v ne 1S24 2 3 é 4 asa a 3 9 ° 1344 4 5 x 16 4354 s 5 3B 3 S84 6 4 % 36 6574 T 8 56 ” 184 8 4 2 a asa4 » 4 Ea a yo 2 a ow Lrewean| Ser 219 Find the variance and the standard deviation for the grouped data in Table 2.12 using the simpler computational formula given in Prab. 2.17(b) “We can tind ¢ and » foe the grouped data in Table 2.12 with the ld of Table 2.21 0.0342 dollars sauazed and os VOORE 50.18 the came se-we found in Prob, 216. cur, 220 DESCRIPTIVE STATISTICS » ‘Table 2.21 Calewlations for the Variance and Standard Deviation forthe Grouped Data i “Vane 212 Hourly Css ‘Wape,'S | Midpoint x8] Frequency |X, $ a a saess9 | ass 1 338 1200 s03.09 | 365 2 730 265450 amo3% | 375 2 7.50 28.1250 asos9 | as 4 15.40 9.2000 s903.99] 39s 5 19.35 7a0128 4oo-409 | 40s 6 ux | 164025] geal aoa | as 3 yas [inzmas| S167 amar | 42s 2 aso [isms] 361280 Efean8 [Es = 9075 Ee = ORs Find the coefficient of variation V for the data in {aj Table 27 and (6) Table 2.12. fe) Whats the usefulness of the cocificient of variation’? (a) with je~ 6 and 2.19 (se Prob, 2.19) a 219 points eo Gpeints 0835, of 4.38% (6) With = 93.95 and » 30.18 (oor Prob. 2.19) (©) The coefficient of variation measures the relatiw dispersion in the data and is expressed as a pure number without any units. 
This is to be contrasted with the standard deviation and other measures of absolute dispersion, which are expressed in the units of the problem. Thus the coefficient of variation can be used to compare the relative dispersion of two or more distributions expressed in different units, as well as when the true mean values differ. For example, we can say that the dispersion of the data in Table 2.7 is greater than that in Table 2.12. The coefficient of variation also can be used to compare the relative dispersion of the same type of data over different time periods (when \(\mu\) or \(\bar{X}\) and \(\sigma\) or \(s\) change).

SHAPE OF FREQUENCY DISTRIBUTIONS

2.21 Find the Pearson coefficient of skewness for the (grouped) data in (a) Table 2.9 and (b) Table 2.12.

(a) With \(\mu = 6\), med = 6.17 [see Prob. 2.3(b)], and \(\sigma \cong 2.19\) [see Prob. 2.15(b)],

\[ Sk = \frac{3(\mu - \text{med})}{\sigma} = \frac{3(6 - 6.17)}{2.19} \cong -0.23 \quad \text{(a pure number)} \]

Note that the median is greater than the mean and that the distribution is slightly negatively skewed (see Fig. 2a).

(b) With \(\bar{X} = \$3.95\), med = $3.97 [see Prob. 2.4(b)], and \(s \cong \$0.18\) (see Prob. 2.16),

\[ Sk = \frac{3(\bar{X} - \text{med})}{s} = \frac{3(\$3.95 - \$3.97)}{\$0.18} = \frac{3(-0.02)}{0.18} \cong -0.33 \]

2.22 Using the formula for skewness based on the third moment, find the coefficient of skewness for the data in (a) Table 2.9 and (b) Table 2.12.

(a) We can find the coefficient of skewness for the data in Table 2.9 using the formula based on the third moment with the aid of Table 2.22:

\[ Sk = \frac{\sum f(X - \mu)^3}{\sigma^3} = \frac{-42}{(2.19)^3} \cong -4 \]

This indicates that the distribution is negatively skewed, but the degree of skewness is measured differently than in Prob. 2.21.

Table 2.22 Calculations for Skewness for the Data in Table 2.9

Grade | Class Midpoint X | Frequency f | Mean μ | X - μ | (X - μ)^3 | f(X - μ)^3
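1.5-2.4 | 2 | 3 | 6 | -4 | -64 | -192
2.5-3.4 | 3 | 3 | 6 | -3 | -27 | -81
3.5-4.4 | 4 | 5 | 6 | -2 | -8 | -40
4.5-5.4 | 5 | 5 | 6 | -1 | -1 | -5
5.5-6.4 | 6 | 6 | 6 | 0 | 0 | 0
6.5-7.4 | 7 | 8 | 6 | 1 | 1 | 8
7.5-8.4 | 8 | 4 | 6 | 2 | 8 | 32
8.5-9.4 | 9 | 4 | 6 | 3 | 27 | 108
9.5-10.4 | 10 | 2 | 6 | 4 | 64 | 128
Σf = N = 40 | Σf(X - μ)^3 = -42

(b) See Table 2.23. [Note that regardless of the measure of skewness used, the …]

Calculations of f(X - X̄)^4 for the Data in Table 2.12

Hourly Wage, $ | Class Midpoint X, $ | Frequency f | Mean X̄, $ | X - X̄ | (X - X̄)^4 | f(X - X̄)^4
3.50-3.59 | 3.55 | 1 | 3.95 | -0.40 | 0.0256 | 0.0256
3.60-3.69 | 3.65 | 2 | 3.95 | -0.30 | 0.0081 | 0.0162
3.70-3.79 | 3.75 | 2 | 3.95 | -0.20 | 0.0016 | 0.0032
3.80-3.89 | 3.85 | 4 | 3.95 | -0.10 | 0.0001 | 0.0004
3.90-3.99 | 3.95 | 5 | 3.95 | 0 | 0 | 0
4.00-4.09 | 4.05 | 6 | 3.95 | 0.10 | 0.0001 | 0.0006
4.10-4.19 | 4.15 | 3 | 3.95 | 0.20 | 0.0016 | 0.0048
4.20-4.29 | 4.25 | 2 | 3.95 | 0.30 | 0.0081 | 0.0162
Σf(X - X̄)^4 = 0.0670

2.24 Find the covariance between hourly wage X and education Y, measured in years of schooling, in the data in Table 2.26.

Table 2.26 Employee Hourly Wages and Years of Schooling

Employee Number | Hourly Wage X, $ | Years of Schooling Y
1 | 8.45 | 12
2 | 11.55 | 14
3 | 9.00 | 10
4 | 10.50 | 12
5 | 11.00 | 16
6 | 15.00 | 16
7 | 25.00 | 18
8 | 12.00 | 18
9 | 6.50 | 12
10 | 8.25 | 10

From the calculations in Table 2.27, \(\operatorname{cov}(X, Y) = 103.55/10 = 10.355\). When X and Y are both above or both below their means, covariance is increased. When X and Y move in opposite directions relative to their means (as for employee 5), covariance is decreased. Since in this case \(\operatorname{cov}(X, Y) > 0\), X and Y move together relative to their means.

Table 2.27 Calculations for Covariance

Employee Number | Hourly Wage X, $ | Years of Schooling Y | X - X̄ | Y - Ȳ | (X - X̄)(Y - Ȳ)
1 | 8.45 | 12 | -3.275 | -1.8 | 5.895
2 | 11.55 | 14 | -0.175 | 0.2 | -0.035
3 | 9.00 | 10 | -2.725 | -3.8 | 10.355
4 | 10.50 | 12 | -1.225 | -1.8 | 2.205
5 | 11.00 | 16 | -0.725 | 2.2 | -1.595
6 | 15.00 | 16 | 3.275 | 2.2 | 7.205
7 | 25.00 | 18 | 13.275 | 4.2 | 55.755
8 | 12.00 | 18 | 0.275 | 4.2 | 1.155
9 | 6.50 | 12 | -5.225 | -1.8 | 9.405
10 | 8.25 | 10 | -3.475 | -3.8 | 13.205
X̄ = 11.725 | Ȳ = 13.8 | Σ(X - X̄)(Y - Ȳ) = 103.55

2.25 Compute the covariance from Table 2.26 using the alternate formula.

Computations are given in Table 2.28.

\[ \operatorname{cov}(X, Y) = \frac{\sum XY}{N} - \bar{X}\bar{Y} = \frac{1{,}721.60}{10} - (11.725)(13.8) = 172.16 - 161.805 = 10.355 \]

Table 2.28 Calculations for Covariance with Alternate Formula

Employee Number | Hourly Wage X, $ | Years of Schooling Y | XY
1 | 8.45 | 12 | 101.40
2 | 11.55 | 14 | 161.70
3 | 9.00 | 10 | 90.00
4 | 10.50 | 12 | 126.00
5 | 11.00 | 16 | 176.00
6 | 15.00 | 16 | 240.00
7 | 25.00 | 18 | 450.00
8 | 12.00 | 18 | 216.00
9 | 6.50 | 12 | 78.00
10 | 8.25 | 10 | 82.50
ΣX = 117.25 | ΣY = 138 | ΣXY = 1,721.60

Supplementary Problems

FREQUENCY DISTRIBUTIONS

2.26 Table 2.29 gives the frequency for gasoline prices at 48 stations in a town. Present the data in the form of a histogram, a relative-frequency histogram, a frequency polygon, and an ogive.

Table 2.29 Frequency Distribution of Gasoline Prices

Price, $ | Frequency
1.00-1.04 | …
1.05-1.09 | 6
1.10-1.14 | …
1.15-1.19 | 15
1.20-1.24 | …
1.25-1.29 | …
Total | 48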
2.27 Table 2.30 gives the frequency distribution of family incomes for a sample of 100 families in a city. Graph the data into a histogram, a relative-frequency histogram, a frequency polygon, and an ogive.

Table 2.30 Frequency Distribution of Family Incomes

Family Income, $ | Frequency
10,000-11,999 | 12
12,000-13,999 | 14
14,000-15,999 | 24
16,000-17,999 | 15
18,000-19,999 | 13
20,000-21,999 | 7
22,000-23,999 | 6
24,000-25,999 | 4
26,000-27,999 | 3
28,000-29,999 | 2
Total | 100

MEASURES OF CENTRAL TENDENCY

2.28 Find (a) the mean, (b) the median, and (c) the mode for the grouped data in Table 2.29.
Ans. (a) \(\mu\) = $1.15 (b) Median = $1.16 (c) Mode = $1.17

2.29 Find (a) the mean, (b) the median, and (c) the mode for the frequency distribution of incomes in Table 2.30.
Ans. (a) \(\bar{X}\) = $17,000 (b) Median = $16,000 (c) Mode = $15,053

2.30 Find the mean for the grouped data in (a) Table 2.29 and (b) Table 2.30 by coding.
Ans. (a) \(\mu\) = $1.15 (b) \(\bar{X}\) = $17,000

2.31 A firm pays 5/12 of its labor force an hourly wage of $5, 1/3 of the labor force a wage of $6, and 1/4 a wage of $7. What is the weighted average paid by this firm?
Ans. \(\mu_w \cong\) $5.83

2.32 For the same amount of capital invested in each of 3 years, an investor earned a rate of return of 1% during the first year, 4% during the second year, and 16% during the third. (a) Find \(\mu_G\). (b) Find \(\mu\). (c) Which is appropriate?
Ans. (a) \(\mu_G\) = 4% (b) \(\mu\) = 7% (c) \(\mu_G\)

2.33 A plane traveled 200 mi at 600 mi/h and 100 mi at 500 mi/h. What was its average speed?
Ans. \(\mu_H\) = 562.5 mi/h

2.34 A driver purchases $10 worth of gasoline at $0.90 a gallon and $10 at $1.10 a gallon. What is the average price per gallon?
Ans. \(\mu_H\) = $0.99 per gallon

2.35 For the grouped data of Table 2.29, find (a) the first quartile, (b) the second quartile, (c) the third quartile, (d) the fourth decile, and (e) the seventieth percentile.
Ans. (a) \(Q_1\) = $1.11 (b) \(Q_2\) = $1.16 (c) \(Q_3\) = … (d) \(D_4\) = … (e) \(P_{70}\) = $1.195

2.36 For the grouped data in Table 2.30, find (a) the first quartile, (b) the third quartile, (c) the third decile, and (d) the sixtieth percentile.
Ans. (a) \(Q_1\) = $13,857 (b) \(Q_3\) = $19,538 (c) \(D_3\) = $14,333 (d) \(P_{60}\) = $17,333
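The geometric-mean and harmonic-mean answers above (the 1%, 4%, 16% returns, the plane's average speed, and the average gasoline price) can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `geometric_mean` and `weighted_harmonic_mean` are our own, and the harmonic mean is weighted by the quantity attached to each rate (miles flown at each speed, dollars spent at each price).

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values (hypothetical helper)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def weighted_harmonic_mean(rates, weights):
    """Total weight divided by the sum of weight/rate (hypothetical helper).

    For an average speed, the weights are distances; for an average price
    per gallon, the weights are dollars spent at each price.
    """
    return sum(weights) / sum(w / r for r, w in zip(rates, weights))


# Rates of return of 1%, 4%, and 16% in three successive years.
avg_return = geometric_mean([1, 4, 16])

# 200 mi at 600 mi/h and 100 mi at 500 mi/h.
avg_speed = weighted_harmonic_mean([600, 500], [200, 100])

# $10 of gasoline at $0.90 per gallon and $10 at $1.10 per gallon.
avg_price = weighted_harmonic_mean([0.90, 1.10], [10, 10])

print(round(avg_return, 2), round(avg_speed, 1), round(avg_price, 2))
```

The harmonic mean is the natural choice in the last two cases because the quantity being held fixed (distance, dollars) sits in the numerator of the rate, so time and gallons are what actually add up.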
MEASURES OF DISPERSION

2.37 What is the range of the distribution of (a) gasoline prices in Table 2.29 and (b) family incomes in Table 2.30?
Ans. (a) $0.29 (b) $10,000 to $29,999, or $20,000

2.38 Find the interquartile range and quartile deviation for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) IR = … and QD = … (b) IR = … and QD = …

2.39 Find the average deviation for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) $0.0575 (b) $3,520

2.40 Find (a) the variance and (b) the standard deviation for the frequency distribution of gasoline prices in Table 2.29.
Ans. (a) \(\sigma^2 \cong 0.0049\) dollars squared (b) \(\sigma \cong\) $0.0699

2.41 Find (a) the variance and (b) the standard deviation for the frequency distribution of family incomes in Table 2.30.
Ans. (a) \(s^2\) = 19,760,000 dollars squared (b) \(s \cong\) $4,445.22

2.42 Using the easier computational formulas, find (a) the variance and (b) the standard deviation for the distribution of gasoline prices in Table 2.29.
Ans. (a) \(\sigma^2 \cong 0.0049\) dollars squared (b) \(\sigma \cong\) $0.0699

2.43 Using the easier computational formulas, find (a) the variance and (b) the standard deviation for the family incomes in Table 2.30.
Ans. (a) \(s^2\) = 19,760,000 dollars squared (b) \(s \cong\) $4,445.22

2.44 Find the coefficient of variation V for (a) the data in Table 2.29 and (b) the data in Table 2.30. (c) Which data have the greater dispersion?
Ans. (a) \(V \cong 0.06\), or 6% (b) \(V \cong 0.261\), or 26.1% (c) The data of Table 2.30

SHAPE OF FREQUENCY DISTRIBUTIONS

2.45 Find the Pearson coefficient of skewness for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) -0.43 (b) 0.67

2.46 Find the coefficient of skewness using the formula based on the third moment for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) … (b) …

2.47 Find the coefficient of kurtosis for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) 177 (b) 300

2.48 For covariance, (a) in what range should the covariance for directly related data fall? (b) for inversely related data?
(c) for unrelated data?
Ans. (a) cov > 0 (b) cov < 0 (c) cov = 0

Probability and Probability Distributions

3.1 PROBABILITY OF A SINGLE EVENT

If event A can occur in \(n_A\) ways out of a total of N possible and equally likely outcomes, the probability that event A will occur is given by

\[ P(A) = \frac{n_A}{N} \tag{3.1} \]

where P(A) = probability that event A will occur
\(n_A\) = number of ways that event A can occur
N = total number of equally possible outcomes

Probability can be visualized with a Venn diagram. In Fig. 3-1, the circle represents event A, and the total area of the rectangle represents all possible outcomes. P(A) ranges between 0 and 1:

\[ 0 \le P(A) \le 1 \tag{3.2} \]

Fig. 3-1

If P(A) = 0, event A cannot occur. If P(A) = 1, event A will occur with certainty.

If P(A') represents the probability of nonoccurrence of event A, then

\[ P(A) + P(A') = 1 \tag{3.3} \]

EXAMPLE 1. A head (H) and a tail (T) are the two equally possible outcomes in tossing a balanced coin. Thus \(P(H) = 1/2\), \(P(T) = 1/2\), and \(P(H) + P(T) = 1\).

EXAMPLE 2. In rolling a fair die once, there are six possible and equally likely outcomes: 1, 2, 3, 4, 5, and 6. Thus \(P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6\). The probability of not rolling a 1 is \(P(1') = 1 - P(1) = 1 - 1/6 = 5/6\), and \(P(1) + P(1') = 1/6 + 5/6 = 1\).

EXAMPLE 3. A card deck has 52 cards divided into 4 suits (diamonds, hearts, clubs, and spades) with 13 cards in each suit (1, 2, 3, ..., 10, jack, queen, king). If the deck is well shuffled, each of the 52 cards is equally likely to be picked. Since there are 4 jacks, the probability of picking a jack, J, on a single pick is \(P(J) = 4/52 = 1/13\). Since there are 13 diamonds, D, \(P(D) = 13/52 = 1/4\), \(P(D') = 1 - 1/4 = 3/4\), and \(P(D) + P(D') = 1\).

EXAMPLE 4. Suppose that in 100 tosses of a balanced coin we get 53 heads and 47 tails. The relative frequency of heads is 53/100, or 0.53. This is the relative frequency or empirical probability, which is to be distinguished from the a priori or classical probability of P(H) = 0.5.
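The difference between the empirical probability observed in a finite run of tosses and the a priori value of 0.5 can be explored by simulation. The sketch below is an illustration only, not part of the original example; the function name `relative_frequency_of_heads` and the fixed seed are our own choices, and a pseudorandom generator stands in for a balanced coin.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a balanced coin and return heads / n_tosses.

    A seeded generator is used so that repeated runs give the same result;
    the seed value itself is arbitrary.
    """
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(n, relative_frequency_of_heads(n))
```

Any single run of 100 tosses can easily land a few percentage points away from 0.5, as the 53 heads in the example do; the longer runs should sit much closer to it.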
As the number of tosses increases and approaches infinity in the limit, the relative frequency or empirical probability approaches the a priori or classical probability. For example, the relative frequency or empirical probability might be 0.517 for 1000 tosses, 0.508 for 10,000 tosses, and so on.

3.2 PROBABILITY OF MULTIPLE EVENTS

1. Rule of addition for nonmutually exclusive events. Two events, A and B, are not mutually exclusive if the occurrence of A does not preclude the occurrence of B, or vice versa. Then

\[ P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \tag{3.4} \]

P(A and B) is subtracted to avoid double counting. This can be seen with the Venn diagram in Fig. 3-2.

2. Rule of addition for mutually exclusive events. Two events, A and B, are mutually exclusive if the occurrence of A precludes the occurrence of B, or vice versa [P(A and B) = 0]. Then

\[ P(A \text{ or } B) = P(A) + P(B) \tag{3.5} \]

Fig. 3-2

3. Rule of multiplication for dependent events. Two events are dependent if the occurrence of one is connected in some way with the occurrence of the other. Then the joint probability of A and B is

\[ P(A \text{ and } B) = P(A) \cdot P(B/A) \tag{3.6} \]

This reads: "The probability that both events A and B will take place equals the probability of event A times the probability of event B, given that event A has already occurred."

\[ P(B/A) = \text{conditional probability of } B, \text{ given that } A \text{ has already occurred} \tag{3.7} \]

and

\[ P(A \text{ and } B) = P(B \text{ and } A) \tag{3.8} \]

[see Prob. 3.15(c) and (d)].

4. Rule of multiplication for independent events. Two events, A and B, are independent if the occurrence of A is not connected in any way to the occurrence of B [P(B/A) = P(B)]. Then

\[ P(A \text{ and } B) = P(A) \cdot P(B) \tag{3.9} \]

EXAMPLE 5. On a single toss of a die, we can get only one of six possible outcomes: 1, 2, 3, 4, 5, or 6. These are mutually exclusive events. If the die is fair, \(P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6\).
The probability of getting a 2 or a 3 on a single toss of the die is

\[ P(2 \text{ or } 3) = P(2) + P(3) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \]

Similarly,

\[ P(2 \text{ or } 3 \text{ or } 4) = P(2) + P(3) + P(4) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2} \]

EXAMPLE 6. Picking at random a spade or a king on a single pick from a well-shuffled card deck does not constitute two mutually exclusive events because we could pick the king of spades. Thus

\[ P(S \text{ or } K) = P(S) + P(K) - P(S \text{ and } K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]

Using set theory, the preceding statement can be rewritten in an equivalent way as

\[ P(S \cup K) = P(S) + P(K) - P(S \cap K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]

where the symbol \(\cup\) (read "union") replaces or and \(\cap\) (read "intersection") replaces and.

EXAMPLE 7. The outcomes of two successive tosses of a balanced coin are independent events. The outcome of the first toss in no way affects the outcome of the second toss. Thus

\[ P(H \text{ and } H) = P(H \cap H) = P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} \]

Similarly,

\[ P(H \text{ and } H \text{ and } H) = P(H \cap H \cap H) = P(H) \cdot P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8} \]

EXAMPLE 8. The probability that on the first pick from a deck we get the king of diamonds is

\[ P(K_D) = \frac{1}{52} \]

If the first card picked was indeed the king of diamonds and if the first card was not replaced, the probability of getting another king on the second pick is dependent on the first pick because there are now only 3 kings and 51 cards left in the deck. The conditional probability of picking another king, given that the king of diamonds was already picked and not replaced, is

\[ P(K/K_D) = \frac{3}{51} \]

Thus the probability of picking the king of diamonds on the first pick and, without replacement, picking another king on the second pick is

\[ P(K_D \text{ and } K) = P(K_D) \cdot P(K/K_D) = \frac{1}{52} \cdot \frac{3}{51} = \frac{3}{2652} \]

or about 1 in 1000. Related to probability theory are Bayes' theorem (see Prob. 3.17) and combinations and permutations, or "counting techniques" (see the problems beginning with Prob. 3.18).

3.3 DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

A random variable is a variable whose values are associated with some probability of being observed. A discrete (as
opposed to continuous) random variable is one that can assume only finite and distinct values. The set of all possible values of a random variable and its associated probabilities is called a probability distribution. The sum of all probabilities equals 1 (see Example 9).

One discrete probability distribution is the binomial distribution. This is used to find the probability of X number of occurrences or successes of an event, P(X), in n trials of the same experiment when (1) there are only two possible and mutually exclusive outcomes, (2) the n trials are independent, and (3) the probability of occurrence or success, p, remains constant in each trial. Then

\[ P(X) = \frac{n!}{X!\,(n - X)!}\, p^X (1 - p)^{n - X} \tag{3.10} \]

where n! (read "n factorial") \(= n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1\), and 0! = 1 by definition (see Prob. 3.18). The mean of the binomial distribution is

\[ \mu = np \tag{3.11} \]

The standard deviation is

\[ \sigma = \sqrt{np(1 - p)} \tag{3.12} \]

If p = 1 - p = 0.5, the binomial distribution is symmetrical; if p < 0.5, it is skewed to the right; and if p > 0.5, it is skewed to the left.

EXAMPLE 9. The possible outcomes in 2 tosses of a balanced coin are TT, TH, HT, and HH. Thus

\[ P(TT) = P(TH) = P(HT) = P(HH) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} \]

The number of heads is therefore a discrete random variable, and the set of all possible outcomes with their associated probabilities is a discrete probability distribution (see Table 3.1 and Fig. 3-3).

Table 3.1 Probability Distribution of Heads in Two Tosses of a Balanced Coin

Number of Heads | Possible Outcomes | Probability
0 | TT | 0.25
1 | TH, HT | 0.50
2 | HH | 0.25
Total | | 1.00

Fig. 3-3 Probability Distribution of Heads in Two Tosses of a Balanced Coin (vertical axis: probability; horizontal axis: number of heads)

EXAMPLE 10. Using the binomial distribution, we can find the probability of 4 heads in 6 flips of a balanced coin as follows:
\[ P(4) = \frac{6!}{4!\,(6 - 4)!} \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^2 = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(2 \cdot 1)} \cdot \frac{1}{16} \cdot \frac{1}{4} = \frac{15}{64} \cong 0.23 \]

When n and X are large numbers, long calculations to find probabilities can be avoided by using App. 1. The expected number of heads in 6 flips is \(\mu = np = (6)(1/2) = 3\) heads. The standard deviation of the probability distribution of 6 flips is

\[ \sigma = \sqrt{np(1 - p)} = \sqrt{(6)(1/2)(1/2)} = \sqrt{6/4} = \sqrt{1.5} \cong 1.22 \text{ heads} \]

Because p = 0.5, this probability distribution is symmetrical. If we were not dealing with a coin and the trials were not independent (as in sampling without replacement), we would have had to use the hypergeometric distribution (see Prob. 3.27).

3.4 THE POISSON DISTRIBUTION

The Poisson distribution is another discrete probability distribution. It is used to determine the probability of a designated number of successes per unit of time, when the events or successes are independent and the average number of successes per unit of time remains constant. Then

\[ P(X) = \frac{\lambda^X e^{-\lambda}}{X!} \tag{3.13} \]

where X = designated number of successes
P(X) = probability of X number of successes
\(\lambda\) (Greek letter lambda) = average number of successes per unit of time
e = base of the natural logarithmic system, or 2.71828

Given the value of \(\lambda\) (the expected value or mean and variance of the Poisson distribution), we can find \(e^{-\lambda}\) from App. 2, substitute in Eq. (3.13), and find P(X).

EXAMPLE 11. A police department receives an average of 5 calls per hour. The probability of receiving 2 calls in a randomly selected hour is

\[ P(X) = \frac{\lambda^X e^{-\lambda}}{X!} = \frac{5^2 e^{-5}}{2!} = \frac{(25)(0.00674)}{2} \cong 0.0842 \]

The Poisson distribution can be used as an approximation to the binomial distribution when n is large and p or 1 - p is small [say, \(n \ge 30\) and \(np < 5\) or \(n(1 - p) < 5\)]. [...] The normal distribution approximates the binomial distribution when \(np > 5\) and \(n(1 - p) > 5\), and it approximates the Poisson distribution when \(\lambda \ge 10\) (see Probs. 3.37 and 3.38). Another continuous probability distribution is the exponential distribution (see Prob.
3.39). Chebyshev's theorem, or inequality, states that regardless of the shape of a distribution, the proportion of the observations or area falling within K standard deviations of the mean is at least \(1 - 1/K^2\), for \(K > 1\) (see Probs. 3.40 and 3.72).

Solved Problems

PROBABILITY OF A SINGLE EVENT

3.1 (a) Distinguish among classical or a priori probability, relative frequency or empirical probability, and subjective or personalistic probability. (b) What is the disadvantage of each? (c) Why do we study probability theory?

(a) According to classical probability, the probability of an event A is given by

\[ P(A) = \frac{n_A}{N} \]

where P(A) = probability that event A will occur
\(n_A\) = number of ways event A can occur
N = total number of equally possible outcomes

By the classical approach, we can make probability statements about balanced coins, fair dice, and standard card decks a priori, or without tossing a coin, rolling a die, or drawing a card. Relative frequency or empirical probability is given by the ratio of the number of times an event occurs to the total number of actual outcomes or observations. As the number of experiments or trials (such as the tossing of a coin) increases, the relative frequency or empirical probability approaches the classical or a priori probability.
Subjective or personalistic probability refers to the degree of belief of an individual that the event will occur, based on whatever evidence is available to the individual.

(b) The classical or a priori approach to probability can be applied only to games of chance (such as tossing a balanced coin, rolling a fair die, or picking cards from a standard deck of cards), where we can determine a priori, or without experimentation, the probability that an event will occur. In real-world problems of economics and business, we often cannot assign probabilities a priori and the classical approach cannot be used. The relative-frequency or empirical approach overcomes the disadvantages of the classical approach by using the relative frequencies of past occurrences as probabilities. The difficulty with the relative-frequency or empirical approach is that we get different probabilities (relative frequencies) for different numbers of trials or experiments. These probabilities stabilize, or approach a limit, as the number of trials or experiments increases. Because this may be expensive and time-consuming, people may end up using it without a "sufficient" number of trials or experiments. The disadvantage of the subjective or personalistic approach to probability is that different people faced with the same situation may come up with completely different probabilities.

(c) Most of the decisions we face in economics, business, science, and everyday life involve risks and probabilities. These probabilities are easier to understand and illustrate for games of chance because objective probabilities can easily be assigned to various events. However, the primary reason for studying probability theory is to help us make intelligent decisions in economics, business, science, and everyday life when risk and uncertainty are involved.

3.2 What is the probability of (a) A head in one toss of a balanced coin? A tail? A head or a tail? (b) A 2 in one rolling of a fair die? Not a 2? A 2 or not a 2?

(a) \(P(H) = 1/2\), \(P(T) = 1/2\), and \(P(H) + P(T) = 1/2 + 1/2 = 1\).

(b)
Since each of the 6 sides of a fair die is equally likely to come up and a 2 is one of the possibilities, \(P(2) = 1/6\). The probability of not rolling a 2, that is, P(2'), is given by

\[ P(2') = 1 - P(2) = 1 - \frac{1}{6} = \frac{5}{6} \qquad\text{and}\qquad P(2) + P(2') = \frac{1}{6} + \frac{5}{6} = 1 \]

3.3 What is the probability that in picking a card from a well-shuffled deck we get (a) a king, (b) a spade, (c) the king of spades, (d) not the king of spades, or (e) the king of spades or not the king of spades?

(a) Since there are 4 kings K in the 52 cards of the standard deck, \(P(K) = 4/52 = 1/13\).
(b) Since there are 13 spades S in the 52 cards, \(P(S) = 13/52 = 1/4\).
(c) There is only one king of spades in the deck; therefore \(P(K_S) = 1/52\).
(d) The probability of not picking the king of spades is \(P(K_S') = 1 - 1/52 = 51/52\).
(e) \(P(K_S) + P(K_S') = 1/52 + 51/52 = 52/52 = 1\), or certainty.

3.4 An urn (vase) contains 10 balls that are exactly alike except that 5 are red, 3 are blue, and 2 are green. What is the probability that, in picking up a single ball, the ball is (a) Red? (b) Blue? (c) Green? (d) Nonblue? (e) Nongreen? (f) Green or nongreen? (g) What are the odds of picking a blue ball? (h) What are the odds of not picking a blue ball?

(a) \(P(R) = n_R/N = 5/10 = 0.5\)
(b) \(P(B) = n_B/N = 3/10 = 0.3\)
(c) \(P(G) = n_G/N = 2/10 = 0.2\)
(d) \(P(B') = 1 - P(B) = 1 - 0.3 = 0.7\)
(e) \(P(G') = 1 - P(G) = 1 - 0.2 = 0.8\)
(f) \(P(G) + P(G') = 0.2 + 0.8 = 1\)
(g) The odds of picking a blue ball are given by the ratio of the number of ways of picking a blue ball to the number of ways of not picking a blue ball. Since there are 3 blue balls and 7 nonblue balls, the odds in favor of picking a blue ball are 3 to 7, or 3:7.
(h) The odds of not (against) picking a blue ball are 7 to 3, or 7:3.

3.5 Suppose that a 3 comes up 106 times in 600 tosses of a die. (a) What is the relative frequency of the 3? How does this differ from classical or a priori probability? (b) What would you expect to be the relative frequency or empirical probability if you increased the number of times the die is rolled?

(a) The relative frequency or empirical probability of the 3 is given by the ratio of the number of times 3 comes up (106) to the total number of times the die is rolled (600).
Thus the relative frequency or empirical probability of the 3 is \(106/600 \cong 0.177\) in 600 rolls. According to the classical or a priori approach (and without rolling the die at all), \(P(3) = 1/6 \cong 0.167\). If the die is fair, we expect the 3 to come up 100 times in 600 rolls of the die, as compared with the actual, observed, or empirical 106 times.

(b) If the number of times the same die is rolled is increased from 600, we expect the relative frequency or empirical probability to approach (i.e., to become less unequal with) the classical or a priori probability.

3.6 The production process results in 27 defective items for each 1000 items produced. (a) What is the relative frequency or empirical probability of a defective item? (b) How many defective items do you expect out of the 1600 items produced each day?

(a) The relative frequency or empirical probability of a defective item is 27/1000 = 0.027.

(b) By multiplying the number of items produced each day (1600) by the relative frequency or empirical probability of a defective item (0.027), we get the number of defective items we expect out of each day's output. This is \((1600)(0.027) \cong 43\), to the nearest item.

PROBABILITY OF MULTIPLE EVENTS

3.7 Define and give some examples of events that are (a) mutually exclusive, (b) not mutually exclusive, (c) independent, and (d) dependent.

(a) Two or more events are mutually exclusive, or disjoint, if the occurrence of one of them precludes or prevents the occurrence of the other(s). When one event takes place, the other(s) will not. For example, in a single flip of a coin, we get either a head or a tail, but not both. Heads and tails are therefore mutually exclusive events. In a single toss of a die, we get one and only one of six possible outcomes: 1, 2, 3, 4, 5, or 6. The outcomes are therefore mutually exclusive. A card picked at random can be of only one suit: diamonds, hearts, clubs, or spades. A child is born either a boy or a girl. An item produced on an assembly line is either good or defective.
(b) Two or more events are not mutually exclusive if they may occur at the same time. The occurrence of one does not preclude the occurrence of the other(s). For example, a card picked at random from a deck of cards can be both an ace and a club. Therefore, aces and clubs are not mutually exclusive events, because we could pick the ace of clubs. Because we can have inflation and recession at the same time, inflation and recession are not mutually exclusive events.

(c) Two or more events are independent if the occurrence of one of them in no way affects the occurrence of the other(s). For example, in two successive flips of a balanced coin, the outcome of the second flip in no way depends on the outcome of the first flip. The same is true for two successive tosses of a pair of dice or picks of two cards from a deck with replacement.

(d) Two or more events are dependent if the occurrence of one of them affects the probability of the occurrence of the other(s). For example, if we pick a card from a deck and do not replace it, the probability of picking the same card on the second pick is 0. All other probabilities also are affected since there are now only 51 cards in the deck. Similarly, if the proportion of defective items is greater for the evening than for the morning shift, the probability that an item picked at random from the evening output is defective is greater than for the morning output.

3.8 Draw a Venn diagram for (a) mutually exclusive events and (b) not mutually exclusive events. (c) Are mutually exclusive events dependent or independent? Why?

(a) Figure 3-6 illustrates the Venn diagram for events A and B, which are mutually exclusive.

(b) Figure 3-7 illustrates the Venn diagram for events A and B, which are not mutually exclusive.

Fig. 3-6        Fig. 3-7

(c) Mutually exclusive events are dependent events. When one event occurs, the probability of the other occurring is 0. Thus the occurrence of the first affects (precludes) the occurrence of the other.
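The distinction between "mutually exclusive" and "independent" can be checked by enumerating a standard 52-card deck. The sketch below is our own illustration rather than the book's method: the helper names (`prob`, `mutually_exclusive`, `independent`) are hypothetical, exact fractions are used to avoid rounding, and independence is tested with the multiplication rule P(A and B) = P(A) · P(B).

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["diamonds", "hearts", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """Classical probability n_A / N of an event (a predicate on a card)."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def both(a, b):
    """Event 'a and b' for a single pick."""
    return lambda card: a(card) and b(card)


def mutually_exclusive(a, b):
    return prob(both(a, b)) == 0


def independent(a, b):
    return prob(both(a, b)) == prob(a) * prob(b)


def is_heart(card):
    return card[1] == "hearts"


def is_club(card):
    return card[1] == "clubs"


def is_ace(card):
    return card[0] == "A"


if __name__ == "__main__":
    # Hearts and clubs cannot occur together, so they are dependent.
    print(mutually_exclusive(is_heart, is_club), independent(is_heart, is_club))
    # Aces and clubs can occur together (the ace of clubs).
    print(mutually_exclusive(is_ace, is_club), independent(is_ace, is_club))
```

On a single pick, hearts and clubs have P(H and C) = 0 while P(H) · P(C) = 1/16, so they fail the multiplication rule, which agrees with part (c) above. Aces and clubs are not mutually exclusive, and here the multiplication rule happens to hold because 1/52 = (1/13)(1/4).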
3.9 What is the probability of getting (a) Less than 3 on a single roll of a fair die? (b) Hearts or clubs on a single pick from a well-shuffled standard deck of cards? (c) A red or a blue ball from an urn containing 5 red balls, 3 blue balls, and 2 green balls? (d) More than 3 on a single roll of a fair die?

(a) Getting less than 3 on a single roll of a fair die means getting a 1 or a 2. These are mutually exclusive events. Applying the rule of addition for mutually exclusive events, we get

\[ P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \]

Using set theory, P(1 or 2) can be rewritten in an equivalent way as \(P(1 \cup 2)\), where \(\cup\) is read "union" and stands for or.

(b) Getting a heart or a club on a single pick from a well-shuffled deck of cards also constitutes two mutually exclusive events. Applying the rule of addition, we get

\[ P(H \text{ or } C) = P(H \cup C) = P(H) + P(C) = \frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2} \]

(c) \(P(R \text{ or } B) = P(R \cup B) = P(R) + P(B) = 5/10 + 3/10 = 8/10 = 0.8\)

(d) \(P(4 \text{ or } 5 \text{ or } 6) = P(4 \cup 5 \cup 6) = P(4) + P(5) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2\)

3.10 (a) What is the probability of getting an ace or a club on a single pick from a well-shuffled standard deck of cards? (In all remaining problems, it will be implicitly assumed that coins are balanced, dice are fair, and decks of cards are standard and well shuffled and cards are picked at random without replacement.) (b) What is the function of the negative term in the rule of addition for events that are not mutually exclusive?

(a) Getting an ace or a club does not constitute two mutually exclusive events because we could get the ace of clubs. Applying the rule of addition for events that are not mutually exclusive, we get

\[ P(A \text{ or } C) = P(A) + P(C) - P(A \text{ and } C) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]

The preceding probability statement can be rewritten in an equivalent way using set theory as

\[ P(A \cup C) = P(A) + P(C) - P(A \cap C) \]

where \(\cap\) is read "intersection" and stands for and.

(b) The function of the negative term in the rule of addition for events that are not mutually exclusive is to avoid double counting.
For example, in calculating P(A or C) in part a, the ace of clubs is counted twice, once as an ace and once as a club. Therefore, we subtract the probability of getting the ace of clubs in order to avoid this double counting. If the events are mutually exclusive, the probability that both events will occur simultaneously is 0, and no double counting is involved. This is why the rule of addition for mutually exclusive events does not contain a negative term.

3.11 What is the probability of (a) Inflation I or recession R if the probability of inflation is 0.3, the probability of recession is 0.2, and the probability of inflation and recession is 0.06? (b) Drawing an ace, a club, or a diamond on a single pick from a deck?

(a) Since the probability of inflation and recession is not 0, inflation and recession are not mutually exclusive events. Applying the rule of addition, we get

\[ P(I \text{ or } R) = P(I) + P(R) - P(I \text{ and } R) \qquad\text{or}\qquad P(I \cup R) = P(I) + P(R) - P(I \cap R) \]

and

\[ P(I \text{ or } R) = P(I \cup R) = 0.3 + 0.2 - 0.06 = 0.44 \]

(b) Getting an ace, a club, or a diamond does not constitute mutually exclusive events because we could get the ace of clubs or the ace of diamonds. Applying the rule of addition for events that are not mutually exclusive, we get

\[ P(A \text{ or } C \text{ or } D) = P(A) + P(C) + P(D) - P(A \text{ and } C) - P(A \text{ and } D) = \frac{4}{52} + \frac{13}{52} + \frac{13}{52} - \frac{1}{52} - \frac{1}{52} = \frac{28}{52} = \frac{7}{13} \]

3.12 What is the probability of (a) Two 6s on 2 rolls of a die? (b) A 6 on each die in rolling 2 dice once? (c) Two blue balls in 2 successive picks with replacement from the urn in Prob. 3.4? (d) Three girls in a family with 3 children?

(a) Getting a 6 on each of 2 rolls of a die constitutes independent events. Applying the rule of multiplication for independent events, we get

\[ P(6 \text{ and } 6) = P(6 \cap 6) = P(6) \cdot P(6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36} \]

(b) Getting a 6 on each die in rolling 2 dice once also constitutes independent events. Therefore

\[ P(6 \text{ and } 6) = P(6 \cap 6) = P(6) \cdot P(6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36} \]

(c) Since we replace the first ball picked, the probability of getting a blue ball on the second pick is the same as on the first pick. The events are independent.
Therefore

\[ P(B \text{ and } B) = P(B \cap B) = P(B) \cdot P(B) = \frac{3}{10} \cdot \frac{3}{10} = \frac{9}{100} = 0.09 \]

(d) The probability of a girl, G, on each birth constitutes independent events, each with a probability of 0.5. Therefore

\[ P(G \text{ and } G \text{ and } G) = P(G \cap G \cap G) = P(G) \cdot P(G) \cdot P(G) = (0.5)(0.5)(0.5) = 0.125 \]

or 1 chance in 8.

3.13 (a) List all possible outcomes in rolling 2 dice simultaneously. (b) What is the probability of getting a total of 5 in rolling 2 dice simultaneously? (c) What is the probability of getting a total of 4 or less in rolling 2 dice simultaneously? More than 4?

(a) Each die has 6 possible and equally likely outcomes, and the outcome on each die is independent. Since each of the 6 outcomes on the first die can be associated with each of the 6 outcomes on the second die, there are a total of 36 possible outcomes; that is, the sample space N is 36. (In Table 3.2, the first number refers to the outcome on the first die, and the second number refers to the second die. The dice can be distinguished by different colors.) The total of the 36 possible outcomes also can be shown by a tree (or sequential) diagram, as in Fig. 3-8.

Table 3.2 Outcomes in Rolling Two Dice Simultaneously

1,1 | 2,1 | 3,1 | 4,1 | 5,1 | 6,1
1,2 | 2,2 | 3,2 | 4,2 | 5,2 | 6,2
1,3 | 2,3 | 3,3 | 4,3 | 5,3 | 6,3
1,4 | 2,4 | 3,4 | 4,4 | 5,4 | 6,4
1,5 | 2,5 | 3,5 | 4,5 | 5,5 | 6,5
1,6 | 2,6 | 3,6 | 4,6 | 5,6 | 6,6

(b) Out of the 36 possible and equally likely outcomes, 4 of them give a total of 5. These are 1,4; 2,3; 3,2; and 4,1. Thus the probability of a total of 5 (event A) in rolling 2 dice simultaneously is given by

\[ P(A) = \frac{n_A}{N} = \frac{4}{36} = \frac{1}{9} \]

(c) Rolling a total of 4 or less involves rolling a total of 2, 3, or 4. There are 6 possible and equally likely ways of rolling a total of 4 or less. These are 1,1; 1,2; 1,3; 2,1; 2,2; and 3,1. Thus event A is defined as rolling a total of 4 or less.
\(P(A) = 6/36 = 1/6\). The probability of getting a total of more than 4 equals 1 minus the probability of getting a total of 4 or less. This is \(1 - 1/6 = 5/6\).

3.14 What is the probability of (a) Picking a second red ball from the urn in Prob. 3.4 when a red ball was already obtained on the first pick and not replaced? (b) A red ball on the second pick when the first ball picked was not red and was not replaced? (c) A red ball on the third pick when a red and a nonred ball were obtained on the first two picks and were not replaced?

(a) Picking a second red ball from the urn when a red ball was already picked on the first pick and was not replaced is a dependent event, since there are now only 4 red balls and 5 nonred balls remaining in the urn. The conditional probability of picking a second red ball when a red ball was already obtained on the first pick and was not replaced is \(P(R/R) = 4/9\).

(b) The conditional probability of obtaining a red ball on the second pick when the first ball picked was not red (R') and was not replaced in the urn before the second ball is picked is \(P(R/R') = 5/9\).

Fig. 3-8 Tree Diagram for Rolling Two Dice Simultaneously (first branch: outcome on the first die; second branch: outcome on the second die)

(c) Since 2 balls, one of which was red, were already picked and not replaced, there remains a total of 8 balls, of which 4 are red, in the urn. The (conditional) probability of picking another red ball is \(P(R/R \text{ and } R') = P(R/R' \text{ and } R) = 4/8 = 1/2\).

3.15 What is the probability of obtaining (a) Two red balls from the urn in Prob. 3.4 in 2 picks without replacement? (b) Two aces from a deck in 2 picks without replacement? (c) The ace of clubs and a spade in that order in 2 picks from a deck without replacement? (d) A spade and the ace of clubs in that order in 2 picks from a deck without replacement? (e) Three red balls from the urn of Prob. 3.4 in 3 picks without replacement? (f) Three red balls from the same urn in 3 picks with replacement?
(a) Applying the rule of multiplication for dependent events, we get

\[ P(R \text{ and } R) = P(R) \cdot P(R/R) = \frac{5}{10} \cdot \frac{4}{9} = \frac{20}{90} = \frac{2}{9} \]

(b) \[ P(A \text{ and } A) = P(A) \cdot P(A/A) = \frac{4}{52} \cdot \frac{3}{51} = \frac{12}{2652} = \frac{1}{221} \]

(c) \[ P(A_C \text{ and } S) = P(A_C) \cdot P(S/A_C) = \frac{1}{52} \cdot \frac{13}{51} = \frac{13}{2652} = \frac{1}{204} \]

(d) \[ P(S \text{ and } A_C) = P(S \cap A_C) = P(S) \cdot P(A_C/S) = \frac{13}{52} \cdot \frac{1}{51} = \frac{13}{2652} = \frac{1}{204} \]

(e) \[ P(R \text{ and } R \text{ and } R) = P(R \cap R \cap R) = P(R) \cdot P(R/R) \cdot P(R/R \text{ and } R) = \frac{5}{10} \cdot \frac{4}{9} \cdot \frac{3}{8} = \frac{60}{720} = \frac{1}{12} \]

(f) With replacement, picking three balls from an urn constitutes three independent events. Therefore

\[ P(R \text{ and } R \text{ and } R) = P(R) \cdot P(R) \cdot P(R) = \frac{5}{10} \cdot \frac{5}{10} \cdot \frac{5}{10} = \frac{1}{8} \]

3.16  Past experience has shown that for every 100,000 items produced in a plant by the morning shift, 200 are defective, and for every 100,000 items produced by the evening shift, 500 are defective. During a 24-h period, 1000 items are produced by the morning shift and 600 by the evening shift. What is the probability that an item picked at random from the total of 1600 items produced during the 24-h period (a) Was produced by the morning shift and is defective? (b) Was produced by the evening shift and is defective? (c) Was produced by the evening shift and is not defective? (d) Is defective, whether produced by the morning or the evening shift?

(a) The probabilities of picking an item produced by the morning shift M and the evening shift E are

\[ P(M) = \frac{1000}{1600} = 0.625 \quad \text{and} \quad P(E) = \frac{600}{1600} = 0.375 \]

The probabilities of picking a defective item D from the morning and evening outputs separately are

\[ P(D/M) = \frac{200}{100{,}000} = 0.002 \quad \text{and} \quad P(D/E) = \frac{500}{100{,}000} = 0.005 \]

The probability that an item picked at random from the total of 1600 items produced during the 24-h period was produced by the morning shift and is defective is

\[ P(M \text{ and } D) = P(M) \cdot P(D/M) = (0.625)(0.002) = 0.00125 \]

(b) \[ P(E \text{ and } D) = P(E) \cdot P(D/E) = (0.375)(0.005) = 0.001875 \]

(c) \[ P(E \text{ and } D') = P(E) \cdot P(D'/E) = (0.375)\frac{99{,}500}{100{,}000} = 0.373125 \]

(d) The expected number of defective items from the morning shift is equal to the probability of a defective item from the morning output times the number of items produced by the morning shift; that is, (0.002)(1000) = 2. From the evening shift we expect (0.005)(600) = 3 defective items.
Thus we expect 5 defective items from the 1600 items produced during the 24-h period. If there are indeed 5 defective items, the probability of picking at random any of the 5 defective items out of a total of 1600 items is 5/1600 = 1/320, or 0.003125.

3.17  (a) From the rule of multiplication for dependent events B and A, derive the formula for P(A/B) in terms of P(B/A) and P(B). This is known as Bayes' theorem and is used to revise probabilities when additional relevant information becomes available. (b) Using Bayes' theorem, find the probability that a defective item picked at random from the 24-h output of 1600 items in Prob. 3.16 was produced by the morning shift; by the evening shift.

(a) \[ P(B \text{ and } A) = P(B) \cdot P(A/B) \]

By dividing both sides by P(B) and rearranging, we get

\[ P(A/B) = \frac{P(B \text{ and } A)}{P(B)} \]

However, P(B and A) = P(A and B); see Prob. 3.15(c) and (d). Therefore

\[ P(A/B) = \frac{P(A) \cdot P(B/A)}{P(B)} \qquad \text{Bayes' theorem} \]

(b) Applying Bayes' theorem to the statement in Prob. 3.16, letting A signify the morning shift M and B signify defective D, and utilizing the results of Prob. 3.16, we get

\[ P(M/D) = \frac{P(M) \cdot P(D/M)}{P(D)} = \frac{(0.625)(0.002)}{0.003125} = \frac{0.00125}{0.003125} = 0.4 \]

That is, the probability that a defective item picked at random from the total 24-h output of 1600 items was produced by the morning shift is 40%. Similarly

\[ P(E/D) = \frac{P(E) \cdot P(D/E)}{P(D)} = \frac{(0.375)(0.005)}{0.003125} = \frac{0.001875}{0.003125} = 0.6, \text{ or } 60\% \]

Bayes' theorem can be generalized, for example, to find the probability that a defective item B picked at random was produced by any of n plants (A_i; i = 1, 2, ..., n), as follows:

\[ P(A_i/B) = \frac{P(A_i) \cdot P(B/A_i)}{\sum P(A_i) \cdot P(B/A_i)} \]

where \(\sum\) refers to the summation over the n plants (the only ones producing the output). Bayes' theorem is applied in business decision theory, but it is seldom used in the field of economics. (However, Bayesian econometrics is becoming increasingly important.)

3.18  A club has 8 members. (a) How many different committees of 3 members each can be formed from the club? (Two committees are different even when only one member is different.)
(b) How many committees of 3 members each can be formed from the club if each committee is to have a president, a treasurer, and a secretary?

(a) We are interested here in finding the number of combinations of 8 people taken 3 at a time without concern for the order:

\[ {}_8C_3 = \frac{8!}{3!\,(8-3)!} = \frac{8 \cdot 7 \cdot 6 \cdot 5!}{3 \cdot 2 \cdot 1 \cdot 5!} = 56 \]

In general, the number of arrangements of n things taken X at a time without concern for the order is a combination given by

\[ {}_nC_X = \binom{n}{X} = \frac{n!}{X!\,(n-X)!} \]

where n! (read n factorial) = n(n - 1)(n - 2) ··· 3 · 2 · 1 and 0! = 1 by definition.

(b) Since each committee of 3 has to have a president, a treasurer, and a secretary, we are now interested in finding the number of permutations of 8 people taken 3 at a time, when the order is important:

\[ {}_8P_3 = \frac{8!}{(8-3)!} = \frac{8!}{5!} = 8 \cdot 7 \cdot 6 = 336 \]

In general, the number of arrangements, in a definite order, of n things taken X at a time is a permutation given by

\[ {}_nP_X = \frac{n!}{(n-X)!} \]

Permutations and combinations (often referred to as counting techniques) are helpful in counting the number of equally likely ways event A can occur in relation to the total of all possible and equally likely outcomes. Combinations and permutations were not used in previous problems because those problems were simple enough without them.

DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

3.19  Define what is meant by and give an example of (a) a random variable, (b) a discrete random variable, and (c) a discrete probability distribution. (d) What is the distinction between a probability distribution and a relative-frequency distribution?

(a) A random variable is a variable whose values are associated with some probability of being observed. For example, on 1 roll of a fair die, we have 6 mutually exclusive outcomes (1, 2, 3, 4, 5, or 6), each associated with a probability of occurrence of 1/6. Thus the outcome from the roll of a die is a random variable.
(b) A discrete random variable is one that can assume only finite or distinct values. For example, the outcomes from rolling a die constitute discrete random variables because they are limited to the values 1, 2, 3, 4, 5, and 6. This is to be contrasted with continuous variables, which can assume an infinite number of values within any given interval [see Prob. 3.31(a)].

(c) A discrete probability distribution refers to the set of all possible values of a (discrete) random variable and their associated probabilities. The set of the 6 outcomes in rolling a die and their associated probabilities is an example of a discrete probability distribution. The sum of the probabilities associated with all the values that the discrete random variable can assume always equals 1.

(d) A probability distribution refers to the classical or a priori probabilities associated with all the values that a random variable can assume. Because those probabilities are assigned a priori and without any experimentation, a probability distribution is often referred to as a theoretical (relative) frequency distribution. This differs from an empirical (relative) frequency distribution, which refers to the ratio of the number of times each outcome actually occurs to the total number of actual trials or observations. For example, in actually rolling a die a number of times, we are not likely to get each outcome exactly 1/6 of the times. However, as the number of rolls increases, the empirical (relative) frequency distribution stabilizes at the (uniform) probability or theoretical relative-frequency distribution of 1/6.

3.20  Derive the formula for (a) the mean \(\mu\) or expected value E(X) and (b) the variance for a discrete probability distribution.

(a) The formula for the arithmetic mean for grouped population data [Eq. (2.2a)] is

\[ \mu = \frac{\sum fX}{N} \]

where \(\sum fX\) is the sum of the frequency of each class f times the class midpoint X and \(N = \sum f\), which is the number of all observations or frequencies.
In dealing with probability distributions, the mean \(\mu\) is often referred to as the "expected value" E(X). The formula for \(\mu\) or E(X) for a discrete probability distribution can be derived by starting with Eq. (2.2a) and letting f = P(X), which is the probability of each of the possible outcomes X. Then \(\sum fX = \sum XP(X)\), which is the sum of the value of each outcome times its probability of occurrence, and \(N = \sum f = \sum P(X)\), which is the sum of the probabilities of each outcome, which is 1. Thus

\[ E(X) = \mu = \frac{\sum XP(X)}{\sum P(X)} = \sum XP(X) \]

(b) The formula for the variance of grouped population data is

\[ \sigma^2 = \frac{\sum f(X-\mu)^2}{N} \]

Once again letting f = P(X) = probability of each outcome and \(N = \sum P(X) = 1\), we can get the formula for the variance of a discrete probability distribution:

\[ \text{Var}\, X = \sigma_X^2 = \sum [X - E(X)]^2 P(X) = \sum X^2 P(X) - \left[\sum XP(X)\right]^2 \]

3.21  Table 3.3 gives the number of job applications processed at a small employment agency during the past 100-day period. Determine the expected number of applications processed and the variance and standard deviation.

Table 3.3 Number of Job Applications Processed during the Past 100-Day Period

Number of Applications   Number of Days
 7                        10
 8                        10
10                        20
11                        30
12                        20
14                        10

To the extent that we believe that the experience of the past 100 days is typical, we can find the relative-frequency distribution and equate it to the probability distribution. This and the other calculations to find E(X) and Var X are shown in Table 3.4.

\[ E(X) = \mu = \sum XP(X) = 10.6 \text{ applications} \]

\[ \text{Var}\, X = \sigma_X^2 = \sum X^2 P(X) - \left[\sum XP(X)\right]^2 = 116 - (10.6)^2 = 116 - 112.36 = 3.64 \text{ applications squared} \]

\[ \text{SD}\, X = \sigma_X = \sqrt{\sigma_X^2} = \sqrt{3.64} \cong 1.91 \text{ applications} \]

Table 3.4 Calculations to Find the Expected Value and Variance

Number X   Days   P(X)   XP(X)   X^2   X^2 P(X)
 7          10    0.1     0.7     49     4.9
 8          10    0.1     0.8     64     6.4
10          20    0.2     2.0    100    20.0
11          30    0.3     3.3    121    36.3
12          20    0.2     2.4    144    28.8
14          10    0.1     1.4    196    19.6
Total      100    1.0    10.6           116.0

3.22  (a) State the conditions required to apply the binomial distribution. (b) What is the probability of 3 heads in 5 flips of a balanced coin? (c) What is the probability of less than 3 heads in 5 flips of a balanced coin?
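The expected-value and variance calculation for the job-applications problem above can be sketched in a few lines. This is a hedged illustration: the day counts in `days` are an assumption, reconstructed so that they reproduce the totals quoted in the text (a sum of XP(X) of 10.6 and a sum of X²P(X) of 116), and the variable names are not from the source.

```python
from math import sqrt

# Number of applications -> number of days out of 100.
# Assumed values, chosen to match the totals 10.6 and 116 quoted in the text.
days = {7: 10, 8: 10, 10: 20, 11: 30, 12: 20, 14: 10}
total_days = sum(days.values())

# Relative frequencies are used as the probabilities P(X).
prob = {x: d / total_days for x, d in days.items()}

mean = sum(x * p for x, p in prob.items())                # E(X) = sum of X P(X)
second_moment = sum(x ** 2 * p for x, p in prob.items())  # sum of X^2 P(X)
variance = second_moment - mean ** 2                      # Var X
std_dev = sqrt(variance)                                  # SD X

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

The shortcut form of the variance (second moment minus squared mean) is used because it mirrors the hand calculation 116 - (10.6)² in the text.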
(a) The binomial distribution is used to find the probability of X number of occurrences or successes of an event, P(X), in n trials of the same experiment when (1) there are only 2 mutually exclusive outcomes, (2) the n trials are independent, and (3) the probability of occurrence or success, p, remains constant in each trial.

(b) \[ P(X) = {}_nC_X\, p^X (1-p)^{n-X} = \frac{n!}{X!\,(n-X)!}\, p^X (1-p)^{n-X} \]

See Eqs. (3.10) and (3.17). In some books, 1 - p (the probability of failure) is defined as q. Here n = 5, X = 3, p = 1/2, and 1 - p = 1/2. Substituting these values into the preceding equation, we get

\[ P(3) = \frac{5!}{3!\,2!} (1/2)^3 (1/2)^2 = 10(1/32) = 0.3125 \]

(c) \[ P(X < 3) = P(0) + P(1) + P(2) \]

\[ P(0) = \frac{5!}{0!\,5!} (1/2)^0 (1/2)^5 = \frac{1}{32} = 0.03125 \]

\[ P(1) = \frac{5!}{1!\,4!} (1/2)^1 (1/2)^4 = \frac{5}{32} = 0.15625 \]

\[ P(2) = \frac{5!}{2!\,3!} (1/2)^2 (1/2)^3 = \frac{10}{32} = 0.3125 \]

Thus P(X < 3) = P(0) + P(1) + P(2) = 0.03125 + 0.15625 + 0.3125 = 0.5

3.23  (a) Suppose that the probability of parents having a child with blond hair is 1/4. If there are 6 children in the family, what is the probability that half of them will have blond hair? (b) If the probability of hitting a target on a single shot is 0.3, what is the probability that in 4 shots the target will be hit at least 3 times?

(a) Here n = 6, X = 3, p = 1/4, and 1 - p = 3/4. Substituting these values into the binomial formula, we get

\[ P(3) = \frac{6!}{3!\,3!} (1/4)^3 (3/4)^3 = 20 (1/64)(27/64) = 20 (27/4096) = \frac{540}{4096} \cong 0.13 \]

(b) Here n = 4, X ≥ 3, p = 0.3, and 1 - p = 0.7:

\[ P(X \ge 3) = P(3) + P(4) \]

\[ P(3) = \frac{4!}{3!\,1!} (0.3)^3 (0.7)^1 = 4(0.027)(0.7) = 0.0756 \]

\[ P(4) = \frac{4!}{4!\,0!} (0.3)^4 (0.7)^0 = 0.0081 \]

Thus P(X ≥ 3) = P(3) + P(4) = 0.0756 + 0.0081 = 0.0837

3.24  (a) A quality inspector picks a sample of 10 tubes at random from a very large shipment of tubes known to contain 20% defective tubes. What is the probability that no more than 2 of the tubes picked are defective? (b) An inspection engineer picks a sample of 15 items at random from a manufacturing process known to produce 85% acceptable items. What is the probability that 10 of the items picked are acceptable?

(a) Here n = 10, X ≤ 2, p = 0.2, and 1 - p = 0.8:

\[ P(X \le 2) = P(0) + P(1) + P(2) \]

\[ P(0) = \frac{10!}{0!\,10!} (0.2)^0 (0.8)^{10} = 0.1074 \quad \text{(looking up n = 10, X = 0, and p = 0.2 in App. 1)} \]

P(1) = 0.2684 (looking up n = 10, X = 1, and p = 0.2 in App. 1)

P(2) = 0.3020 (looking up n = 10, X = 2, and p = 0.2 in App. 1)

Thus P(X ≤ 2) = P(0) + P(1) + P(2) = 0.1074 + 0.2684 + 0.3020 = 0.6778

(b) Here n = 15, X = 10, p = 0.85, and 1 - p = 0.15. Since App. 1 only gives binomial probabilities for up to p = 0.5, we should transform the problem. The probability of X = 10 acceptable items with p = 0.85 equals the probability of X = 5 defective items with p = 0.15. Using n = 15, X = 5 defective, and p (of defective) = 0.15, we get 0.0449 (from App. 1).

3.25  (a) If 4 balanced coins are tossed simultaneously (or 1 balanced coin is tossed 4 times), compute the entire probability distribution and plot it. (b) Compute and plot the probability distribution for a sample of 5 items taken at random from a production process known to produce 30% defective items.

(a) Using n = 4; X = 0H, 1H, 2H, 3H, or 4H; p = 1/2; and App. 1, we get P(0H) = 0.0625, P(1H) = 0.2500, P(2H) = 0.3750, P(3H) = 0.2500, and P(4H) = 0.0625. Thus

P(0H) + P(1H) + P(2H) + P(3H) + P(4H) = 0.0625 + 0.2500 + 0.3750 + 0.2500 + 0.0625 = 1

See Fig. 3-9. Note that p = 0.5 and the probability distribution in Fig. 3-9 is symmetrical.

Fig. 3-9 Probability Distribution of Heads in Tossing Four Balanced Coins

Fig. 3-10 Probability Distribution of Defective Items

(b) Using n = 5; X = 0, 1, 2, 3, 4, or 5 defective; and p = 0.3, we get P(0) = 0.1681, P(1) = 0.3602, P(2) = 0.3087, P(3) = 0.1323, P(4) = 0.0284, and P(5) = 0.0024. Therefore

P(0) + P(1) + P(2) + P(3) + P(4) + P(5) = 0.1681 + 0.3602 + 0.3087 + 0.1323 + 0.0284 + 0.0024 ≅ 1

See Fig. 3-10. Note that p < 0.5 and the probability distribution in Fig. 3-10 is skewed to the right.

3.26  Calculate the expected value and standard deviation and determine the symmetry or asymmetry of the probability distribution of (a) Prob. 3.23(a), (b) Prob. 3.23(b), (c) Prob. 3.24(a), and (d) Prob. 3.24(b).
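The binomial probabilities that the preceding solutions read from App. 1 can also be computed directly from the binomial formula. The sketch below assumes Python 3.8 or later (for `math.comb`); the function name `binomial_pmf` is illustrative and not from the text.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# No more than 2 defective tubes in a sample of 10 when p = 0.2.
p_at_most_2 = sum(binomial_pmf(x, 10, 0.2) for x in range(3))

# Entire distribution of heads in 4 tosses of a balanced coin.
coin_dist = [binomial_pmf(x, 4, 0.5) for x in range(5)]

# 10 acceptable items out of 15 with p = 0.85; this is the same event
# as 5 defective items out of 15 with p = 0.15.
p_10_acceptable = binomial_pmf(10, 15, 0.85)
p_5_defective = binomial_pmf(5, 15, 0.15)

print(round(p_at_most_2, 4))
print(coin_dist)
print(round(p_10_acceptable, 4), round(p_5_defective, 4))
```

Computing the last probability both ways illustrates why redefining success as "defective" lets a table limited to p ≤ 0.5 still be used; the direct calculation gives about 0.0449 in both cases.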
(a) \[ E(X) = \mu = np = (6)(1/4) = 3/2 = 1.5 \text{ blond children} \]

\[ \text{SD}\, X = \sqrt{np(1-p)} = \sqrt{6(1/4)(3/4)} = \sqrt{18/16} = \sqrt{1.125} \cong 1.06 \text{ blond children} \]

Because p < 0.5, the probability distribution of blond children is skewed to the right.

(b) \[ E(X) = \mu = np = (4)(0.3) = 1.2 \text{ hits} \]

\[ \text{SD}\, X = \sqrt{np(1-p)} = \sqrt{(4)(0.3)(0.7)} = \sqrt{0.84} \cong 0.92 \text{ hits} \]

Because p < 0.5, the probability distribution is skewed to the right.

(c) \[ E(X) = \mu = np = (10)(0.2) = 2 \text{ defective tubes} \]

\[ \text{SD}\, X = \sqrt{np(1-p)} = \sqrt{(10)(0.2)(0.8)} = \sqrt{1.6} \cong 1.26 \text{ defective tubes} \]

Because p < 0.5, the probability distribution is skewed to the right.

(d) \[ E(X) = \mu = np = (15)(0.85) = 12.75 \text{ acceptable items} \]

\[ \text{SD}\, X = \sqrt{np(1-p)} = \sqrt{(15)(0.85)(0.15)} = \sqrt{1.9125} \cong 1.38 \text{ acceptable items} \]

Because p > 0.5, the probability distribution is skewed to the left.

3.27  When sampling is done from a finite population without replacement, the binomial distribution cannot be used because the events are not independent. Then the hypergeometric distribution is used. This is given by

\[ P_H = \frac{\binom{N - X_t}{n - X} \binom{X_t}{X}}{\binom{N}{n}} \qquad \text{hypergeometric distribution} \]

It measures the number of successes X in a sample of size n taken at random and without replacement from a population of size N, of which X_t items have the characteristic denoting success. (a) Using the formula, determine the probability of picking 2 men in a sample of 6 selected at random without replacement from a group of 10 people, 5 of which are men. (b) What would the result have been if we had (incorrectly) used the binomial distribution?

(a) Here X = 2 men, n = 6, N = 10, and X_t = 5:

\[ P_H = \frac{\binom{5}{4} \binom{5}{2}}{\binom{10}{6}} = \frac{(5)(10)}{210} \cong 0.24 \]

(b) \[ P(2) = \frac{6!}{2!\,4!} (1/2)^2 (1/2)^4 = \frac{15}{64} \cong 0.23 \]

It should be noted that when the sample is very small in relation to the population (say, less than 5% of the population), sampling without replacement has little effect on the probability of success in each trial and the binomial distribution (which is easier to use) is a good approximation for the hypergeometric distribution. This is the reason the binomial distribution was used in Prob. 3.24(a).

THE POISSON DISTRIBUTION

3.28  (a) What is the difference between the binomial and the Poisson distributions?
(b) Give some examples of when we can apply the Poisson distribution. (c) Give the formula for the Poisson distribution and the meaning of the various symbols. (d) Under what conditions can the Poisson distribution be used as an approximation to the binomial distribution? Why can this be useful?

(a) Whereas the binomial distribution can be used to find the probability of a designated number of successes in n trials, the Poisson distribution is used to find the probability of a designated number of successes per unit of time. The other conditions required to apply the binomial distribution also are required to apply the Poisson distribution; that is, (1) there must be only two mutually exclusive outcomes, (2) the events must be independent, and (3) the average number of successes per unit of time must remain constant.

(b) The Poisson distribution is often used in operations research in solving management problems. Some examples are the number of telephone calls to the police per hour, the number of customers arriving at a gasoline pump per hour, and the number of traffic accidents at an intersection per week.

(c) The probability of a designated number of successes per unit of time, P(X), can be found by

\[ P(X) = \frac{\lambda^X e^{-\lambda}}{X!} \]

where X = designated number of successes
\(\lambda\) = the average number of successes over a specific time period
e = the base of the natural logarithm system, or 2.71828

Given the value of \(\lambda\), we can find \(e^{-\lambda}\) from App. 2, substitute it into the formula, and find P(X). Note that \(\lambda\) is the mean and variance of the Poisson distribution.

(d) We can use the Poisson distribution as an approximation to the binomial distribution when n, the number of trials, is large and p or 1 - p is small (rare events). A good rule of thumb is to use the Poisson distribution when n ≥ 30 and np or n(1 - p) < 5. When n is large, it can be very time-consuming to use the binomial distribution, and tables for binomial probabilities for very small values of p may not be available.
If n(1 - p) < 5, success and failure should be redefined so that np < 5 to make the approximation accurate.

3.29  Past experience indicates that an average number of 6 customers per hour stop for gasoline at a gasoline pump. (a) What is the probability of 3 customers stopping in any hour? (b) What is the probability of 3 customers or less in any hour? (c) What is the expected value, or mean, and standard deviation for this distribution?

(a) \[ P(3) = \frac{6^3 e^{-6}}{3!} = \frac{(216)(0.00248)}{6} = \frac{0.53568}{6} = 0.08928 \]

(b) \[ P(X \le 3) = P(0) + P(1) + P(2) + P(3) \]

\[ P(0) = \frac{6^0 e^{-6}}{0!} = 0.00248 \]

\[ P(1) = \frac{6^1 e^{-6}}{1!} = (6)(0.00248) = 0.01488 \]

\[ P(2) = \frac{6^2 e^{-6}}{2!} = \frac{(36)(0.00248)}{2} = 0.04464 \]

P(3) = 0.08928 [from part (a)]

Thus P(X ≤ 3) = 0.00248 + 0.01488 + 0.04464 + 0.08928 = 0.15128

(c) The expected value, or mean, of this Poisson distribution is \(\lambda = 6\) customers, and the standard deviation is \(\sqrt{\lambda} = \sqrt{6} \cong 2.45\) customers.

3.30  Past experience shows that 1% of the lightbulbs produced in a plant are defective. Find the probability that more than 1 bulb is defective in a random sample of 30 bulbs, using (a) the binomial distribution and (b) the Poisson distribution.

(a) Here n = 30, p = 0.01, and we are asked to find P(X > 1). Using App. 1, we get

P(2) + P(3) + P(4) + ··· = 0.0328 + 0.0031 + 0.0002 = 0.0361, or 3.61%

(b) Since n = 30 and np = (30)(0.01) = 0.3, we can use the Poisson approximation of the binomial distribution. Letting \(\lambda = np = 0.3\), we have to find P(X > 1) = 1 - P(X ≤ 1), where X is the number of defective bulbs. Using Eq. (3.13), we get

\[ P(1) = \frac{(0.3)^1 e^{-0.3}}{1!} = (0.3)(0.74082) = 0.222246 \]

\[ P(0) = \frac{(0.3)^0 e^{-0.3}}{0!} = e^{-0.3} = 0.74082 \]

P(X ≤ 1) = P(1) + P(0) = 0.222246 + 0.74082 = 0.963066

Thus P(X > 1) = 1 - P(X ≤ 1) = 1 - 0.963066 = 0.036934, or 3.69%

As n becomes larger, the approximation becomes even closer.

CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

3.31  (a) Define what is meant by a continuous variable and give some examples. (b) Define what is meant by a continuous probability distribution. (c) Derive the formula for the expected value and variance of a continuous probability distribution.

(a) A continuous variable is one that can assume any value within any given interval.
A continuous variable can be measured with any degree of accuracy simply by using smaller and smaller units of measurement. For example, if we say that a production process takes 10 h, this means anywhere between 9.5 and 10.4 h (10 h rounded to the nearest hour). If we used minutes as the unit of measurement, we could have said that the production process takes 10 h and 20 min. This means anywhere between 10 h and 19.5 min and 10 h and 20.4 min, and so on. Time is thus a continuous variable, and so are weight, distance, and temperature.

(b) A continuous probability distribution refers to the range of all possible values that a continuous random variable can assume, together with the associated probabilities. The probability distribution of a continuous random variable is often called a probability density function, or simply a probability function. It is given by a smooth curve such that the total area (probability) under the curve is 1. Since a continuous random variable can assume an infinite number of values within any given interval, the probability of a specific value is 0. However, we can measure the probability that a continuous random variable X assumes any value within a given interval (say, between X_1 and X_2) by the area under the curve within that interval:

\[ P(X_1 < X < X_2) = \int_{X_1}^{X_2} f(X)\, dX \]

where f(X) is the equation of the probability density function, and the integration sign, \(\int\), is analogous to the summation sign \(\sum\) for discrete variables. Probability tables for some of the most used continuous probability distributions are given in the appendixes, thus eliminating the need to perform the integration ourselves.

(c) The expected value, or mean, and variance for continuous probability distributions can be derived by substituting \(\int\) for \(\sum\) and f(X) for P(X) into the formulas for the expected value and variance for discrete probability distributions:

\[ E(X) = \mu = \int X f(X)\, dX \]

\[ \text{Var}\, X = \sigma^2 = \int [X - E(X)]^2 f(X)\, dX \]

3.32  (a) What is a normal distribution? (b) What is its usefulness? (c) What is the standard normal distribution?
What is its usefulness?

(a) The normal distribution is a continuous probability function that is bell-shaped, symmetrical about the mean, and mesokurtic (defined in Sec. 2.4). As we move further away from the mean in both directions, the normal curve approaches the horizontal axis (but never quite touches it). The equation of the normal probability function is given by

\[ f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2\right] \]

where f(X) = height of the normal curve
\(\mu\) = mean of the distribution
\(\sigma\) = standard deviation of the distribution

The total area under the curve, from minus infinity to plus infinity, is 1.

(b) The normal distribution is the most commonly used of all probability distributions in statistical analysis. Many distributions actually found in nature and industry are normal. Some examples are the IQs (intelligence quotients), weights, and heights of a large number of people and the variations in dimensions of a large number of parts produced by a machine. The normal distribution often can be used to approximate other distributions, such as the binomial and the Poisson distributions (see Probs. 3.37 and 3.38). Distributions of sample means and proportions are often normal, regardless of the distribution of the parent population (see Sec. 4.2).

(c) The standard normal distribution is a normal distribution with \(\mu = 0\) and \(\sigma^2 = 1\). Any normal distribution (defined by a particular value for \(\mu\) and \(\sigma^2\)) can be transformed into a standard normal distribution by letting \(\mu = 0\) and expressing deviations from \(\mu\) in standard deviation units. We often can find areas (probabilities) by converting X values into corresponding z values [that is, \(z = (X - \mu)/\sigma\)] and looking up these z values in App. 3.

3.33  Find the area under the standard normal curve (a) between z ± 1, z ± 2, and z ± 3; (b) from z = 0 to z = 0.88; (c) from z = -1.60 to z = 2.55; (d) to the left of z = -1.60; (e) to the right of z = 2.55; (f) to the left of z = -1.60 and to the right of z = 2.55.

(a) The area (probability) included under the standard normal curve between z = 0 and z = 1 is obtained by looking up the value of 1.0 in App. 3.
This is accomplished by moving down the z column in the table to 1.0 and then across until we are below the column headed .00. The value that we get is 0.3413. This means that 34.13% of the total area (of 1, or 100%) under the curve lies between z = 0 and z = 1.00. Because of symmetry, the area between z = 0 and z = -1 is also 0.3413, or 34.13%. Therefore, the area between z = -1 and z = 1 is 68.26% (see Fig. 3-4). Similarly, the area between z = 0 and z = 2 is 0.4772, or 47.72% (by looking up z = 2.00 in the table), so that the area between z ± 2 is 95.44% (see Fig. 3-4). The area between z ± 3 is 99.74% (see Fig. 3-4). Note that the table only gives detailed values for z up to 2.99 because the area under the curve outside z ± 3 is negligible.

(b) The area between z = 0 and z = 0.88 is obtained by looking up 0.88 in the table. This is 0.3106.

(c) The area between z = 0 and z = -1.60 is obtained by looking up z = 1.60 in the table. This is 0.4452. The area between z = 0 and z = 2.55 is obtained by looking up z = 2.55 in the table. This is 0.4946. Thus the area under the standard normal curve from z = -1.60 to z = 2.55 equals 0.4452 plus 0.4946. This is 0.9398, or 93.98% (see Fig. 3-11). In problems of this nature it is helpful to sketch a figure.

(d) We know that the total area under the normal curve is equal to 1. Because of symmetry, 0.5 of the area is on either side of z = 0. Since 0.4452 extends from z = 0 to z = -1.60, 0.5 - 0.4452 = 0.0548, or 5.48%, is the area in the left tail, to the left of z = -1.60 (see Fig. 3-11).

(e) 0.5 - 0.4946 = 0.0054, or 0.54%, is the area in the right tail, to the right of z = 2.55 (see Fig. 3-11).

(f) The area to the left of z = -1.60 and to the right of z = 2.55 is equal to 1 - 0.9398 [see part (c)]. This is 0.0602, or 6.02% of the total.

Fig. 3-11

3.34  The lifetime of lightbulbs is known to be normally distributed with \(\mu = 100\) h and \(\sigma = 8\) h. What is the probability that a bulb picked at random will have a lifetime between 110 and 120 burning hours?
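The areas under the standard normal curve between given z values, looked up in App. 3 in the solution above, can also be approximated numerically. The sketch below uses the error function from Python's standard library; the helper names `phi` and `area_between` are illustrative assumptions, not part of the text.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return phi(z_high) - phi(z_low)


within_1 = area_between(-1, 1)          # between z = -1 and z = 1
zero_to_088 = area_between(0, 0.88)     # from z = 0 to z = 0.88
middle = area_between(-1.60, 2.55)      # from z = -1.60 to z = 2.55
left_tail = phi(-1.60)                  # to the left of z = -1.60
right_tail = 1 - phi(2.55)              # to the right of z = 2.55

print(round(within_1, 4), round(zero_to_088, 4), round(middle, 4))
print(round(left_tail, 4), round(right_tail, 4))
```

Small differences in the last decimal place relative to the table-based answers are expected, because the table values are themselves rounded before being added.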
We are asked here to find P(110 < X < 120), where X refers to time measured in hours of burning time. Given \(\mu = 100\) h and \(\sigma = 8\) h, and letting X_1 = 110 h and X_2 = 120 h, we get

\[ z_1 = \frac{X_1 - \mu}{\sigma} = \frac{110 - 100}{8} = 1.25 \quad \text{and} \quad z_2 = \frac{X_2 - \mu}{\sigma} = \frac{120 - 100}{8} = 2.50 \]

Thus we want the area (probability) between z_1 = 1.25 and z_2 = 2.50 (the shaded area in Fig. 3-12). Looking up z_2 = 2.50 in App. 3, we get 0.4938. This is the area from z = 0 to z_2 = 2.50. Looking up z_1 = 1.25, we get 0.3944. This is the area from z = 0 to z_1 = 1.25. Subtracting 0.3944 from 0.4938, we get 0.0994, or 9.94%, for the shaded area that gives P(110 < X < 120).

Fig. 3-12

3.35  Assume that family incomes are normally distributed with \(\mu\) = $16,000 and \(\sigma\) = $2000. What is the probability that a family picked at random will have an income: (a) Between $15,000 and $18,000? (b) Below $15,000? (c) Above $18,000? (d) Above $20,000?

(a) We want P($15,000 < X < $18,000), where X is family income:

\[ z_1 = \frac{X_1 - \mu}{\sigma} = \frac{15{,}000 - 16{,}000}{2000} = -0.5 \quad \text{and} \quad z_2 = \frac{X_2 - \mu}{\sigma} = \frac{18{,}000 - 16{,}000}{2000} = 1 \]

Thus we want the area (probability) between z_1 = -0.5 and z_2 = 1 (the shaded area in Fig. 3-13). Looking up z = 0.5 in App. 3, we get 0.1915 for the area from z = 0 to z = -0.5. Looking up z = 1, we get 0.3413 for the area from z = 0 to z = 1. Thus, P($15,000 < X < $18,000) = 0.1915 + 0.3413 = 0.5328, or 53.28%.

Fig. 3-13

(b) P(X < $15,000) = 0.5 - 0.1915 = 0.3085, or 30.85% (the unshaded area in the left tail of Fig. 3-13).

(c) P(X > $18,000) = 0.5 - 0.3413 = 0.1587, or 15.87% (the unshaded area in the right tail of Fig. 3-13).

(d) X = $20,000 corresponds to z = ($20,000 - $16,000)/$2000 = 2. Therefore, P(X > $20,000) = 0.5 - 0.4772 = 0.0228, or 2.28%.

3.36  The grades on the midterm examination in a large statistics section are normally distributed with a mean of 78 and a standard deviation of 8. The professor wants to give the grade of A to 10% of the students. What is the lowest grade point that can be designated an A on the midterm?
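For a normal distribution that is not standard, such as the family incomes above with a mean of $16,000 and a standard deviation of $2000, the conversion to z values can be left to a library. This is a hedged sketch that assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# Family incomes: mean $16,000, standard deviation $2000.
incomes = NormalDist(mu=16_000, sigma=2_000)

# cdf(x) is the area to the left of x, so interval and tail
# probabilities follow by subtraction.
p_between = incomes.cdf(18_000) - incomes.cdf(15_000)
p_below_15k = incomes.cdf(15_000)
p_above_18k = 1 - incomes.cdf(18_000)
p_above_20k = 1 - incomes.cdf(20_000)

print(round(p_between, 4), round(p_below_15k, 4))
print(round(p_above_18k, 4), round(p_above_20k, 4))
```

The results should agree with the table-based answers to about three or four decimal places.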
In this problem we are asked to find the point grade such that 10% of the students will have higher grades. This involves finding the grade point X such that 10% of the area under the normal curve will be to the right of X (the shaded area in Fig. 3-14). Since the total area under the curve to the right of 78 is 0.5, the unshaded area in Fig. 3-14 to the right of 78 must be 0.4. We must look into the body of App. 3 for the value closest to 0.4. This is 0.3997, which corresponds to the z value of 1.28. The X value (the grade point) that corresponds to the z value of 1.28 is obtained by substituting the known values into \(z = (X - \mu)/\sigma\) and solving for X:

\[ 1.28 = \frac{X - 78}{8} \]

This gives X - 78 = 10.24. Therefore X = 78 + 10.24 = 88.24, or 88 to the nearest whole number.

Fig. 3-14

3.37  Experience indicates that 30% of the people entering a store make a purchase. Using (a) the binomial distribution and (b) the normal approximation to the binomial, find the probability that out of 30 people entering the store, 10 or more will make a purchase.

(a) P(X ≥ 10) = P(10) + P(11) + P(12) + ··· + P(30) = 0.1416 + 0.1103 + 0.0749 + 0.0444 + 0.0231 + 0.0106 + 0.0042 + 0.0015 + 0.0005 + 0.0001 = 0.4112

(b) \(\mu = np = (30)(0.3) = 9\) persons, and \(\sigma = \sqrt{np(1-p)} = \sqrt{(30)(0.3)(0.7)} = \sqrt{6.3} \cong 2.51\) persons. Since n = 30 and both np and n(1 - p) > 5, we can approximate the binomial probability with the normal. However, the number of people is a discrete variable. In order to use the normal distribution, we must treat the number of people as if it were a continuous variable and find P(X ≥ 9.5). Thus

\[ z = \frac{X - \mu}{\sigma} = \frac{9.5 - 9}{2.51} = \frac{0.5}{2.51} \cong 0.20 \]

For z = 0.20, we get 0.0793 (from App. 3). This means that 0.0793 of the area under the standard normal curve lies from z = 0 to z = 0.20. Therefore, P(X ≥ 9.5) = 0.5 - 0.0793 = 0.4207 (the normal approximation). As n becomes even larger, the approximation becomes even closer. [If we had not treated the number of people as a continuous variable, we would have found that P(X ≥ 10) = 0.3446, and the approximation would not have been as close.]
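The effect of treating the number of purchasers as a continuous variable (the continuity correction) in the store example above can be seen by computing the exact binomial probability and both normal approximations side by side. This is a sketch under the assumption of Python 3.8 or later; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.3

# Exact binomial probability of 10 or more purchases out of 30.
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(10, n + 1))

# Normal approximation with mean np and standard deviation sqrt(np(1 - p)).
mu = n * p
sigma = sqrt(n * p * (1 - p))
normal = NormalDist(mu, sigma)

with_correction = 1 - normal.cdf(9.5)     # P(X >= 9.5), continuity correction
without_correction = 1 - normal.cdf(10)   # P(X >= 10), no correction

print(round(exact, 4))
print(round(with_correction, 4), round(without_correction, 4))
```

Because the code does not round z to two decimals before the lookup, its approximations differ slightly from the table-based figures, but the ordering is the same: the corrected value is much closer to the exact binomial probability.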
3.38  A production process produces 10 defective items per hour. Find the probability that 4 or less items are defective out of the output of a randomly chosen hour using (a) the Poisson distribution and (b) the normal approximation of the Poisson.

(a) Here \(\lambda = 10\) and we are asked to find P(X ≤ 4), where X is the number of defective items from the output of a randomly chosen hour. The value of \(e^{-10}\) from App. 2 is 0.00005. Therefore

\[ P(0) = \frac{10^0 e^{-10}}{0!} = 0.00005 \qquad P(1) = \frac{10^1 e^{-10}}{1!} = 0.0005 \qquad P(2) = \frac{10^2 e^{-10}}{2!} = 0.0025 \]

\[ P(3) = \frac{10^3 e^{-10}}{3!} = 0.0083335 \qquad P(4) = \frac{10^4 e^{-10}}{4!} = 0.0208335 \]

P(X ≤ 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0.00005 + 0.0005 + 0.0025 + 0.0083335 + 0.0208335 = 0.032217, or about 3.22%

(b) Treating the items as continuous [see Prob. 3.37(b)], we are asked to find P(X ≤ 4.5), where X is the number of defective items, \(\mu = \lambda = 10\), and \(\sigma = \sqrt{\lambda} = \sqrt{10} \cong 3.16\). Thus

\[ z = \frac{X - \mu}{\sigma} = \frac{4.5 - 10}{3.16} = \frac{-5.5}{3.16} \cong -1.74 \]

For z = 1.74 in App. 3, we get 0.4591. This means that 0.5 - 0.4591 = 0.0409 of the area (probability) under the standard normal curve lies to the left of z = -1.74. Thus P(X ≤ 4.5) = 0.0409, or 4.09%. As \(\lambda\) becomes larger, we get a better approximation. (If we had not treated the number of defective items as a continuous variable, we would have found that P(X ≤ 4) = 0.0287.)

3.39  If events or successes follow a Poisson distribution, we can determine the probability that the first event occurs within a designated period of time, P(T ≤ t), by the exponential probability distribution. Because we are dealing with time, the exponential is a continuous probability distribution. This is given by

\[ P(T \le t) = 1 - e^{-\lambda} \tag{3.27} \]

where \(\lambda\) is the average number of occurrences for the interval of interest and \(e^{-\lambda}\) can be obtained from App. 2. The expected value and variance are

\[ E(T) = \frac{1}{\lambda} \tag{3.28} \]

\[ \text{Var}\, T = \frac{1}{\lambda^2} \tag{3.29} \]

(a) For the statement of Prob. 3.29, find the probability that starting at a random point in time, the first customer stops at the gasoline pump within a half hour. (b) What is the probability that no customer stops at the gasoline pump within a half hour? (c) What is the expected value and variance of the exponential distribution,
where the continuous variable is time T?

(a) Since an average of 6 customers stop at the pump per hour, \(\lambda\) = average of 3 customers per half hour. The probability that the first customer will stop within the first half hour is

\[ 1 - e^{-\lambda} = 1 - e^{-3} = 1 - 0.04979 \text{ (from App. 2)} = 0.9502, \text{ or } 95.02\% \]

(b) The probability that no customer stops at the pump within a half hour is

\[ e^{-\lambda} = e^{-3} = 0.04979 \]

(c) \(E(T) = 1/\lambda = 1/6 \cong 0.17\) h per car, and \(\text{Var}\, T = 1/\lambda^2 = 1/36 \cong 0.03\) h per car squared. The exponential distribution also can be used to calculate the time between two successive events.

3.40  The mean level of schooling for a population is 8 years and the standard deviation is 1 year. What is the probability that a randomly selected individual from the population will have had between 6 and 10 years of schooling? Less than 6 years or more than 10 years?

Since we have not been told the form of the distribution, we can use Chebyshev's theorem, which applies to any distribution. With \(\mu = 8\) years and \(\sigma = 1\) year, 6 years of schooling is 2 standard deviations below \(\mu\), and 10 years of schooling is 2 standard deviations above \(\mu\). Using Chebyshev's theorem or inequality,

\[ P(|X - \mu| \le K\sigma) \ge 1 - \frac{1}{K^2} \tag{3.30} \]

The probability that an individual picked at random from the population will be within 2 standard deviations from the mean is

\[ P(|X - 8| \le 2 \cdot 1) \ge 1 - \frac{1}{2^2} = 0.75, \text{ or at least } 75\% \]

Therefore, the probability that the individual will have had either less than 6 or more than 10 years of schooling is at most 25%.

Supplementary Problems

PROBABILITY OF A SINGLE EVENT

3.41  What approach to probability is involved in the following statements? (a) The probability of a head in the toss of a balanced coin is 1/2. (b) The relative frequency of a head in 100 tosses of a coin is 0.53. (c) The probability of rain tomorrow is 29%.
Ans. (a) The classical or a priori approach (b) The relative frequency or empirical approach (c) The subjective or personalistic approach
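The difference between the classical probability of a head (1/2) and the relative frequency observed in a finite number of tosses can be illustrated with a small simulation. This is only a sketch: the seed, the toss counts, and the function name `relative_frequency_of_heads` are arbitrary assumptions chosen for illustration.

```python
import random


def relative_frequency_of_heads(n_tosses, rng):
    """Simulate n_tosses of a balanced coin and return the share of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


rng = random.Random(0)  # fixed seed so that the run is repeatable

freq_100 = relative_frequency_of_heads(100, rng)
freq_100_000 = relative_frequency_of_heads(100_000, rng)

print(freq_100, freq_100_000)
```

With only 100 tosses the relative frequency can easily differ from 0.5 by several percentage points, whereas with many more tosses it tends to settle close to the classical value.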
3.42 What is the probability that in tossing a balanced coin we get (a) a tail, (b) a head, (c) not a tail, or (d) a tail or not a tail?
Ans. (a) \(P(T) = 1/2\) (b) \(P(H) = 1/2\) (c) \(P(T') = 1/2\) (d) \(P(T) + P(T') = 1\)

3.43 What is the probability that in one roll of a fair die we get (a) a 1, (b) a 6, (c) not a 1, or (d) a 1 or not a 1?
Ans. (a) \(P(1) = 1/6\) (b) \(P(6) = 1/6\) (c) \(P(1') = 5/6\) (d) \(P(1) + P(1') = 1\)

3.44 What is the probability that in a single pick from a standard deck of cards we pick (a) a club, (b) an ace, (c) the ace of clubs, (d) not a club, or (e) a club or not a club?
Ans. (a) \(P(C) = 13/52 = 1/4\) (b) \(P(A) = 4/52 = 1/13\) (c) \(P(A_C) = 1/52\) (d) \(P(C') = 3/4\) (e) \(P(C) + P(C') = 1\)

3.45 An urn contains 12 balls that are exactly alike except that 4 are blue, 3 are red, 3 are green, and 2 are white. What is the probability that by picking a single ball we pick (a) A blue ball? (b) A red ball? (c) A green ball? (d) A white ball? (e) A nonred ball? (f) A nonwhite ball? (g) A white or nonwhite ball? Also, (h) What are the odds of picking a green ball? (i) What are the odds of picking a nongreen ball?
Ans. (a) \(P(B) = 1/3\) or 0.33 (b) \(P(R) = 1/4\) or 0.25 (c) \(P(G) = 1/4\) or 0.25 (d) \(P(W) = 1/6\) or 0.167 (e) \(P(R') = 0.75\) (f) \(P(W') = 0.833\) (g) \(P(W) + P(W') = 1\) (h) 3:9 (i) 9:3

3.46 Suppose that a card is picked from a well-shuffled standard deck. The card is then replaced, the deck reshuffled, and another card is picked. As this process is repeated 520 times, we obtain 136 spades. (a) What is the relative frequency or empirical probability of getting a spade? (b) What is the classical or a priori probability of getting a spade? (c) What would you expect the relative frequency or empirical probability of getting a spade to be if the process is repeated many more times?
Ans.
(a) \(136/520 \approx 0.26\) (b) \(P(S) = 1/4\) (c) To approach 1/4 or 0.25

3.47 An insurance company found that from a sample of 10,000 men between the ages of 30 and 40, 87 become seriously ill during a 1-year period. (a) What is the relative frequency or empirical probability of men between 30 and 40 becoming seriously ill during a 1-year period? (b) Why is the insurance company interested in these results? (c) Suppose that the company subsequently sells health insurance to 1,387,684 men in the 30 to 40 age group. How many claims can the company expect during a 1-year period?
Ans. (a) The relative frequency or empirical probability is 87/10,000 = 0.0087. (b) The insurance company is interested in the relative frequency or empirical probability in order to determine its insurance premiums. (c) 12,073, to the nearest person

PROBABILITY OF MULTIPLE EVENTS

3.48 What types of events are the following? (a) Picking hearts or clubs on a single pick from a deck. (b) Picking diamonds or a queen on a single pick from a deck. (c) Two successive flips of a balanced coin. (d) Two successive tosses of a fair die. (e) Picking two cards from a deck with replacement. (f) Picking two cards from a deck without replacement. (g) Picking two balls from an urn without replacement.
Ans. (a) Mutually exclusive (b) Not mutually exclusive (c) Independent (d) Independent (e) Independent (f) Dependent (g) Dependent

3.49 What is the probability of getting (a) Four or more on a single toss of a fair die? (b) Ace or king on a single pick from a well-shuffled standard deck of cards? (c) A green or white ball from the urn of Prob. 3.45?
Ans. (a) 1/2 (b) 8/52 or 2/13 (c) 5/12

3.50 What is the probability of getting (a) A diamond or a queen on a single pick from a deck of cards? (b) A diamond, a queen, or a king?
(c) An African-American or a woman president of the United States if the probability of an African-American president is 0.25, of a woman is 0.15, and of an African-American woman is 0.07?
Ans. (a) 16/52 or 4/13 (b) 19/52 (c) 0.33

3.51 What is the probability of (a) Two ones in 2 rolls of a die? (b) Three tails in 3 flips of a coin? (c) A total of 6 in rolling 2 dice simultaneously? (d) A total of less than 5 in rolling 2 dice simultaneously? (e) A total of 10 or more in rolling 2 dice simultaneously?
Ans. (a) 1/36 (b) 1/8 (c) 5/36 (d) 1/6 (e) 1/6

3.52 What is the probability of obtaining the following from a deck of cards: (a) A diamond on the second pick when the first card picked and not replaced was a diamond? (b) A diamond on the second pick when the first card picked and not replaced was not a diamond? (c) A king on the third pick when a queen and a jack were already obtained on the first and second pick and not replaced?
Ans. (a) 12/51 (b) 13/51 (c) 4/50

3.53 What is the probability of picking (a) The king of clubs and a diamond in that order in 2 picks from a deck without replacement? (b) A white ball and a green ball in that order in 2 picks without replacement from the urn of Prob. 3.45? (c) A green ball and a white ball in that order in 2 picks without replacement from the urn of Prob. 3.45? (d) A green and a white ball in any order in 2 picks without replacement from the same urn? (e) Three green balls in 3 picks without replacement from the urn?
Ans. (a) 13/2652 or 1/204 (b) 6/132 or 1/22 (c) 1/22 (d) 1/11 (e) 6/1320 or 1/220

3.54 Suppose that the probability of rain on a given day is 0.1 and the probability of my having a car accident is 0.005 on any day and 0.012 on a rainy day. (a) What rule should I use to calculate the probability that on a given day it will rain and I will have a car accident? (b) State the rule asked for in part a, letting A signify accident and R signify rain. (c) Calculate the probability asked for in part a.
Ans.
(a) The rule of multiplication for dependent events (b) \(P(R \text{ and } A) = P(R) \cdot P(A/R)\) (c) 0.0012

3.55 (a) What rule or theorem should I use to calculate, for the statement in Prob. 3.54, the probability that it was raining when I had a car accident? (b) State the rule or theorem applicable to part a. (c) Answer the question in part a.
Ans. (a) Bayes' theorem (b) \(P(R/A) = P(R) \cdot P(A/R) / P(A)\) (c) 0.24

3.56 In how many different ways can 6 qualified individuals be assigned to (a) Three trainee positions available if the positions are identical? (b) Three trainee positions available if the positions differ? (c) Six trainee positions available if the positions differ?
Ans. (a) 20 (b) 120 (c) 720

DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

3.57 The probability distribution of lunch customers at a restaurant is given in Table 3.5. Calculate (a) the expected number of lunch customers, (b) the variance, and (c) the standard deviation.

Table 3.5 Probability Distribution of Lunch Customers at a Restaurant
Number of Customers: 100, 110, 118, 120, 125

Ans. (a) 113.1 customers (b) 65.69 customers squared (c) 8.10 customers

3.58 What is the probability of (a) Getting exactly 4 heads and 2 tails in 6 tosses of a balanced coin? (b) Getting 3 sixes in 4 rolls of a fair die?
Ans. (a) 0.23 (b) 0.0154321

3.59 (a) If 20% of the students entering college drop out before receiving their diplomas, find the probability that out of 20 students picked at random from the very large number of students entering college, less than 3 drop out. (b) If 90% of the bulbs produced in a plant are acceptable, what is the probability that out of 10 bulbs picked at random from the very large output of the plant, 8 are acceptable?
Ans. (a) 0.206 (b) 0.1937

3.60 Calculate the expected value and standard deviation and determine the symmetry or asymmetry of the probability distribution of (a) Prob. 3.58(a), (b) Prob.
3.59(a), and (c) Prob. 3.59(b).
Ans. (a) \(E(X) = 3\) heads, SD \(X = 1.22\) heads, and the distribution is symmetrical (b) \(E(X) = 4\) students, SD \(X = 1.79\) students, and the distribution is skewed to the right (c) \(E(X) = 9\) bulbs, SD \(X = 0.95\) bulbs, and the distribution is skewed to the left

3.61 What is the probability of picking (a) Two women in a sample of 5 drawn at random and without replacement from a group of 9 people, 4 of whom are women? (b) Eight men in a sample of 10 drawn at random and without replacement from a population of 1000, half of which are men?
Ans. (a) About 0.476 (using the hypergeometric distribution) (b) About 0.0439 (using the binomial approximation to the hypergeometric probability)

THE POISSON DISTRIBUTION

3.62 Past experience shows that there are two traffic accidents at an intersection per week. What is the probability of: (a) Four accidents during a randomly selected week? (b) No accidents? (c) What is the expected value and standard deviation of the distribution?
Ans. (a) About 0.09 (b) About 0.14 (c) \(E(X) = \lambda = 2\) accidents, and SD \(X = \sqrt{\lambda} = \sqrt{2} \approx 1.41\) accidents

3.63 Past experience shows that 0.3% of the national labor force get seriously ill during a year. If 1000 persons are randomly selected from the national labor force: (a) What is the expected number of workers that will get sick during a year? (b) What is the probability that 5 workers will get sick during the year?
Ans. (a) 3 workers (b) About 0.1 (using the Poisson approximation to the binomial distribution)

CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

3.64 Give the formula for (a) the probability that continuous variable \(X\) falls between \(X_1\) and \(X_2\), (b) the normal distribution, (c) the expected value and variance of the normal distribution, and (d) the standard normal distribution. (e) What is the mean and variance of the standard normal distribution?
Ans.
… with the finite correction factor instead of \(\sigma_{\bar{X}} =\) …

EXAMPLE 4. The probability that the mean of a random sample \(\bar{X}\) of 36 elements from the population in Example 3 falls between 18 and 24 units is computed as follows:

\[z_1 = \frac{\bar{X}_1 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{18 - 20}{2} = -1 \qquad \text{and} \qquad z_2 = \frac{\bar{X}_2 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{24 - 20}{2} = 2\]

Looking up \(z_1\) and \(z_2\) in App. 3, we get

\[P(18 < \bar{X} < 24) = 0.3413 + 0.4772 = 0.8185, \text{ or } 81.85\%\]

See Fig. 4-2.

CHAP. 4] STATISTICAL INFERENCE: ESTIMATION

[Fig. 4-2]

4.3 ESTIMATION USING THE NORMAL DISTRIBUTION

We can get a point or an interval estimate of a population parameter. A point estimate is a single number. Such a point estimate is unbiased if in repeated random samplings from the population, the expected or mean value of the corresponding statistic is equal to the population parameter. For example, \(\bar{X}\) is an unbiased (point) estimate of \(\mu\) because \(\mu_{\bar{X}} = \mu\), where \(\mu_{\bar{X}}\) is the expected value of \(\bar{X}\). The sample standard deviation \(s\) [as defined in Eqs. (2.10b) and (2.11b)] is an unbiased estimate of \(\sigma\) [see Prob. 4.13(b)], and the sample proportion \(\bar{p}\) is an unbiased estimate of \(p\) (the proportion of the population with a given characteristic).

An interval estimate refers to a range of values together with the probability, or confidence level, that the interval includes the unknown population parameter. Given the population standard deviation or its estimate, and given that the population is normal or that a random sample is equal to or larger than 30, we can find the 95% confidence interval for the unknown population mean as

\[P(\bar{X} - 1.96\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\sigma_{\bar{X}}) = 0.95 \qquad (4.4)\]

This states that in repeated random sampling, we expect that 95 out of 100 intervals such as Eq. (4.4) include the unknown population mean and that our confidence interval (based on a single random sample) is one of these.

A confidence interval can be constructed similarly for the population proportion (see Example 7), where

\[\mu_{\bar{p}} = p \quad \text{(the proportion of successes in the population)} \qquad (4.5)\]
\[\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \quad \text{(the standard error of the proportion)} \qquad (4.6)\]

EXAMPLE 5.
A random sample of 144 with a mean of 100 and a standard deviation of 60 is taken from a population of 1000. The 95% confidence interval for the unknown population mean is

\[\mu = \bar{X} \pm 1.96\sigma_{\bar{X}} \qquad \text{since } n > 30\]
\[= \bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \qquad \text{since } n > 0.05N\]
\[= 100 \pm 1.96 \frac{60}{\sqrt{144}} \sqrt{\frac{1000-144}{1000-1}} \qquad \text{using } s \text{ as an estimate of } \sigma\]
\[= 100 \pm 1.96(5)(0.93) = 100 \pm 9.11\]

Thus \(\mu\) is between 90.89 and 109.11 with a 95% degree of confidence. Other frequently used confidence intervals are the 90 and 99% levels, corresponding to the \(z\) values of 1.64 and 2.58, respectively (see App. 3).

EXAMPLE 6. A manager wishes to estimate the mean number of minutes that workers take to complete a particular manufacturing process within \(\pm 3\) min and with 90% confidence. From past experience, the manager knows that the standard deviation \(\sigma\) is 15 min. The minimum required sample size (\(n > 30\)) is found as follows:

\[z\sigma_{\bar{X}} = 3 \qquad \text{(the allowed error on either side of } \bar{X} \text{ is 3 min)}\]
\[1.64 \frac{\sigma}{\sqrt{n}} = 3 \qquad \text{assuming } n < 0.05N\]
\[1.64 \frac{15}{\sqrt{n}} = 3\]
\[\sqrt{n} = \frac{1.64(15)}{3} = 8.2\]
\[n = 67.24, \text{ or } 68 \text{ (rounded to the next higher integer)}\]

EXAMPLE 7. A state education department finds that in a random sample of 100 persons who attended college, 40 received a college degree. To find the 99% confidence interval for the proportion of college graduates out of all the persons who attended college, we proceed as follows. First, we note that this problem involves the binomial distribution (see Sec. 3.3). Since \(n > 30\) and both \(np > 5\) and \(n(1 - p) > 5\), the binomial distribution approaches the normal distribution (which is simpler to use; see Sec. 3.5).
Then

\[p = \bar{p} \pm z\sigma_{\bar{p}}\]
\[= \bar{p} \pm z\sqrt{\frac{p(1-p)}{n}} \qquad \text{assuming } n < 0.05N\]
\[= 0.4 \pm 2.58\sqrt{\frac{(0.4)(0.6)}{100}} \qquad \text{using } \bar{p} \text{ as an estimate of } p\]
\[= 0.4 \pm 2.58(0.05) = 0.4 \pm 0.13\]

Thus \(p\) is between 0.27 and 0.53 with a 99% level of confidence.

4.4 CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

When the population is normally distributed but \(\sigma\) is not known and \(n < 30\), we cannot use the normal distribution for determining confidence intervals for the unknown population mean, but we can use the \(t\) distribution. This is symmetrical about its zero mean but is flatter than the standard normal distribution, so that more of its area falls within the tails. While there is a single standard normal distribution, there is a different \(t\) distribution for each sample size, \(n\). However, as \(n\) becomes larger, the \(t\) distribution approaches the standard normal distribution (see Fig. 4-3) until, when \(n \ge 30\), they are approximately equal.

Appendix 5 gives the values of \(t\) to the right of which we find 10, 5, 2.5, 1, and 0.5% of the total area under the curve for various degrees of freedom. Degrees of freedom (df) are defined in this case as \(n - 1\) (or the sample size minus 1 for the single parameter \(\mu\) we wish to estimate).

[Fig. 4-3: the standard normal distribution and a t distribution]

The 95% confidence interval for the unknown population mean when the \(t\) distribution is used is given by

\[P\left(\bar{X} - t\frac{s}{\sqrt{n}} < \mu < \bar{X} + t\frac{s}{\sqrt{n}}\right) = 0.95\]

The standard error of the mean \(\sigma_{\bar{X}}\) is given by the standard deviation of the parent population \(\sigma\) divided by the square root of the sample size \(\sqrt{n}\); that is, \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\). For finite populations of size \(N\), a finite correction factor must be added, and \(\sigma_{\bar{X}} = (\sigma/\sqrt{n})\sqrt{(N - n)/(N - 1)}\). However, if the sample size is very small in relation to the population size, \(\sqrt{(N - n)/(N - 1)}\) is close to 1 and can be dropped from the formula. By convention, this is done whenever \(n < 0.05N\). Independently of this finite correction factor, \(\sigma_{\bar{X}}\) is directly related to \(\sigma\) and inversely related to \(\sqrt{n}\) [see Eq.
(4.2a,b)]. Thus increasing the sample size 4 times increases the accuracy of \(\bar{X}\) as an estimate of \(\mu\) by cutting \(\sigma_{\bar{X}}\) in half. Note also that \(\sigma_{\bar{X}}\) is always smaller than \(\sigma\). The reason for this is that the sample means, as averages of the sample observations, exhibit less variability or spread than the population values. Furthermore, the larger the sample sizes, the more the values of the sample means are averaged down with respect to the population values (see Fig. 4-1).

4.6 For a population composed of the following 5 numbers: 1, 3, 5, 7, and 9, find (a) \(\mu\) and \(\sigma\), (b) the theoretical sampling distribution of the mean for the sample size of 2, and (c) \(\mu_{\bar{X}}\) and \(\sigma_{\bar{X}}\).

(a)
\[\mu = \frac{\sum X}{N} = \frac{1+3+5+7+9}{5} = \frac{25}{5} = 5\]
\[\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{16+4+0+4+16}{5}} = \sqrt{\frac{40}{5}} = \sqrt{8} \approx 2.83\]

(b) The theoretical sampling distribution of the sample mean for the sample size of 2 from the given finite population is given by the mean of all the possible different samples that can be obtained from this population. The number of combinations of 5 numbers taken 2 at a time without concern for the order is \(5!/(2!\,3!) = 10\) (see Prob. 3.18). These 10 samples are 1, 3; 1, 5; 1, 7; 1, 9; 3, 5; 3, 7; 3, 9; 5, 7; 5, 9; and 7, 9. The means of the preceding 10 samples are 2, 3, 4, 5, 4, 5, 6, 6, 7, 8. The theoretical sampling distribution of the mean is given in Table 4.1. Note that the variability or spread of the sample means (from 2 to 8) is less than the variability or spread of the values in the parent population (from 1 to 9), confirming the statement made at the end of Prob. 4.5.

(c) By applying theorem 1 (Sec. 4.2), \(\mu_{\bar{X}} = \mu = 5\). Since the sample size of 2 is greater than 5% of the population size (that is, \(n > 0.05N\)),

\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{\sqrt{8}}{\sqrt{2}}\sqrt{\frac{5-2}{5-1}} = \sqrt{4}\sqrt{0.75} = \sqrt{3} \approx 1.73\]

Table 4.1 Theoretical Sampling Distribution of the Mean

Value of the Mean    Possible Outcomes    Probability of Occurrence
2                    2                    0.1
3                    3                    0.1
4                    4, 4                 0.2
5                    5, 5                 0.2
6                    6, 6                 0.2
7                    7                    0.1
8                    8                    0.1
                                          Total 1.0

4.7 For the theoretical sampling distribution of the sample mean found in Prob. 4.6(b), (a) find the mean and the standard error of the mean using the formulas for the population mean and standard deviation given in Secs. 2.2 and 2.5.
(b) What do the answers to part a show?

(a)
\[\mu_{\bar{X}} = \frac{\sum \bar{X}}{10} = \frac{2+3+4+5+4+5+6+6+7+8}{10} = \frac{50}{10} = 5\]
\[\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X} - \mu_{\bar{X}})^2}{10}} = \sqrt{\frac{9+4+1+0+1+0+1+1+4+9}{10}} = \sqrt{\frac{30}{10}} = \sqrt{3} \approx 1.73\]

(b) The answers to part a confirm the results obtained in Prob. 4.6(c) by the application of theorem 1 (Sec. 4.2), namely, that \(\mu_{\bar{X}} = \mu\) and \(\sigma_{\bar{X}} = (\sigma/\sqrt{n})\sqrt{(N - n)/(N - 1)}\) for the finite population where \(n > 0.05N\). Note that we took all the possible different samples of size 2 that we could take from our finite population of 5 numbers. Sampling from an infinite parent population (or from a finite parent population with replacement) would have required taking an infinite number of random samples of size \(n\) from the parent population (an obviously impossible task). By taking only a limited number of random samples, theorem 1 would hold only approximately (i.e., \(\mu_{\bar{X}} \approx \mu\) and \(\sigma_{\bar{X}} \approx \sigma/\sqrt{n}\)), with the approximation becoming better as the number of random samples taken is increased. In this case, the sampling distribution of the sample mean generated is referred to as the empirical sampling distribution of the mean.

4.8 A population of 12,000 elements has a mean of 100 and a standard deviation of 60. Find the mean and standard error of the sampling distribution of the mean for sample sizes of (a) 100 and (b) 900.

(a)
\[\mu_{\bar{X}} = \mu = 100 \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{60}{\sqrt{100}} = 6\]

(b) \(\mu_{\bar{X}} = \mu = 100\). Since a sample of 900 is more than 5% of the population size, the finite correction factor must be used in the formula for the standard error:

\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{60}{\sqrt{900}}\sqrt{\frac{12{,}000 - 900}{12{,}000 - 1}} = \frac{60}{30}\sqrt{0.925} = 2(0.962) = 1.92\]

Without the correction factor, \(\sigma_{\bar{X}}\) would have been equal to 2 instead of 1.92.

4.9 (a) What is the shape of the theoretical sampling distribution of the mean if the parent population is normal? If the parent population is not normal? (b) What is the importance of the answer to part a?

(a) If the parent population is normally distributed, the theoretical sampling distributions of the mean are also normally distributed, regardless of sample size.
According to the central-limit theorem, even if the parent population is not normal, the theoretical sampling distributions of the sample mean approach normality as sample size increases (i.e., as \(n \to \infty\)). This approximation is sufficiently good for samples of at least 30.

(b) The central-limit theorem is perhaps the most important theorem in all of statistical inference. It allows us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the parent population. This will be done in this chapter and in Chap. 5.

4.10 (a) How can we calculate the probability that a random sample has a mean that falls within a given interval if the theoretical sampling distribution of the mean is normal or approximately normal? How is this different from the process of finding the probability that a normally distributed random variable assumes a value within a given interval? (b) Draw a normal curve in the \(\bar{X}\) and \(z\) scales and show the percentage of the area under the curve within 1, 2, and 3 standard deviation units of its mean.

(a) If the theoretical sampling distribution of the mean is normal or approximately normal, we can find the probability that a random sample has a mean that falls within a given interval by calculating the corresponding \(z\) values and looking them up in App. 3. This is analogous to what was done in Sec. 3.5, where the normal and the standard normal curves were introduced. The only difference is that now we are dealing with the distribution of the \(\bar{X}\)'s rather than with the distribution of the \(X\)'s. In addition, before \(z = (X - \mu)/\sigma\), while now \(z = (\bar{X} - \mu_{\bar{X}})/\sigma_{\bar{X}} = (\bar{X} - \mu)/\sigma_{\bar{X}}\), since \(\mu_{\bar{X}} = \mu\).

(b) In Fig. 4-5, we have a normal curve in the \(\bar{X}\) scale and a standard normal curve in the \(z\) scale. The area under the curve within 1, 2, and 3 standard deviation units of the mean is 68.26, 95.44, and 99.74%, respectively.

[Fig. 4-5]
4.11 Find the probability that the mean of a random sample of 25 elements from a normally distributed population with a mean of 90 and a standard deviation of 60 is larger than 100.

Since the parent population is normally distributed, the theoretical sampling distribution of the mean is also normally distributed and \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) because \(n < 0.05N\). For \(\bar{X} = 100\),

\[z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{100 - 90}{60/\sqrt{25}} = \frac{10}{12} = 0.83\]

Looking up this value in App. 3, we get

\[P(\bar{X} > 100) = 1 - (0.5000 + 0.2967) = 1 - 0.7967 = 0.2033, \text{ or } 20.33\%\]

See Fig. 4-6.

4.12 A small local bank has 1450 individual savings accounts with an average balance of $3000 and a standard deviation of $1200. If the bank takes a random sample of 100 accounts, what is the probability that the average savings for these 100 accounts will be below $2800?

Since \(n = 100\), the theoretical sampling distribution of the mean is approximately normal, but since \(n > 0.05N\), the finite correction factor must be used to find \(\sigma_{\bar{X}}\). For \(\bar{X} = 2800\),

\[z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}} = \frac{2800 - 3000}{\dfrac{1200}{\sqrt{100}}\sqrt{\dfrac{1450-100}{1450-1}}} = \frac{-200}{120(0.965)} = \frac{-200}{115.8} = -1.73\]

Looking up \(z = 1.73\) in App. 3, we get

\[P(\bar{X} < 2800) = 1 - (0.5000 + 0.4582) = 0.0418, \text{ or } 4.18\%\]

See Fig. 4-7.

ESTIMATION USING THE NORMAL DISTRIBUTION

4.13 What is meant by (a) A point estimate? (b) Unbiased estimator? (c) An interval estimate?

(a) Because of cost, time, and feasibility, population parameters are frequently estimated from sample statistics. A sample statistic used to estimate a population parameter is called an estimator, and a specific observed value is called an estimate. When the estimate of an unknown population parameter is given by a single number, it is called a point estimate. For example, the sample mean \(\bar{X}\) is an estimator of the population mean \(\mu\), and a single value of \(\bar{X}\) is a point estimate of \(\mu\). Similarly, the sample standard deviation \(s\) can be used as an estimator of the population standard deviation \(\sigma\), and a single value of \(s\) is a point estimate of \(\sigma\).
The sample proportion \(\bar{p}\) can be used as an estimator for the population proportion \(p\), and a single value of \(\bar{p}\) is a point estimate of \(p\) (i.e., the proportion of the population with a given characteristic).

(b) An estimator is unbiased if in repeated random samplings from the population, the expected or mean value of the corresponding statistic from the theoretical sampling distribution is equal to the population parameter. Another way of stating this is that an estimator is unbiased if its expected value (see Probs. 3.20 and 3.31) is equal to the population parameter being estimated. For example, \(\bar{X}\), \(s\) [as defined in Eqs. (2.10b) and (2.11b)], and \(\bar{p}\) are unbiased estimators of \(\mu\), \(\sigma\), and \(p\), respectively. Other important criteria for a good estimator are discussed in Sec. 6.

(c) An interval estimate refers to the range of values used to estimate an unknown population parameter together with the probability, or confidence level, that the interval does include the unknown population parameter. This is known as a confidence interval and is usually centered around the unbiased point estimate. For example, the 95% confidence interval for \(\mu\) is given by

\[P(\bar{X} - 1.96\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\sigma_{\bar{X}}) = 0.95\]

The two numbers defining a confidence interval are called confidence limits. Because an interval estimate also expresses the degree of accuracy or confidence we have in the estimate, it is superior to a point estimate.

4.14 A random sample of 64 with a mean of 50 and a standard deviation of 20 is taken from a population of 800. (a) Find an interval estimate for the population mean such that we are 95% confident that the interval includes the population mean. (b) What does the result of part a tell us?
(a) Since \(n > 30\), we can use the \(z\) value of 1.96 from the standard normal distribution to construct the 95% confidence interval for the unknown population mean, and we can use \(s\) as an estimate for the unknown \(\sigma\). Since \(n > 0.05N\), the finite correction factor is included:

\[\mu = \bar{X} \pm z\sigma_{\bar{X}} = \bar{X} \pm z\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 50 \pm 1.96\frac{20}{\sqrt{64}}\sqrt{\frac{800-64}{800-1}} = 50 \pm 1.96(2.5)(0.96) = 50 \pm 4.70\]

Thus \(\mu\) is between the lower confidence limit of 45.3 and the upper confidence limit of 54.7 with a 95% level of confidence.

(b) The result of part a tells us that if we take from the population repeated random samples, each of size \(n = 64\), and construct the 95% confidence interval for each of the sample means, 95% of these confidence intervals will contain the true unknown population mean. By assuming that our confidence interval (based on the single random sample that we have actually taken) is one of these 95% confidence intervals that include \(\mu\), we take the calculated risk of being wrong 5% of the time.

4.15 A random sample of 25 with a mean of 80 is taken from a population of 1000 that is normally distributed with a standard deviation of 30. Find (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals for the unknown population mean. (d) What does the difference in the results to parts a, b, and c indicate?

(a)
\[\mu = \bar{X} \pm 1.64\sigma_{\bar{X}} \qquad \text{since the population is normally distributed and } \sigma \text{ is known}\]
\[= \bar{X} \pm 1.64\frac{\sigma}{\sqrt{n}} \qquad \text{since } n < 0.05N\]
\[= 80 \pm 1.64\frac{30}{\sqrt{25}} = 80 \pm 1.64(6) = 80 \pm 9.84\]

Thus \(\mu\) is between 70.16 and 89.84 with a 90% level of confidence.

(b) \(\mu = 80 \pm 1.96(6) = 80 \pm 11.76\). Thus \(\mu\) is between 68.24 and 91.76 with a 95% level of confidence.

(c) \(\mu = 80 \pm 2.58(6) = 80 \pm 15.48\). Thus \(\mu\) is between 64.52 and 95.48 with a 99% level of confidence.

(d) The results of parts a, b, and c indicate that as we increase the degree of confidence required, the size of the confidence interval increases and the interval estimate becomes more vague (i.e., less precise). However, the degree of confidence associated with a very narrow confidence interval may be so low as to have little meaning.
By convention, the most frequently used confidence interval is 95%, followed by 90 and 99%.

4.16 A random sample of 36 students is taken out of the 500 students from a high school taking the college entrance examination. The mean test score for the sample is 380, and the standard deviation for the entire population of 500 students is 40. Find the 95% confidence interval for the unknown population mean score.

Since \(n > 30\), the theoretical sampling distribution of the mean is approximately normal. Also, since \(n > 0.05N\),

\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{40}{\sqrt{36}}\sqrt{\frac{500-36}{500-1}} = (6.67)(0.96) = 6.4\]

Then

\[\mu = \bar{X} \pm 1.96\sigma_{\bar{X}} = 380 \pm 1.96(6.4) = 380 \pm 12.54\]

Thus \(\mu\) is between 367.46 and 392.54 with a 95% level of confidence.

4.17 A researcher wishes to estimate the mean weekly wage of the several thousands of workers employed in a plant within plus or minus $20 and with a 99% degree of confidence. From past experience, the researcher knows that the weekly wages of these workers are normally distributed with a standard deviation of $40. What is the minimum sample size required?

\[z\sigma_{\bar{X}} = 20\]
\[2.58\frac{\sigma}{\sqrt{n}} = 20 \qquad \text{assuming } n < 0.05N\]
\[2.58\frac{40}{\sqrt{n}} = 20\]
\[\sqrt{n} = \frac{2.58(40)}{20} = 5.16\]
\[n = 26.63, \text{ or } 27 \text{ (rounded to the nearest higher integer)}\]

4.18 (a) Solve Prob. 4.17 by first getting an expression for \(n\) and then substituting the values from the problem into the expression obtained. (b) Why is the question of sample size important? (c) What is the size of the total confidence interval in Prob. 4.17? (d) What would have to be the sample size in Prob. 4.17 if we had not been told that the population was normally distributed? (e) What would have happened if we had not been told the population standard deviation?

(a) Starting with \(z\sigma/\sqrt{n} = \bar{X} - \mu\) (see Prob. 4.17), we get \(\sqrt{n} = z\sigma/(\bar{X} - \mu)\). Thus

\[n = \left(\frac{z\sigma}{\bar{X} - \mu}\right)^2\]

Substituting the values from Prob. 4.17, we get

\[n = \left(\frac{2.58 \times 40}{20}\right)^2 = (5.16)^2 = 26.63, \text{ or } 27 \text{ (the same as in Prob. 4.17)}\]

(b) The question of sample size is important because if the sample is too small, we fail to achieve the objectives of the analysis, and if the sample is too large, we waste resources because it is more expensive to collect and evaluate a larger sample.
(c) The size of the total confidence interval in Prob. 4.17 is $40, or twice \(\bar{X} - \mu\). Since we are using \(\bar{X}\) as an estimate of \(\mu\), \(\bar{X} - \mu\) is sometimes referred to as the error of the estimate. Because in Prob. 4.17 we want the error of the estimate to be "within plus or minus $20," we get \(\bar{X} - \mu = \pm 20\), or a range of $40 for the total confidence interval.

(d) If we had not been told that the population was normally distributed, we would have had to increase the sample to at least 30 in Prob. 4.17 in order to justify the use of the normal distribution.

(e) If we had not been told the value of \(\sigma\), we could not have solved the problem. (Since we were deciding on what sample size to take in Prob. 4.17, we could not possibly have known \(s\) to use as an estimate of \(\sigma\).) The only way we could estimate \(\sigma\) (and thus approximate \(n\)) would be if we knew the range of wages from the highest to the lowest. Since \(\pm 3\sigma\) includes 99.7% of all the area under the normal curve, we could have equated \(6\sigma\) with the range of wages and thus estimated \(\sigma\) (and solved the problem).

4.19 With reference to a binomial distribution, indicate the relationship between (a) \(\mu\) and \(\mu_{\bar{p}}\), (b) \(p\) and \(\bar{p}\), and (c) \(\sigma\) and \(\sigma_{\bar{p}}\).

(a) \(\mu = np\) = mean number of successes in \(n\) trials, where \(p\) is the probability of success in any of the trials (see Sec. 3.3).
\(\mu_{\bar{p}} = \mu/n = p\) = the mean proportion of successes of the sampling distribution of the proportion.

(b) \(p\) = the proportion of successes in the population, and \(\bar{p}\) = the proportion of successes in the sample (and an unbiased estimator of \(p\)).

(c) \(\sigma = \sqrt{np(1-p)}\) = standard deviation of the number of successes in the population, and

\[\sigma_{\bar{p}} = \frac{\sigma}{n} = \sqrt{\frac{p(1-p)}{n}} \quad \text{when } n < 0.05N, \qquad \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{when } n > 0.05N\]

4.20 For a random sample of 100 workers in a plant employing 1200, 70 prefer providing for their own retirement benefits over belonging to a company-sponsored plan. Find the 95% confidence interval for the proportion of all the workers in the plant who prefer their own retirement plans.

\[\bar{p} = \frac{70}{100} = 0.7\]
\[p = \bar{p} \pm z\sigma_{\bar{p}} \qquad \text{since } n > 30 \text{ and } np > 5 \text{ and } n(1-p) > 5\]
\[= \bar{p} \pm z\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}} \qquad \text{since } n > 0.05N\]
\[= 0.7 \pm 1.96\sqrt{\frac{(0.7)(0.3)}{100}}\sqrt{\frac{1200-100}{1200-1}} \qquad \text{using } \bar{p} \text{ as an estimate for } p\]
\[= 0.7 \pm 1.96(0.05)(0.96) = 0.7 \pm 0.09\]

Thus \(p\) (the proportion of all the workers in the plant who prefer their own retirement plans) is between 0.61 and 0.79 with a 95% degree of confidence.

4.21 A polling agency wants to estimate with 90% level of confidence the proportion of voters who would vote for a particular candidate within \(\pm 0.06\) of the true (population) proportion of voters. What is the minimum sample size required if other polls indicate that the proportion voting for this candidate is 0.30?

\[z\sigma_{\bar{p}} = \bar{p} - p\]
\[z\sqrt{\frac{p(1-p)}{n}} = \bar{p} - p \qquad \text{assuming } n < 0.05N\]
\[1.64\sqrt{\frac{(0.3)(0.7)}{n}} = 0.06\]
\[\frac{(2.6896)(0.3)(0.7)}{n} = 0.0036 \qquad \text{by squaring both sides}\]
\[n = \frac{(2.6896)(0.3)(0.7)}{0.0036} = 156.89, \text{ or } 157\]

4.22 (a) Solve Prob. 4.21 by first getting an expression for \(n\) and then substituting the values from the problem into the expression obtained. (b) How could we still have solved Prob. 4.21 if we had not been told that the proportion voting for the candidate was 0.30?

(a) Starting with \(z\sqrt{p(1-p)/n} = \bar{p} - p\) (see Prob. 4.21), we get \(z^2 p(1-p)/n = (\bar{p} - p)^2\) and

\[n = \frac{z^2 p(1-p)}{(\bar{p} - p)^2}\]

Substituting the values from Prob. 4.21, we get

\[n = \frac{(1.64)^2 (0.3)(0.7)}{(0.06)^2} = \frac{(2.6896)(0.3)(0.7)}{0.0036}\]
= 156.89, or 157 (the same as in Prob. 4.21).

(b) If we had not been told that the proportion voting for the candidate was 0.30, we could estimate the largest value of \(n\) needed to achieve the desired precision no matter what the actual value of \(p\) is. This is done by letting \(p = 0.5\) (so that \(1 - p = 0.5\) also). Since \(p(1 - p)\) appears in the numerator of the formula for \(n\) (see part a) and this product is greatest when \(p\) and \(1 - p\) both equal 0.5, the value of \(n\) is greatest. Thus

\[n = \frac{z^2 p(1-p)}{(\bar{p} - p)^2} = \frac{(1.64)^2 (0.5)(0.5)}{(0.06)^2} = \frac{(2.6896)(0.25)}{0.0036} = 186.8, \text{ or } 187\]

(instead of \(n = 157\) when we were told that \(p = 0.30\)). In this and similar cases, trying to get an actual estimate of \(p\) does not greatly reduce the size of the required sample. When \(p\) is taken to be 0.5, the formula for \(n\) can be simplified to

\[n = \frac{z^2}{4(\bar{p} - p)^2}\]

Using this, we get

\[n = \frac{(1.64)^2}{4(0.06)^2} = \frac{2.6896}{0.0144} = 186.8, \text{ or } 187\]

CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

4.23 (a) Under what conditions can we not use the normal distribution but can use the \(t\) distribution to find confidence intervals for the unknown population mean? (b) What is the relationship between the \(t\) distribution and the standard normal distribution? (c) What is the relationship between the \(z\) and \(t\) statistics for the theoretical sampling distribution of the mean? (d) What is meant by degrees of freedom?
(a) When the population is normally distributed but the population standard deviation \(\sigma\) is not known and the sample size \(n\) is smaller than 30, we cannot use the normal distribution for determining confidence intervals for the unknown population mean, but we can use the Student's \(t\) (or simply, the \(t\)) distribution.

(b) Like the standard normal distribution, the \(t\) distribution is bell-shaped and symmetrical about its zero mean, but it is platykurtic (see Sec. 2.4) or flatter than the standard normal distribution, so that more of its area falls within the tails. While there is only one standard normal distribution, there is a different \(t\) distribution for each sample size \(n\). However, as \(n\) becomes larger, the \(t\) distribution approaches the standard normal distribution until, when \(n \ge 30\), they are approximately equal.

(c) \(z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\) and is found in App. 3; \(t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}\) and is found in App. 5 for the degrees of freedom involved.

(d) Degrees of freedom (df) refer to the number of values we can choose freely. For example, if we deal with a sample of 2 and we know that the sample mean for these two values is 10, we can freely assign the value to only one of these two numbers. If one number is 8, the other number must be 12 (to get the mean of 10). Then we say that we have \(n - 1 = 2 - 1 = 1\) df. Similarly, if \(n = 10\), this means that we can freely assign a value to only 9 of the 10 values if we want to estimate the population mean, and so we have \(n - 1 = 10 - 1 = 9\) df.

4.24 (a) How can you find the \(t\) value for 10% of the area in each tail for 9 df? (b) In what way are \(t\) values interpreted differently from \(z\) values? (c) Find the \(t\) value for 5, 2.5, and 0.5% of the area within each tail for 9 df. (d) Find the \(t\) value for 5, 2.5, and 0.5% of the area within each tail for a sample size, \(n\), that is very large or infinite. How do these \(t\) values compare with their corresponding \(z\) values?

(a) The \(t\) value for 10% of the area within each tail is obtained by moving down the column headed 0.10 in App. 5 to 9 df. This gives the \(t\) value of 1.383.
By symmetry, 10% of the area under the $t$ distribution with 9 df also lies within the left tail, to the left of $t = -1.383$.

(b) The $t$ values given in App. 5 refer to the areas (probabilities) within the tail(s) of the $t$ distribution indicated by the degrees of freedom. However, the $z$ values given in App. 3 refer to the areas (probabilities) under the standard normal curve from the mean to the specified $z$ values (compare Example 4 with Example 8).

(c) Moving down the columns headed 0.05, 0.025, and 0.005 in App. 5 to 9 df, we get $t$ values of 1.833, 2.262, and 3.250, respectively. Because of symmetry, 5, 2.5, and 0.5% of the area within the left tail of the $t$ distribution for 9 df lie to the left of $t = -1.833$, $t = -2.262$, and $t = -3.250$, respectively.

(d) For sample sizes (and df) that are very large or infinite, $t_{0.05} = 1.645$, $t_{0.025} = 1.960$, and $t_{0.005} = 2.576$ (from the last row of App. 5). These coincide with the corresponding $z$ values in App. 3. Specifically, $t_{0.025} = 1.960$ means that 2.5% of the area under the $t$ distribution with $\infty$ df lies within the right tail, to the right of $t = 1.96$. Similarly, $z = 1.96$ gives (from App. 3) 0.4750 of the area under the standard normal curve from $\mu = 0$ to $z = 1.96$. Thus, for df $= n - 1 = \infty$, the $t$ distribution is identical to the standard normal curve.

4.25 A random sample of 25 with a mean of 80 and a standard deviation of 30 is taken from a population of 1000 that is normally distributed. Find (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals for the unknown population mean. (d) How do these results compare with those in Prob. 4.15?

(a) $t_{0.05} = 1.711$ for 24 df

$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 80 \pm 1.711\frac{30}{\sqrt{25}} = 80 \pm 10.266$$

Thus $\mu$ is between 69.734 and 90.266 with a 90% level of confidence.

(b) $t_{0.025} = 2.064$ for 24 df

$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 80 \pm 2.064\frac{30}{\sqrt{25}} = 80 \pm 12.384$$

Thus $\mu$ is between 67.616 and 92.384 with a 95% level of confidence.

(c) $t_{0.005} = 2.797$ for 24 df

$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 80 \pm 2.797\frac{30}{\sqrt{25}} = 80 \pm 16.782$$

Thus $\mu$ is between 63.218 and 96.782 with a 99% level of confidence.

(d) The 90, 95, and 99% confidence intervals, as anticipated, are larger in this problem, where the $t$ distribution was used, than in Prob. 4.15, where the standard normal distribution was used.
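The arithmetic behind parts a to c can be checked with a short Python sketch. It assumes the tabulated $t$ values for 24 df (1.711, 2.064, and 2.797) and applies $\bar{X} \pm t\,s/\sqrt{n}$ with no finite-population correction, since $n = 25$ is less than 5% of the population of 1000. The function name `t_interval` is illustrative and not from the text.

```python
import math

def t_interval(mean, s, n, t_value):
    """Return (lower, upper) for mean +/- t_value * s / sqrt(n)."""
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Tabulated t values for 24 df (right-tail areas 0.05, 0.025, 0.005).
for level, t_value in [(90, 1.711), (95, 2.064), (99, 2.797)]:
    lower, upper = t_interval(80, 30, 25, t_value)
    print(f"{level}%: {lower:.3f} to {upper:.3f}")
```

Replacing the $t$ values with 1.64, 1.96, and 2.58 gives the narrower intervals that the standard normal distribution would produce for the same sample.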
However, the differences are not great because when $n = 25$, the $t$ distribution and the standard normal distribution are fairly similar. Note that in this problem we had to use the $t$ distribution because $s$ was given (and not $\sigma$, as in Prob. 4.15).

4.26 A random sample of $n = 9$ lightbulbs with a mean operating life of 300 h and a standard deviation $s$ of 45 h is picked from a large shipment of lightbulbs known to have a normally distributed operating life. (a) Find the 90% confidence interval for the unknown mean operating life of the entire shipment. (b) Sketch a figure for the results of part a.

(a) $t_{0.05} = 1.860$ for 8 df

$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 300 \pm 1.860\frac{45}{\sqrt{9}} = 300 \pm 27.9$$

Thus $\mu$ is approximately between 272 and 328 h with a 90% level of confidence.

(b) See Fig. 4-8.

4.27 A random sample of $n = 25$ with $\bar{X} = 80$ is taken from a population of 1000 with $\sigma = 30$. Suppose that we know that the population from which the sample is taken is not normally distributed. (a) Find the 95% confidence interval for the unknown population mean. (b) How does this result compare with the results of Probs. 4.15(b) and 4.25(b)?

(a) Since we know that the population from which the sample is taken is not normally distributed and $n < 30$, we can use neither the normal nor the $t$ distribution. We can apply Chebyshev's theorem, which states that regardless of the shape of the distribution, the proportion of observations (or area) falling within $K$ standard deviations of the mean is at least $1 - (1/K^2)$, for $K > 1$ (see Prob. 3.40). Setting $1 - (1/K^2) = 0.95$ and solving for $K$, we get

$$\frac{1}{K^2} = 0.05 \qquad K^2 = 20 \qquad K = 4.47$$

Then

$$\mu = \bar{X} \pm K\frac{\sigma}{\sqrt{n}} = 80 \pm 4.47\frac{30}{\sqrt{25}} = 80 \pm 26.82$$

Thus $\mu$ is approximately between 53 and 107 with a 95% level of confidence.

(b) The 95% confidence interval using Chebyshev's theorem is much wider than that found when we could use the normal distribution [Prob. 4.15(b)] or the $t$ distribution [Prob. 4.25(b)]. For this reason, Chebyshev's theorem is seldom used to find confidence intervals for the unknown population mean.
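The Chebyshev interval in part a can be reproduced with a few lines of Python. This is a sketch under the assumptions of the problem ($\bar{X} = 80$, $\sigma = 30$, $n = 25$, 95% confidence); the function name `chebyshev_interval` is illustrative. Because the code keeps $K$ unrounded, its half-width differs slightly from the 26.82 obtained with $K$ rounded to 4.47.

```python
import math

def chebyshev_interval(mean, sigma, n, confidence):
    """Interval mean +/- K * sigma / sqrt(n), where 1 - 1/K**2 = confidence."""
    k = math.sqrt(1.0 / (1.0 - confidence))
    half_width = k * sigma / math.sqrt(n)
    return k, mean - half_width, mean + half_width

k, lower, upper = chebyshev_interval(80, 30, 25, 0.95)
print(f"K = {k:.2f}, interval = {lower:.1f} to {upper:.1f}")
```

Comparing the resulting half-width with $1.96\,\sigma/\sqrt{n} = 11.76$ for the normal distribution shows how much precision is lost when nothing is assumed about the shape of the population.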
However, it represents the only possibility short of increasing the sample size to at least 30 (so that the normal distribution can be used).

4.28 Under what conditions can we construct confidence intervals for the unknown population mean from a random sample drawn from a population using (a) The normal distribution? (b) The $t$ distribution? (c) Chebyshev's theorem?

(a) We can use the normal distribution (1) if the parent population is normal, $n \geq 30$, and $\sigma$ or $s$ is known; (2) if $n \geq 30$ (by invoking the central-limit theorem) and using $s$ as an estimate for $\sigma$; or (3) if $n < 30$ but $\sigma$ is given and the population from which the random sample is taken is known to be normally distributed.

(b) We can use the $t$ distribution (for the given degrees of freedom) when $n < 30$ but $\sigma$ is not given and the population from which the sample is taken is known to be normally distributed.

(c) If $n < 30$ but the population from which the random sample is taken is not known to be normally distributed, theoretically we should use neither the normal distribution nor the $t$ distribution. In such cases, either we should use Chebyshev's theorem or we should increase the size of the random sample to $n \geq 30$ (so as to be able to use the normal distribution). In reality, however, the $t$ distribution is used even in these cases.

Supplementary Problems

SAMPLING

4.29 (a) What does statistical inference refer to? (b) What are the names of the descriptive characteristics of populations and samples? (c) How can representative samples be obtained?
Ans. (a) Estimation and hypothesis testing (b) Parameters and statistics (c) By random sampling

4.30 (a) Starting from the third column and tenth row of App. 4 and reading horizontally, obtain a sample of 5 from 99 elements. (b) Starting from the seventh column and first row of App. 4 and reading vertically, obtain a sample of 10 from 400 elements.
Ans.
(a) 31, 13, 33, 67, 68 (b) 24, 54, 290, 218, 385, 130, 24, 72, 313, 397

SAMPLING DISTRIBUTION OF THE MEAN

4.31 How can we obtain the theoretical sampling distribution of the mean from a population which is (a) Finite? (b) Infinite?
Ans. (a) By taking all possible different samples of size $n$ from the population and then finding the mean of each sample (b) By (hypothetically) taking an infinite number of samples of size $n$ from the infinite population and then finding the mean of each sample

4.32 What is (a) the mean and (b) the standard error for a theoretical sampling distribution of the mean?
Ans. (a) $\mu_{\bar{X}} = \mu$, where $\mu$ is the mean of the parent population (b) $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the parent population and $n$ is the sample size; for finite populations of size $N$ where $n > 0.05N$, $\sigma_{\bar{X}} = (\sigma/\sqrt{n})\sqrt{(N - n)/(N - 1)}$

4.33 For a population of 1000 items, $\mu = 50$ and $\sigma = 10$. What are the mean and standard error of the theoretical sampling distribution of the mean for sample sizes of (a) 25 and (b) 81?
Ans. (a) $\mu_{\bar{X}} = 50$ units and $\sigma_{\bar{X}} = 2$ (b) $\mu_{\bar{X}} = 50$ units and $\sigma_{\bar{X}} = 1.07$

4.34 What is the shape of the theoretical sampling distribution of the mean for samples of (a) 10 if the parent population is normal? (b) 50 if the parent population is not normal? (c) On what was the answer to part b based?
Ans. (a) Normal (b) Approximately normal (c) The central-limit theorem

4.35 What is the $z$ statistic for (a) Random variable $X$? (b) The theoretical sampling distribution of $\bar{X}$?
Ans. (a) $z = (X - \mu)/\sigma$ (b) $z = (\bar{X} - \mu)/\sigma_{\bar{X}}$

4.36 What is the probability of $\bar{X}$ lying between 49 and 50 for a random sample of 36 from a population with $\mu = 48$ and $\sigma = 12$?
Ans. 0.1498, or 14.98%

4.37 What is the probability that the mean for a random sample of 144 accounts receivable drawn from a population of 2000 accounts with a mean of \$10,000 and a standard deviation of \$4000 will be between \$9500 and \$10,500?
Ans. 0.8813, or 88.13%

ESTIMATION USING THE NORMAL DISTRIBUTION

4.38 What are unbiased point estimators of $\mu$, $\sigma$, and $p$, respectively?
Ans. $\bar{X}$, $s$ [as defined in Eqs.
(2.10b) and (2.11b)], and $\bar{p}$

4.39 Using the standard normal distribution, state for $\mu$ (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals.
Ans. (a) $\mu = \bar{X} \pm 1.64\sigma_{\bar{X}}$ [...]

[...] (b) [...] operating hours (c) $n$ would have had to be increased to 30 to justify the use of the normal distribution

4.45 For the binomial distribution, write the formula for (a) $\mu$ and $\sigma$, (b) $\mu_{\bar{p}}$ and $\sigma_{\bar{p}}$ when $n < 0.05N$, and (c) $\sigma_{\bar{p}}$ when $n > 0.05N$.
Ans. (a) $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$ (b) $\mu_{\bar{p}} = p$ and $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}$ (c) $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}\,\sqrt{(N - n)/(N - 1)}$

4.46 For a random sample of 36 graduate students in economics in a graduate economics program with [?]80 students, 8 students have an undergraduate degree in mathematics. Find the proportion of all graduate students at this university with an undergraduate major in mathematics at the 90% confidence level.
Ans. 0.11 to 0.33

4.47 A manufacturer of lightbulbs wants to estimate the proportion of defective lightbulbs within $\pm 0.1$ with a 95% degree of confidence. What is the minimum sample size required if previous experience indicates that the proportion of defective lightbulbs produced is 0.2?
Ans. 62

4.48 (a) Write down the expression for $n$ to solve Prob. 4.47. (b) How could you still have solved Prob. 4.47 if the manufacturer did not know that $p = 0.2$?
Ans. (a) $n = z^2 p(1 - p)/(\bar{p} - p)^2$ (b) By letting $p = 0.5$, so that $n = 97$

CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

4.49 Find the $t$ value for 29 df for the following areas falling within the (right) tail of the $t$ distribution: (a) 10%, (b) 5%, (c) 2.5%, and (d) 0.5%.
Ans. (a) $t_{0.10} = 1.311$ (b) $t_{0.05} = 1.699$ (c) $t_{0.025} = 2.045$ (d) $t_{0.005} = 2.756$

4.50 Find the $z$ value for the following areas falling from the mean to the $z$ value under the standard normal curve: (a) 40%, (b) 45%, (c) 47.5%, and (d) 49.5%. (e) How do these $z$ values compare with the corresponding $t$ values found in Prob. 4.49?
Ans.
(a) $z = 1.28$ (b) $z = 1.65$ (c) $z = 1.96$ (d) $z = 2.58$ (e) Corresponding $z$ and $t$ values are very similar (compare $z = 1.28$ to $t = 1.311$, $z = 1.65$ to $t = 1.699$, $z = 1.96$ to $t = 2.045$, and $z = 2.58$ to $t = 2.756$)

4.51 A random sample of $n = 16$ with $\bar{X} = 50$ and $s = 10$ is taken from a very large population that is normally distributed. (a) Find the 95% confidence interval for the unknown population mean. (b) How would the answer have differed if $\sigma = 10$?
Ans. (a) 44.67 to 55.33 (using the $t$ distribution with 15 df) (b) 45.1 to 54.9 (using the standard normal distribution)

4.52 On a particular test for a very large statistics class, a random sample of $n = 4$ students has a mean grade $\bar{X} = 75$ and $s = 8$. The grades for the entire class are known to be normally distributed. For the unknown population mean of the grades, find (a) the 95% confidence interval and (b) the 99% confidence interval.
Ans. (a) Approximately from 62 to 88 (b) Approximately from 5[...]

4.53 A random sample of $n = 16$ with $\bar{X} = 50$ and $s = 10$ is taken from a very large population that is not normally distributed. (a) Find the 95% confidence interval for the unknown population mean. (b) How is the answer in part a different from those of Prob. 4.51?
Ans. (a) 38 to 61 (using Chebyshev's theorem and $s$ as a rough estimate of $\sigma$) (b) The 95% confidence interval here is much wider than those found in Prob. 4.51

4.54 Indicate which distribution to use in order to find confidence intervals for the unknown population mean from a random sample taken from the population in the following cases: (a) $n = 36$ and $s = 10$, (b) $n = 20$ and $s = 10$ and the population is normally distributed, and (c) $n = 20$ and $s = 10$ and the population is not normally distributed.
Ans. (a) Normal distribution (invoking the central-limit theorem and using $s$ as an estimate of $\sigma$) (b) The $t$ distribution with 19 df (c) Chebyshev's theorem

Statistical Inference: Testing Hypotheses

5.1
TESTING HYPOTHESES

Testing hypotheses about population characteristics (such as $\mu$ and $\sigma$) is another fundamental aspect of statistical inference and statistical analysis. In testing a hypothesis, we start by making an assumption with regard to an unknown population characteristic. We then take a random sample from the population, and on the basis of the corresponding sample characteristic, we either accept or reject the hypothesis with a particular degree of confidence.

We can make two types of errors in testing a hypothesis. First, on the basis of the sample information, we could reject a hypothesis that is in fact true. This is called a type I error. Second, we could accept a false hypothesis and make a type II error. We can control or determine the probability of making a type I error, $\alpha$. However, by reducing $\alpha$, we will have to accept a greater probability of making a type II error, $\beta$, unless the sample size is increased. $\alpha$ is called the level of significance, and $1 - \alpha$ is the level of confidence of the test.

EXAMPLE 1. Suppose that a firm producing lightbulbs wants to know if it can claim that its lightbulbs last 1000 burning hours, $\mu$. To do this, the firm can take a random sample of, say, 100 bulbs and find their average lifetime $\bar{X}$. The smaller the difference is between $\bar{X}$ and $\mu$, the more likely is acceptance of the hypothesis that $\mu = 1000$ burning hours at a specified level of significance, $\alpha$. By setting $\alpha$ at 5%, the firm accepts the calculated risk of rejecting a true hypothesis 5% of the time. By setting $\alpha$ at 1%, the firm would face a greater probability of accepting a false hypothesis.

5.2 TESTING HYPOTHESES ABOUT THE POPULATION MEAN AND PROPORTION

The formal steps in testing hypotheses about the population mean (or proportion) are as follows:

1. Assume that $\mu$ equals some hypothetical value $\mu_0$. This is represented by $H_0$: $\mu = \mu_0$ and is called the null hypothesis. The alternative hypotheses are then $H_1$: $\mu \neq \mu_0$ (read "$\mu$ is not equal to $\mu_0$"), $H_1$: $\mu > \mu_0$, or $H_1$: $\mu < \mu_0$,
depending on the problem.

2. Decide on the level of significance of the test (usually 5%, but sometimes 1%) and define the acceptance region and rejection region for the test using the appropriate distribution.

3. Take a random sample from the population and compute $\bar{X}$. If $\bar{X}$ (in standard deviation units) falls in the acceptance region, accept $H_0$; otherwise, reject $H_0$ in favor of $H_1$.

EXAMPLE 2. Suppose that the firm in Example 1 wants to test whether it can claim that the lightbulbs it produces last 1000 burning hours. The firm takes a random sample of $n = 100$ of its lightbulbs and finds that the sample mean $\bar{X} = 980$ h and the sample standard deviation $s = 80$ h. If the firm wants to conduct the test at the 5% level of significance, it should proceed as follows. Since $\mu$ could be equal to, larger than, or smaller than 1000, the firm should set up the null and alternative hypotheses as

$$H_0: \mu = 1000 \qquad H_1: \mu \neq 1000$$

Since $n > 30$, the sampling distribution of the mean is approximately normal (and we can use $s$ as an estimate of $\sigma$). The acceptance region of the test at the 5% level of significance is within $\pm 1.96$ under the standard normal curve and the rejection region is outside (see Fig. 5-1). Since the rejection region is in both tails, we have a two-tail test. The third step is to find the $z$ value corresponding to $\bar{X}$:

$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{980 - 1000}{80/\sqrt{100}} = \frac{-20}{8} = -2.5$$

Fig. 5-1 (rejection region in each tail, acceptance region in the center)

Since the calculated $z$ value falls in the rejection region, the firm should reject $H_0$, that $\mu = 1000$, and accept $H_1$, that $\mu \neq 1000$, at the 5% level of significance.

EXAMPLE 3. A firm wants to know with a 95% level of confidence if it can claim that the boxes of detergent it sells contain more than 500 g (about 1.1 lb) of detergent. From past experience the firm knows that the amount of detergent in the boxes is normally distributed. The firm takes a random sample of $n = 25$ and finds that $\bar{X} = 520$ g and $s = 75$ g.
Since the firm is interested in testing if $\mu > 500$ g, we have

$$H_0: \mu = 500 \qquad H_1: \mu > 500$$

Since the population distribution is normal but $n < 30$ and $\sigma$ is not known, we must use the $t$ distribution (with $n - 1 = 24$ degrees of freedom) to define the critical, or rejection, region of the test at the 5% level of significance. This is found from App. 5 (see Sec. 4.4) and is given in Fig. 5-2. This is a right-tail test. Finally, since

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{520 - 500}{75/\sqrt{25}} = \frac{20}{15} = 1.33$$

and it falls within the acceptance region, we accept $H_0$, that $\mu = 500$ g, at the 5% level of significance (or with a 95% level of confidence).

Fig. 5-2 (acceptance region, with the rejection region in the right tail)

EXAMPLE 4. In the past, 60% of the students entering a specialized college program received their degrees within 4 years. For the 1980 entering class of 38, only 15 received their degrees by 1984. To test if the 1980 class