Oct 12, 2013

© Attribution Non-Commercial (BY-NC)


In the last handout, the subject of statistics was defined to be the art and science of making inferences about a population on the basis of a sample. But what sort of inferences about a population does one seek to make? Often one seeks to estimate the population mean and the population variance. So the point of taking a sample and computing a sample mean and a sample variance is to estimate the population mean and the population variance, respectively. The foregoing discussion is getting ahead of itself, in the sense that it presumes that the terms population mean and population variance have already been defined, which they haven't. There are two cases to consider: the case in which the population is finite, and the case in which the population is infinite.

Finite versus Infinite Populations

In the case of a finite population, say of size N, the definition of the population mean is the obvious analogue of the definition of the sample mean given in Handout 1: the population mean is

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i .$$

populations are not just the province of pure mathematicians spinning abstract theories for the sake of abstraction; the case of an infinite population is quite common in scientific settings. Suppose one is investigating the wingspan (in millimeters) of the progeny resulting from cross-breeding two strains of fruit fly. What is the population under investigation? The fruit flies bred by the investigator? The fruit flies bred by all researchers in the US that year? The fruit flies bred by anyone, anyplace, to date? If it's a truly general scientific investigation, the population of interest is the set of all wingspans of all potential fruit flies that have been or could be bred, anytime or anyplace, which is by its very nature an infinite set. Indeed, any scientific investigation general enough to be of significant enduring interest is likely to be about an infinite population.

Developing a Definition for the Case of Discrete Random Variables

We will consider the problem of defining the population mean in the special case of a discrete random variable: for present purposes, a random variable that takes on only a finite number of different values. (For example, consider the random variable X that takes on the value 1 if a coin toss comes up heads, and the value 0 if it comes up tails. This is a discrete random variable, since it takes on only two different values; but one can, at least conceptually, keep tossing the coin from here to eternity, and hence the population of interest is infinite.)

The goal is to develop a definition that makes sense whether the population is finite or infinite. We're not going to just plop down an unmotivated definition!! The motivation will be provided by a consideration of how to efficiently compute a sample mean when one has so-called grouped data. Keep in mind that the computational aspect is not what is of primary importance here (although some textbook authors make a ludicrously big deal out of grouped data); the significance lies in the motivation grouped data provide for defining the population mean and population variance.

Let's start by considering a particular example. Consider the experiment of randomly selecting a household in Fairfax County. The random variable of interest is the response to an inquiry as to the number of adults over 18 years of age residing in the selected household. Repeating this experiment 1000 times gives the data set in the following table:

Number of persons over 18:    1    2    3    4    5    6    7    8
Number of households:       223  437  150   86   62   22   15    5

The sample mean may be computed as

$$\bar{x} = \frac{1(223) + 2(437) + 3(150) + 4(86) + 5(62) + 6(22) + 7(15) + 8(5)}{1000} = \frac{2478}{1000} = 2.478 .$$

To generalize the computations in this example, say that the discrete random variable X takes on the value $x_i$ with frequency $f_i$. Then the sample mean is computed by

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{n} = \frac{1}{n} \sum_{i=1}^{k} x_i f_i ,$$

where k is the number of different values $x_i$ in the sample and $n = f_1 + f_2 + \cdots + f_k$ is the total number of observations.
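As a quick check, the grouped-data formula can be compared with the ordinary sample mean of the 1000 raw observations, using the Fairfax County household counts from the table above. This is a minimal sketch. The function name `grouped_mean` is hypothetical, and exact `Fraction` arithmetic is used so the comparison is free of rounding error. The script also tests the weighted form with weights $f_i/n$, which the handout adopts in the next step.

```python
from fractions import Fraction

def grouped_mean(values, freqs):
    """Sample mean from grouped data: (sum of x_i * f_i) / n, where n = sum of f_i."""
    n = sum(freqs)
    return Fraction(sum(x * f for x, f in zip(values, freqs)), n)

# Fairfax County data: number of persons over 18, and how many households reported it.
persons = [1, 2, 3, 4, 5, 6, 7, 8]
households = [223, 437, 150, 86, 62, 22, 15, 5]

xbar = grouped_mean(persons, households)

# Expand the grouped data back into the 1000 raw observations and average directly.
raw = [x for x, f in zip(persons, households) for _ in range(f)]
assert Fraction(sum(raw), len(raw)) == xbar

# Weighted form: sum of x_i * (f_i / n) gives the same number.
n = sum(households)
rel = sum(x * Fraction(f, n) for x, f in zip(persons, households))
assert rel == xbar

print(float(xbar))  # 2.478
```

Grouping the data changes only the bookkeeping. Each distinct value is multiplied once by its frequency instead of being added up f_i times.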

Note, just doing some algebraic rewriting, that the last formula for $\bar{x}$ can be rewritten as

$$\bar{x} = \sum_{i=1}^{k} x_i \frac{f_i}{n} .$$

The ratio $f_i/n$ is an approximation to $P(X = x_i)$. To keep the notation simple, let's agree to write $p_i$ for $P(X = x_i)$ and $\hat{p}_i$ for the approximation $f_i/n$. Then one may write the formula for $\bar{x}$ as

$$\bar{x} = \sum_{i=1}^{k} x_i \hat{p}_i ,$$

where $\bar{x}$ is the sample mean. There should be an analogous average, or mean, for the entire population. The last formula for $\bar{x}$ motivates the following definition.

Definition.

Given a discrete random variable X that takes on k different values $x_1, x_2, \ldots, x_k$, the expected value of X, or population mean $\mu_X$, is

$$\mu_X = E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k ,$$

where $p_i$ is $P(X = x_i)$ for $i = 1, \ldots, k$.

Remarks

(1) The notations E[X] and $\mu_X$ are two different views of the same thing. With the notation $\mu_X$, one is looking at the population mean as a property of a set of values. With the notation E[X], one is thinking of the population mean as a property of the random variable that generates the set. Each way of looking at things has advantages in different instances.

(2) A remark on notation: lower-case Latin characters generally denote characteristics of a sample, e.g. $\bar{x}$; lower-case Greek characters generally denote population characteristics, e.g. $\mu$.

(3) It's a good idea to get used to summation notation, so note that

$$E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum_{i=1}^{k} x_i p_i .$$

Computational Example

Consider a random variable Y that takes on the values 2, 4, and 8 with probabilities 1/2, 1/4, and 1/4, respectively. Then

$$\mu_Y = E[Y] = \tfrac{1}{2}(2) + \tfrac{1}{4}(4) + \tfrac{1}{4}(8) = 1 + 1 + 2 = 4 .$$

Computational Example

If Y is a random variable, any function of Y (e.g. $Y^2$, $Y^3 - Y$, $|Y|$) is also a random variable, and so it also has an expectation. For example, for Y as in the example above,

$$E[Y^2] = \tfrac{1}{2}(2^2) + \tfrac{1}{4}(4^2) + \tfrac{1}{4}(8^2) = 2 + 4 + 16 = 22 .$$

Note that $E[Y^2]$ is NOT the same as $(E[Y])^2 = 16$!

Computational Example

In a certain board game, the number of spaces a player moves his token is determined by a spinner. The spinner is just a circle printed on a rectangle of cardboard, with a light metal arrow attached to the center of the circle so that the arrow rotates freely. The spinner for the particular game I have in mind is marked off into three sections: a 180° section colored blue, a 120° section colored red, and a 60° section colored yellow. If the arrow lands in the blue section, the token moves forward 4 spaces. If it lands in the red section, the token moves forward 6 spaces. If it lands in the yellow section, the token moves back 12 spaces. Let Y be the random variable corresponding to the number of spaces the token is moved. Assume the head of the arrow is equally likely to stop at any point on the perimeter of the circle. Then each probability is the section's share of the 360° circle, so P(Y = 4) = 180/360 = 1/2, P(Y = 6) = 120/360 = 1/3, and P(Y = -12) = 60/360 = 1/6. One then computes

$$\mu_Y = E[Y] = \tfrac{1}{2}(4) + \tfrac{1}{3}(6) + \tfrac{1}{6}(-12) = 2 + 2 - 2 = 2 .$$
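The definition $E[g(X)] = \sum g(x_i)\,p_i$ translates almost word for word into code. This is a minimal sketch, and the helper name `expected_value` and its optional function argument `g` are hypothetical. It uses exact fractions and reproduces the three computational examples above: E[Y] = 4, E[Y²] = 22 (versus (E[Y])² = 16), and the spinner mean of 2.

```python
from fractions import Fraction as F

def expected_value(values, probs, g=lambda x: x):
    """E[g(X)] = sum of g(x_i) * p_i for a discrete random variable X."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(g(x) * p for x, p in zip(values, probs))

# Y takes the values 2, 4, 8 with probabilities 1/2, 1/4, 1/4.
y_vals, y_probs = [2, 4, 8], [F(1, 2), F(1, 4), F(1, 4)]
mu_y = expected_value(y_vals, y_probs)                    # 4
e_y2 = expected_value(y_vals, y_probs, g=lambda y: y**2)  # 22
print(mu_y, e_y2, mu_y**2)  # E[Y^2] = 22 differs from (E[Y])^2 = 16

# Spinner: forward 4 (prob 1/2), forward 6 (prob 1/3), back 12 (prob 1/6).
s_vals, s_probs = [4, 6, -12], [F(1, 2), F(1, 3), F(1, 6)]
print(expected_value(s_vals, s_probs))  # 2
```

Passing g as a function reflects the point of the second example. Any function of a random variable is itself a random variable, so its expectation uses the same probabilities $p_i$.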

Computation ain't everything, ya know? How does one interpret this result? If one plays the game for a very long time, spinning the spinner billions and billions of times (to echo the late Carl Sagan), one will average about 2 spaces forward per spin.
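The long-run interpretation can be illustrated with a simulation. This is a minimal sketch, assuming Python's `random` module as a stand-in for a physical spinner. The helper `spin` is hypothetical, and the seed (2013) and the count of one million spins (rather than billions) are arbitrary choices. With a fixed seed the run is reproducible, and the average should come out within a few hundredths of the population mean 2.

```python
import random

def spin(rng):
    """One spin: the arrow stops uniformly on the circle (angle in degrees)."""
    angle = rng.uniform(0, 360)
    if angle < 180:    # blue section, 180 degrees: forward 4
        return 4
    elif angle < 300:  # red section, 120 degrees: forward 6
        return 6
    else:              # yellow section, 60 degrees: back 12
        return -12

rng = random.Random(2013)  # fixed seed so the run is reproducible
n_spins = 1_000_000
average = sum(spin(rng) for _ in range(n_spins)) / n_spins
print(average)  # close to the population mean of 2
```

The simulation spins by angle rather than drawing 4, 6, or -12 directly with the stated probabilities. That choice makes the "equally likely to stop at any point on the perimeter" assumption explicit in the code.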

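Defining the Population Variance

The motivation for the definition of the population variance is very similar. If one writes out the definition of the sample variance longhand, i.e. without using summation notation, one has

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} .$$

Now for grouped data, let $x_1, x_2, \ldots, x_k$ be the k different values occurring in a sample of n observations, and let $f_i$ be the frequency with which $x_i$ occurs in the sample. Then one has

$$s^2 = \frac{(x_1 - \bar{x})^2 f_1 + (x_2 - \bar{x})^2 f_2 + \cdots + (x_k - \bar{x})^2 f_k}{n - 1} = \sum_{i=1}^{k} (x_i - \bar{x})^2 \frac{f_i}{n - 1} .$$

If n is very large, dividing by n - 1 isn't all that different numerically from dividing by n. Also, for large n we expect $\bar{x}$ to be close to $\mu$ and $f_i/n$ to be close to $p_i$. So for large n,

$$s^2 \approx \sum_{i=1}^{k} (x_i - \mu)^2 \frac{f_i}{n} \approx \sum_{i=1}^{k} (x_i - \mu)^2 p_i .$$

This expression motivates the following definition.

Definition. The population variance for a discrete random variable X, denoted Var[X] or $\sigma_X^2$, is

$$\mathrm{Var}[X] = \sigma_X^2 = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i .$$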

Remarks

(1) The square root of the population variance, $\sigma_X$, is called the population standard deviation.

(2) Notice again the convention that lower-case Latin characters denote sample characteristics, while lower-case Greek characters denote population characteristics.

(3) Note that one could also write the definition as $\mathrm{Var}[X] = E[(X - \mu_X)^2]$, an observation we will make use of later.

(4) Observe that, by definition, the variance of a random variable is non-negative, since it is a sum of squares weighted by non-negative probabilities.

(5) When the definition of the sample standard deviation was introduced in Handout 1, we noted that it would have been more natural if the denominator in the defining expression had been n instead of n - 1. Here's the intuitive explanation. What one would really like to measure is the dispersion of the data about the population mean. Since the population mean is unknown, one uses the sample mean in the computation instead. Data is information, so having to estimate the population mean is like having one fewer data point.
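The definition of the population variance can be checked numerically. This is a minimal sketch, and the helper names `mean` and `variance` are hypothetical. It applies the definition with exact fractions to the spinner from the earlier example and to a fair die. The hand computations for both appear in the computational examples that follow.

```python
from fractions import Fraction as F
import math

def mean(values, probs):
    """Population mean: sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Population variance: sum of (x_i - mu)^2 * p_i, i.e. E[(X - mu)^2]."""
    mu = mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Spinner from the board-game example: moves 4, 6, -12 with probs 1/2, 1/3, 1/6.
spinner = ([4, 6, -12], [F(1, 2), F(1, 3), F(1, 6)])
var_spin = variance(*spinner)
print(var_spin, math.sqrt(float(var_spin)))  # 40 and about 6.325

# A fair die: faces 1..6, each with probability 1/6.
die = (list(range(1, 7)), [F(1, 6)] * 6)
print(mean(*die), variance(*die))  # 7/2 and 35/12 (about 2.917)
```

Computing the variance as a weighted sum of squared deviations, exactly as defined, also makes remark (4) visible. Every term is a square times a non-negative probability, so the result can never be negative.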

Computational Example

Referring to the random variable associated with the game spinner of the previous example,

$$\mathrm{Var}[Y] = E[(Y - \mu_Y)^2] = \tfrac{1}{2}(4 - 2)^2 + \tfrac{1}{3}(6 - 2)^2 + \tfrac{1}{6}(-12 - 2)^2 = 2 + \tfrac{16}{3} + \tfrac{196}{6} = \tfrac{240}{6} = 40 ,$$

and the population standard deviation is $\sigma_Y = \sqrt{40} \approx 6.325$.

Computational Example

Consider the random experiment of rolling a fair die. Let X be the function that assigns to an outcome the number of spots that appear on the top face. Then P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = 1/6, so

$$E[X] = \tfrac{1}{6}(1) + \tfrac{1}{6}(2) + \tfrac{1}{6}(3) + \tfrac{1}{6}(4) + \tfrac{1}{6}(5) + \tfrac{1}{6}(6) = 3.5 ,$$

$$\mathrm{Var}[X] = \tfrac{1}{6}(1 - 3.5)^2 + \tfrac{1}{6}(2 - 3.5)^2 + \tfrac{1}{6}(3 - 3.5)^2 + \tfrac{1}{6}(4 - 3.5)^2 + \tfrac{1}{6}(5 - 3.5)^2 + \tfrac{1}{6}(6 - 3.5)^2 = \tfrac{35}{12} \approx 2.917 .$$

Computational Example

To help illustrate the relationship between the sample mean and variance and the population mean and variance, I rolled a fair die 600 times. (Well, I rolled it with the help of the random number generator built into Lotus 1-2-3.) I then computed the sample mean and variance for this sample of size 600. After that I rolled the die another 2400 times and again computed the sample mean and variance. The results are summarized in the tables below.

First 600 rolls
X:             1     2     3     4     5     6   Total
Frequency:    91    92   118   101    99    99     600
$\bar{x}$ = 3.537, $s^2$ = 2.790

Second 2400 rolls
X:             1     2     3     4     5     6   Total
Frequency:   403   370   383   434   417   393    2400
$\bar{x}$ = 3.530, $s^2$ = 2.896

Compare the sample means and the sample variances with the population mean and variance computed earlier.
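The comparison can be done directly from the frequency tables. This is a minimal sketch, and the helper name `grouped_stats` is hypothetical. It uses the grouped-data formulas with the divisor n - 1 from Handout 1. It also prints the divisor-n version for comparison, since the two differ only in the third decimal place for samples this large. The population values 3.5 and 35/12 are those of a fair die.

```python
def grouped_stats(values, freqs):
    """Sample mean and sample variance from grouped data.

    Returns (mean, variance with divisor n - 1, variance with divisor n).
    """
    n = sum(freqs)
    xbar = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - xbar) ** 2 for x, f in zip(values, freqs))  # sum of squared deviations
    return xbar, ss / (n - 1), ss / n

faces = [1, 2, 3, 4, 5, 6]
first_600 = [91, 92, 118, 101, 99, 99]
next_2400 = [403, 370, 383, 434, 417, 393]

mu, sigma2 = 3.5, 35 / 12  # population mean and variance of a fair die
for label, freqs in [("first 600", first_600), ("next 2400", next_2400)]:
    xbar, s2, s2_n = grouped_stats(faces, freqs)
    print(f"{label}: mean {xbar:.3f} (mu {mu}), "
          f"s^2 {s2:.3f} (divisor n: {s2_n:.3f}), sigma^2 {sigma2:.3f}")
```

Both samples land close to μ = 3.5 and σ² ≈ 2.917, which is the behavior the large-n approximation in the variance derivation predicts.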

Nice Examination Problem

A randomly chosen drilling site in a certain area of West Texas is quite variable in its potential for producing oil:

- 80% of the time, the well will produce only 50,000 barrels of oil.
- 10% of the time, it will produce 100,000 barrels.
- 7% of the time, it will produce 250,000 barrels.
- 2% of the time, it will produce 500,000 barrels.
- On average, one in a hundred wells is a real gusher and produces 3,000,000 barrels.

(a) If X denotes the oil production of a randomly chosen well, compute the population mean (expected value) and standard deviation of X. (You may choose any units for oil production you find convenient.)

(b) Suppose a barrel of crude oil sells for about $25, and drilling a well costs about $2,000,000. What is the average profit per well for a company with the financial resources to stay in the oil business for the long run?

And here is a solution. Using X to denote oil production per well (in thousands of barrels), one computes the mean and standard deviation of X to be

$$\mu_X = 0.80(50) + 0.10(100) + 0.07(250) + 0.02(500) + 0.01(3000) = 107.5 , \qquad \sigma_X \approx 301.3615 .$$

This means that in the long run the oil company will average a profit of 107,500 × $25 - $2,000,000 = $687,500 per well. It is well to emphasize "long run", since 80% of the time a well will, in fact, lose money for the company: 50,000 barrels at $25 brings in only $1,250,000.
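The solution's numbers can be reproduced with a short script. This is a minimal sketch that measures production in thousands of barrels, as the solution does. The variable names are illustrative, and the price and drilling cost are the approximate figures from part (b).

```python
import math

# Production per well in thousands of barrels, with the probability of each outcome.
production = [50, 100, 250, 500, 3000]
probs = [0.80, 0.10, 0.07, 0.02, 0.01]

mean = sum(x * p for x, p in zip(production, probs))               # population mean
var = sum((x - mean) ** 2 * p for x, p in zip(production, probs))  # population variance
sd = math.sqrt(var)                                                # population std. dev.

price_per_barrel = 25      # dollars, as in part (b)
drilling_cost = 2_000_000  # dollars per well
avg_profit = mean * 1000 * price_per_barrel - drilling_cost

# A well loses money when its revenue falls short of the drilling cost.
p_loss = sum(p for x, p in zip(production, probs)
             if x * 1000 * price_per_barrel < drilling_cost)

print(round(mean, 1), round(sd, 4), round(avg_profit), round(p_loss, 2))
```

The huge standard deviation relative to the mean reflects the rare 3,000,000-barrel gusher. That single outcome contributes 30 of the 107.5 thousand barrels in the mean, which is why staying in business for the long run matters.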
