

Biostats Lecture: Confidence Intervals
Dr. Holland (holland@um
Tues, March 15, 2011 (11am)
Katherine Lubarsky (lubarske@umdn
Page 1 of 4

Review

• All observations are a sample of possible outcomes, subject to random fluctuations.

• We are always uncertain, but make judgments based on the likelihood of the observations

that we see (“p values”).

• Decision rule: if p<alpha, too unlikely to consider as a random fluke -- reject null

hypothesis.

• Decision could be wrong:

o It IS a rare fluke → Type I error.

o You don’t pick up a difference, or the sample doesn’t allow you to see a distinction → Type II error.

• To get the probability of a set of observations, use a probability distribution.

• Coin toss and patient survival (binomial examples).

**Normal Distribution and Standard Deviation**

Last lecture, when we did the coin toss, we learned that the results of the tosses follow a binomial distribution; the possibilities are ‘heads’ or ‘tails’ (2 possibilities, hence ‘bi’), and the probability of getting certain results can be calculated simply by multiplication:

• The probability of getting either heads or tails is .5 on each toss.

• To calculate the probability of getting a certain number of heads (or tails) in a row, multiply .5 by itself once for each toss. [Probability of 3 heads in a row = .5 x .5 x .5 = .125 (½ x ½ x ½ = 1/8)]
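The multiplication rule above can be sketched in a couple of lines of Python (a minimal illustration of our own; the function name `prob_run` is not from the lecture):

```python
def prob_run(p_per_toss: float, n_tosses: int) -> float:
    """Probability of getting the same outcome on every one of n independent tosses."""
    return p_per_toss ** n_tosses

# Three heads in a row with a fair coin: 1/2 x 1/2 x 1/2
print(prob_run(0.5, 3))  # → 0.125
```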

The normal distribution is commonly found in “normal” biological distributions such as height,

weight, blood pressures, etc. In a normal distribution, the values are continuous, and therefore,

the probability of values cannot be calculated by simply multiplying numbers the way it is done in

binomial distributions.

The German mathematician Carl Friedrich Gauss worked out a formula that relates positions along the x-axis to areas under the normal curve: plugging in any x-axis value (the observation) gives you the probability of occurrence of that observation (how likely you are to get that observation in that population). We must know the following terms: mean, observation relative to the mean, and standard deviation.

The standard deviation (represented by σ, the Greek letter sigma) is a measure of spread or

variability. It is standardized so that the same number of standard deviation units encompasses the

same proportion of observations in any normal distribution.

The normal curve is symmetrical and bell-shaped and

has a mean in the center (µ, mu), and there are

notations for µ+σ, µ+2σ, in both directions. The

number line below the x-axis denotes σ: +1 is 1 SD

above the mean (µ+σ), +2 is 2 SD above the mean

(µ+2σ), -1 is 1 SD below the mean (µ-σ), -2 is 2 SD

below the mean (µ-2σ).

Looking at the graph, you can see that in a normal

distribution, approx. 68% of the occurrences lie

within one standard deviation of the mean [+1 (µ+σ)

and -1 (µ-σ)]. Note that most values are very close to

the average, and few are at the extreme ends. If we include more of the values (more of those at each end), we find that 95% of values are within 2 SD, and 99% of values are within 3 SD. This is true of all normal distributions. REMEMBER: 1 SD = 68%, 2 SD = 95%, 3 SD = 99%.

The diagram to the right shows various normal curves. Notice that the smaller the value of σ, the taller and skinnier the curve, but in each of the curves 1 SD above and below the mean always encompasses 68% of the area under the curve (and 2 SD = 95%, 3 SD = 99%, and so on and so forth).

**Z-score**

The formula for calculating it is at the bottom of the above diagram (see above): x is the observation you are interested in, x − µ gives you the distance of your observation from the mean, and when you divide that by σ, you get the number of SD units your observation is from the mean. What is the purpose of this? Given an observation (x), if you know the mean and SD of the distribution, you can tell how rare or surprising the observation is (what is the chance that I would have gotten this number?).

Example #1: Remember: the mean +/- 2 SD includes 95% of the area. If we know (in a distribution of systolic blood pressures) that µ = 100 mmHg and σ = 10 mmHg, then we can easily figure out the range of systolic blood pressures that includes 95% of the population:

1. Mean = 100 mmHg
2. 2 SD = 20 mmHg (2 x 10)
3. The mean +/- 2 SD will give us our range:

• (µ + 2 SD) 100 + 20 = 120 mmHg (our upper bound)
• (µ – 2 SD) 100 – 20 = 80 mmHg (our lower bound)

Answer: 95% of systolic bp observations are between 80 and 120 mmHg. So if someone were to ask you what the likelihood is that someone’s systolic bp is 95, you would say that it is quite likely (95% likely, in fact). Similarly, the chance that someone’s systolic bp is less than 80 or greater than 120 is rare; it occurs only 5% of the time.

This is exactly how labs generate normal ranges for all the things that are measured at labs (serum cholesterol, thyroid concentration, etc.): the central 95% is considered normal, and the highs and lows that show up in the abnormal column are that rare 5%.
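Example #1 can be double-checked with a few lines of Python (a sketch of our own; the z-score formula from the lecture diagram is z = (x − µ)/σ):

```python
def z_score(x, mu, sigma):
    """How many SD units the observation x lies from the mean mu."""
    return (x - mu) / sigma

mu, sigma = 100, 10            # systolic bp distribution from Example #1
lower = mu - 2 * sigma         # 100 - 20 = 80 mmHg (lower bound)
upper = mu + 2 * sigma         # 100 + 20 = 120 mmHg (upper bound)
print(lower, upper)            # → 80 120
print(z_score(95, mu, sigma))  # → -0.5, i.e. 95 mmHg is well inside the range
```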
So far we have been talking about individual observations in a given population. Now we’re going to think about samples instead. If you take the heights of 100 people from Newark, they are bound to be different from the heights of another 100. Let’s take the average heights in successive samples of 100 from the Newark population. The heights in each set of 100 measurements will be different from the other sets of 100, the second set of data will be different from the following one, and the means will also be different. Sample means (averages) taken repeatedly from the same population will almost always vary somewhat.
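This sample-to-sample variability is easy to demonstrate with a short simulation (our own sketch, not from the lecture; the population of heights below is invented for illustration):

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

# Invented population: 100 heights in cm, roughly normal around 170
population = [random.gauss(170, 10) for _ in range(100)]
pop_mean = statistics.mean(population)

# Draw many samples of 10 and record each sample's average height
sample_means = [statistics.mean(random.sample(population, 10))
                for _ in range(1000)]

# Each sample mean comes out a little different, but they cluster
# tightly around the population mean: regular, predictable variability.
print(round(pop_mean, 1), round(statistics.mean(sample_means), 1))
print(round(min(sample_means), 1), round(max(sample_means), 1))
```

The spread of these sample means is what the notes go on to call the standard error.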

Similarly, if there are 100 students in the lecture hall and we want to get the average height of the class, we may take a sample of size 10 (because it is easier and takes less time than measuring 100 heights). [We do this by random sampling: we take 10 students, measure them, average the measurements, then throw them back into the pool of 100. We take another 10 students and, after measurement, calculate a different average from the first. The same students may be in the second sample – who knows ~ it is random.] For example, one sample might include many short people, which results in a low average. Another sample may include many tall people, which results in a high average.

We have now established that there is variability in samples, but it is also important to realize that this variability is quite regular and predictable: The means of samples are normally distributed, even if the underlying data are not normally distributed.

Remember we are talking about means (averages) of samples, not specific pieces of data. What this means is that the mean of each sample becomes the new “observation,” and the observations themselves distribute normally. So if you take many samples (enough so that each sample becomes one observation on a graph), most of the means of those samples will be in the middle of the bell curve, the same way that most people are of average height. Because most people are of average height (by definition), most of the sample means will also be average. To the right is the distribution of means of samples. This graph reflects the “normal frequency distribution” that we already saw; it is the same because the means of samples are normally distributed.

So now we know that the means of samples are normally distributed. The second portion of the above mantra says that this will still be the case even if the underlying data are not normally distributed. The central limit theorem proves that this is so, even if it may seem counterintuitive (we don’t need to know the proof, just understand the concept).

**Standard Error**

The standard error is the “error” that is involved when using a sample to gather data rather than using the entire population (which in most cases would be impossible to do). There is just one difference between this and the normal distribution: in a normal distribution we talk about standard deviation, while in a distribution of samples’ means we use the term standard error (SE). The same rules regarding areas under the curve apply: 1 SE = 68%, 2 SE = 95%, 3 SE = 99%.

If we had sampled only two student heights, it is very possible that we would have picked two students who were atypically short or tall, and the average we calculated from them would not be accurate at all; a small sample size is not representative of the population. By the same token, let’s say we instead measured 99 out of 100 heights. It wouldn’t even matter if we had the shortest or the tallest person included in that sample, because the average is so stabilized by having 99 observations (which is 99% of the population) that the addition of an “atypical” observation doesn’t change the average very much. Using these two examples we see that the larger the sample size, the more accurate the estimation. So again: as sample size increases, the accuracy of the estimation increases. Samples will give you a representation of the population, but it won’t be exact,

and the difference in exactness is calculated by the standard error. The equation for standard error is as follows:

SE = SD of sample means divided by the square root of n

Note how the square root of n is in the denominator (“n” is the sample size). The equation at the bottom of the diagram for ‘z’ is the shift, or difference, between the population mean and the sample mean. In other words, if n is very large, the denominator becomes very large, and z becomes very small. That is the mathematical way of saying that with a very large sample size (n), the difference (z) between the distribution of sample means and the distribution of population means is very small. Likewise, with a small sample size (small n), z will be large; the sample is so small that you can’t really get a sense of what the true population is. This again reiterates the idea that the larger the sample size, the more accurate the estimation.

Example #2: Remember: +/- 2 SE includes 95% of the area. If the mean for samples of systolic bp = 100 mmHg, the SD = 10 mmHg, and the sample size is 100 people, what is the range of sbp that includes 95% of the mean bps that you’d find in samples?

1. Calculate SE (SD of sample means divided by square root of n): 10 (SD of sample means) / square root of 100 (n) = 10 / 10 = 1
2. To get the upper bound, take the sample mean (100) and add 2 SE (2 x 1) → 102
3. To get the lower bound, take the sample mean (100) and subtract 2 SE (2 x 1) → 98

So, 95% of the sample means are between 98 and 102 mmHg.

**Confidence Intervals**

In the above example, we sampled systolic bp. Because there is variability in samples, we use a confidence interval to tell us how accurate our mean is. So in calculating the 95% CI (which is what we did above), we are saying that we are 95% sure that the real systolic bp is between 98 and 102 mmHg, given the sample we drew and the variability associated with the size of the sample: “We are 95% certain that the real value is within these boundaries.”

A 95% CI = sample mean +/- 2 SE

At this point we watched a video about confidence intervals that reiterated what was covered in this lecture and then moved on to the workshop to practice our newly acquired statistical skillz.
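Example #2 and the 95% CI rule (sample mean +/- 2 SE) can be computed directly; this is a sketch of our own, with hypothetical function names:

```python
import math

def standard_error(sd, n):
    """SE = SD divided by the square root of the sample size n."""
    return sd / math.sqrt(n)

def ci_95(sample_mean, sd, n):
    """Approximate 95% confidence interval: sample mean +/- 2 SE."""
    se = standard_error(sd, n)
    return sample_mean - 2 * se, sample_mean + 2 * se

# Example #2: mean = 100 mmHg, SD = 10 mmHg, n = 100 people
print(standard_error(10, 100))  # → 1.0
print(ci_95(100, 10, 100))      # → (98.0, 102.0)
```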
