Uploaded by Ze Chen

Attribution Non-Commercial (BY-NC)

We are now familiar with descriptive statistics; but the main use of statistical methods is not description, but prediction

o i.e. we collect samples mostly to predict

The key instrument of extrapolation from sample to population is the analysis of probability distributions:

o By assuming that our variables have a certain distribution (normal, uniform, etc.), we can use samples to infer population properties

In the following, we'll examine the concept and uses of statistical distributions

The most widely used statistical distribution is the normal distribution (the bell curve)

o also the most infamous due to certain misuses

o http://crab.rutgers.edu/~goertzel/normalcurve.htm

o well, anything in the wrong hands (from a bread

The first reason for the popularity of the normal curve is descriptive; i.e. we use it to model the distribution of certain traits that look bell-shaped

What traits are bell-shaped? Typically, traits that are optimised or established by biological or social processes, and thus have a tendency to occur at an expected/established/optimal value

o classic example: biological traits under natural selection

o but the only reason Darwin applied the principle of optimisation to nature is that this was a current concept in Victorian society; human societies also define optimal values of certain features, with deviations (in both directions) being less common

o the mean value should be the most likely or most frequently observed

o the further from the mean, the less likely a value should be

o the sum of all cases (or probabilities) should be 100% (that's the whole sample)

o > x <- seq(-3, 3, 0.05)

o > y <- exp(x)

o > plot(x, y) # what does it look like?

Try others:

o > y <- exp(-x)

o > y <- exp(x^2)

o > y <- exp(-x^2) # that works!

The normal distribution is just a modified version of our exponential

The curve N(0,1) = $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$

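[Figure: the standard normal curve N(0,1), with the x-axis marked in standard deviations from -3 to +3 and the percentage of cases in each band]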

It says, for example, that:

o the probability of being well above average (more than 3 standard deviations above the mean) is only 0.1%

o the probability of being more than one standard deviation below average (below -1 sd) is 0.1 + 2.1 + 13.6 = 15.8% (i.e. everything below -1)
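The band percentages quoted above can be checked in R. The following is a minimal sketch, not part of the original slides: it assumes the standard normal density formula and uses the built-in functions dnorm (density) and pnorm (cumulative probability); the variable names are illustrative only.

```r
# Grid of x values, as in the exp() experiments
x <- seq(-3, 3, 0.05)

# Standard normal density written out by hand: exp(-x^2 / 2) / sqrt(2 * pi)
y <- exp(-x^2 / 2) / sqrt(2 * pi)

# dnorm() is R's built-in normal density; the two curves should agree
max_diff <- max(abs(y - dnorm(x, mean = 0, sd = 1)))

# Percentage of cases in each one-sd band of N(0,1)
cuts <- c(-Inf, -3, -2, -1, 0, 1, 2, 3, Inf)
bands <- round(100 * diff(pnorm(cuts)), 1)
print(bands)

# Everything below -1 sd, as the sum of the three leftmost bands
below_minus_1 <- sum(bands[1:3])
print(below_minus_1)
```

Summing the rounded bands gives 15.8%, while pnorm(-1) itself is 0.1586553, so the two figures differ only by rounding.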

However, real traits (body height, income, schooling years, number of social media accounts) may have a normal distribution (bell shape), but rarely with mean=0 and standard deviation=1

That is not a problem: we can standardise variables, i.e. transform them so that everything you measure has mean=0 and sd=1

How is this done? With z-scores

1) We subtract the mean from each case, which gives the residual (case minus mean)

o If mean height is 180 cm, someone 170 cm tall now measures 170 - 180 = -10 cm

2) We take all residuals (case minus mean) and divide them by the standard deviation of the sample

o If sd = 10 cm and the mean is 180 cm, then someone measuring 170 cm deviates by -10 cm / 10 cm = -1, i.e. one standard deviation below the mean

In summary, standardisation or calculation of z-scores is simply converting any measurements into standard deviation units using $z = \frac{x - \text{mean}}{\text{sd}}$

i.e., if your height, or age, or income, etc. are average, then on the z-scale all those things measure zero
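The two standardisation steps can be sketched in R as follows. The heights vector is made up for illustration (it is not data from the lecture), and scale() is R's built-in function that performs the same centring and rescaling.

```r
# Hypothetical sample of heights in cm (made-up values)
heights <- c(165, 170, 180, 185, 200)

# Step 1: subtract the mean from each case (residuals)
residuals <- heights - mean(heights)

# Step 2: divide the residuals by the sample standard deviation
z <- residuals / sd(heights)
print(z)

# After standardisation the variable has mean 0 and sd 1
print(c(mean(z), sd(z)))

# scale() does both steps at once and returns a one-column matrix
z_builtin <- as.numeric(scale(heights))
```

A case equal to the sample mean (here 180 cm) gets z = 0, which is the point made in the summary above.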

So: if in a population

o mean height = 180 cm

o standard deviation = 10 cm

and you are 170 cm tall, then

o you measure 10 cm below the average

o you measure z = (170 - 180)/10 = -1

This means that the probability of being shorter than 170 cm in this population is

o 0.1 + 2.1 + 13.6 = 15.8%
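As a quick check, assuming the N(180, 10) population above, standardising first and then reading the standard normal curve should give the same probability as working on the original scale. The sketch below uses pnorm, R's cumulative normal probability function; the variable names are illustrative.

```r
# z-score of a 170 cm person in a population with mean 180 and sd 10
z <- (170 - 180) / 10

# Probability of being below z on the standard normal curve N(0,1)
p_from_z <- pnorm(z)

# Same probability computed directly on the original scale
p_direct <- pnorm(170, mean = 180, sd = 10)

print(c(z = z, p_from_z = p_from_z, p_direct = p_direct))
```

Both routes give about 0.159; the 15.8% above comes from adding the rounded band percentages.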

The reason for standardising is clear: it is the theoretical step that allows the application of the normal distribution to many quantifiable aspects of reality

We are interested in intervals of the normal curve, not points

Why? What does it mean to ask "what is the probability of being a millionaire in the UK?" (or their frequency)

o it does not mean the probability of having exactly 1 million (that's a point)

o it means everyone having over 1 million (and that's an interval)

Cumulative probability is the probability of an interval of values

[Figure: a lower interval and an upper interval of the normal curve]

It is easy to estimate the cumulative probability of being smaller than a value

o You provide the individual value, the mean, and the sd, and R calculates the z-score and the probability of the interval defined by that value

Command pnorm(test value, mean, sd) calculates cumulative probability from left to right, i.e. from -∞ to a value x

o (that's the blue area)

Example: if your height is 170 cm (and average is 180 cm, sd = 10 cm), then

o > pnorm(170, 180, 10)

o [1] 0.1586553

[Figure: a lower interval]

pnorm can estimate upper intervals too (i.e. the probability of being over a given value)

Example:

o what is the probability of being at least (i.e. taller than) 190 cm in the same population?

Probability of being smaller than 190 cm is

o > pnorm(190, 180, 10)

o [1] 0.8413447

i.e. 0.841 = 84.1%

Therefore the probability of being over 190 cm is the rest, i.e.

o > 1 - pnorm(190, 180, 10)

o [1] 0.1586553

i.e. the probability of being taller than 190 cm is 1 (100%) minus the probability of being smaller than 190 cm
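An alternative not mentioned in the slides: pnorm also accepts the argument lower.tail = FALSE, which returns the upper interval directly. The sketch below, using the same N(180, 10) population, shows that the two approaches agree; the variable names are illustrative.

```r
# Upper interval as "1 minus the lower interval"
p_upper_a <- 1 - pnorm(190, mean = 180, sd = 10)

# Upper interval requested directly with lower.tail = FALSE
p_upper_b <- pnorm(190, mean = 180, sd = 10, lower.tail = FALSE)

print(c(p_upper_a, p_upper_b))
```

For values this close to the mean the choice does not matter; lower.tail = FALSE is mainly useful far out in the tail, where subtracting from 1 loses numerical precision.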

[Figure: an upper interval]

Important: we can combine the two things to calculate the probability of extreme values (i.e. too large or too small)

So what is the probability of being shorter than 170 cm OR taller than 190 cm, with N(180, 10)?

(check why)
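One way to work this out, offered as a sketch rather than as the slide's own answer: the lower and upper intervals do not overlap, so their probabilities can simply be added. The variable names are illustrative.

```r
# Lower interval: shorter than 170 cm
p_short <- pnorm(170, mean = 180, sd = 10)

# Upper interval: taller than 190 cm
p_tall <- 1 - pnorm(190, mean = 180, sd = 10)

# The two intervals cannot both occur, so the probabilities add up
p_extreme <- p_short + p_tall
print(p_extreme)
```

Because 170 cm and 190 cm are equally far from the mean and the curve is symmetric, both tails have the same probability, and the total is about 0.317.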

What about probability of not being extreme, i.e. of being between 170 cm and 190 cm? (This means less than 10 cm off average of 180 cm)

o > pnorm(190, 180, 10) - pnorm(170, 180, 10)
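The same subtraction works for any pair of limits. Below is a minimal sketch with a hypothetical helper, p_between (not a built-in R function), which also checks that the "between" and "extreme" probabilities add up to 1.

```r
# Hypothetical helper: probability of falling between lo and hi
# under a normal distribution with the given mean and sd
p_between <- function(lo, hi, mean, sd) {
  pnorm(hi, mean, sd) - pnorm(lo, mean, sd)
}

# Probability of being between 170 cm and 190 cm with N(180, 10)
p_mid <- p_between(170, 190, mean = 180, sd = 10)
print(p_mid)

# Probability of being outside that interval (either tail)
p_out <- pnorm(170, 180, 10) + (1 - pnorm(190, 180, 10))

# Being inside or outside the interval covers every case
print(p_mid + p_out)
```

About 68% of cases fall within one standard deviation of the mean.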

Take the estimates of years at school by country (from the HDR2011 database); this is the variable schoolingyears:

How can we estimate the proportion of countries with children having

a) less than 3 years of schooling?

b) less than 5 years of schooling?

c) at least 7 years of schooling?

Hints:

- You need to use function pnorm

- To use pnorm you need the test value, the mean and the standard deviation of variable schoolingyears
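A possible approach is sketched below. The HDR2011 data are not reproduced here, so schoolingyears is filled with made-up values purely for illustration; with the real variable the numbers will differ. The sketch also assumes that schooling years are roughly normally distributed, which should be checked (e.g. with a histogram) before trusting the estimates.

```r
# Made-up stand-in for the HDR2011 variable (NOT the real data)
schoolingyears <- c(2.5, 4.1, 6.3, 7.8, 9.0, 10.4, 12.1)

# pnorm needs the mean and standard deviation of the variable;
# na.rm = TRUE skips missing values, which real datasets often contain
m <- mean(schoolingyears, na.rm = TRUE)
s <- sd(schoolingyears, na.rm = TRUE)

# a) less than 3 years: lower interval
p_less_3 <- pnorm(3, m, s)

# b) less than 5 years: lower interval
p_less_5 <- pnorm(5, m, s)

# c) at least 7 years: upper interval
p_at_least_7 <- 1 - pnorm(7, m, s)

print(c(a = p_less_3, b = p_less_5, c = p_at_least_7))
```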
