Lecture slides from ENGRD 2700 Basic Engineering Probability and Statistics

David S. Matteson

School of Operations Research and Information Engineering
Rhodes Hall, Cornell University
Ithaca NY 14853 USA
dm484@cornell.edu
January 20, 2009

1. The Greatest

Basis for the scientific method, management science, intelligent empiricism. Tidbits: Fact-based decision making. Scientific decision making. "In God we trust; everyone else bring data." Conclusion: Hunches can be wrong, intuition can be fooled, instincts may be immature or just wrong.

Drug testing: Drug companies are not allowed to market a drug until it is proven safe and effective via clinical trials. (Living link with history: SIR1 was part of the largest public health clinical trial ever, the Salk polio vaccine clinical trial.) The drug company employs statisticians to supervise the clinical trial; the FDA employs statisticians to review the drug company results.

Digital network traffic: Fundamentally different from POTS = plain old telephone traffic. Prize-winning Bellcore study (1993) statistically analyzed data network traces and concluded they exhibit characteristics not compatible with telephone models. Probability models were then constructed to explain the anomalies.

Hydrology: 10,000 year flood. Build the dam so high that water will exceed the height once per 10,000 years. (Small quantile estimation to the cognoscenti.)

Finance: Value at risk: the bank's risk reserve needs to be high enough to cover a loss so big that it occurs with probability 1/10,000.

Environment: City is ruled out of compliance with EPA regulations if pollutant concentration exceeds a specified level more than 5% of the days in a year; that is, the standard is set so the probability of exceeding the standard is 5%.

Weather prediction: Chance of rain tomorrow is 53%. What does this mean?

Pick your favorite quote:
"There are lies, damn lies and statistics." (Sir Winston Churchill)
"There are three kinds of lies: Lies, Damn Lies, and Statistics." (Benjamin Disraeli)
"You can use statistics to prove anything." (Homer Simpson)
"He uses statistics as a drunken man uses a lamppost - for support rather than illumination." (Andrew Lang)

"Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital." (Aaron Levenstein)
"It is easy to lie with statistics, but it is easier to lie without them." (Frederick Mosteller)

Subject perceived as dull. (Why?) Tell someone at a social gathering that you are studying statistics and watch their reaction. (Is the reaction the same if you say Engineering?)
"A statistician is someone who is good with numbers, but lacks the personality to be an accountant." (Or is it the other way round?)

The Greatest Scope More Details EDA Describing sample data

Statisticians are sometimes perceived as hired guns. ("My statistician can beat up your statistician.") Statisticians are often used as expert witnesses in court and as government witnesses in regulatory hearings.

Lowest blow: Best-selling book: How to Lie with Statistics.

Figure 2: Blowup

2. Scope

Probability: Build models of random phenomena. Random phenomena: Phenomena whose outcomes cannot be predicted in advance. Outcome: flip of a die. Outcome: tomorrow's stock price. Outcome: result of the upcoming presidential election.

Caution: "All models are wrong; some models are useful." (George Box)

Statistics: The science of organizing and summarizing data and using information in the data to draw conclusions; scientific extraction of information from data. Statisticians are consumers of probability. Statisticians fit parameters in probability models. Statisticians draw conclusions about a population based on the partial information contained in a sample. The manner in which the conclusions are drawn uses probability tools and reasoning. (The sampling error in the estimate of p is 5%.)
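One common way to attach a sampling error to an estimated proportion p is the normal-approximation margin of error. The slides do not say which formula they have in mind, so the 95% level (z = 1.96), the function name, and the poll numbers below are assumptions made purely for illustration.

```python
import math


def proportion_margin_of_error(successes, n, z=1.96):
    """Return (p_hat, margin) using the normal approximation.

    z = 1.96 corresponds to a 95% confidence level (an assumption here).
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Hypothetical poll: 200 of 400 sampled voters favor a candidate.
p_hat, margin = proportion_margin_of_error(200, 400)
print(f"estimate of p = {p_hat:.2f}, sampling error = {margin:.3f}")
```

With these made-up numbers the margin comes out at about 0.049, which is the kind of "5% sampling error" statement the slide refers to.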

3. More Details

Population: a well-defined set of items under attention and discussion. (It is clear how to decide if an item is or is not in the population.)

Sample: a well-defined subset of the population that has been selected for study or measurement.

Observation:

Examples:
Population: all US universities. Sample: Ivy League universities.
Population: all voters in the US. Sample: all voters with incomes over $200,000.
Population: Yearly best times in seconds (minus 3 minutes) in the mile run over 121 years (ending in mid-80s). Sample: Yearly best times which were record times during this time period.

[Figure: plot of "mile" (vertical axis) against "Time" (horizontal axis).]

Four stages in statistical analysis:

1. Precisely define the population and then formulate clear, answerable questions about the population.
2. Collect data to help answer the questions; requires a sampling scheme and experimental design.
3. Exploratory step: Describe and present data using the tools of graphics and descriptive statistics, such as numerical summaries of data: mean, median, variance, quantiles, ...
4. Confirmatory step; formal inference: Analyze the data to draw conclusions about the population.
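The numerical summaries named in the exploratory step can be sketched in Python with the standard `statistics` module. The data are the 14 exam scores used later in these slides for the stem-and-leaf illustration; the choice of quartiles (and of the module's default quantile method) is an assumption, since the slides do not fix a convention.

```python
import statistics

# Exam scores taken from the stem-and-leaf illustration in these slides.
scores = [48, 63, 67, 69, 70, 73, 76, 79, 79, 80, 80, 83, 88, 95]

mean = statistics.mean(scores)
median = statistics.median(scores)
variance = statistics.variance(scores)          # sample variance (divisor n - 1)
quartiles = statistics.quantiles(scores, n=4)   # three cut points: Q1, Q2, Q3

print("mean:", mean)
print("median:", median)
print("sample variance:", round(variance, 2))
print("quartiles:", quartiles)
```

Different packages use slightly different quantile conventions, so the first and third quartiles may not match other software exactly.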

Types of populations and sets:

1. Large but finite population: Difficult to take a census (i.e., sample the whole population). Thus, typically only a relatively small number of items are sampled.
Population of the US. Actual census only every 10 years.
Even in presidential elections, the sample consisting of eligible voters who vote is a relatively small percentage of the population. Only 53.9 percent of the voting-age population actually voted in 1980 when Ronald Reagan was elected.
Pollsters predict outcomes based on much smaller samples, of the order of hundreds.

2. Small and finite population: can be sampled in its entirety.
Average age for ENGRD 2700 students this semester.
Average number of publications of ORIE professors.

3. Abstracted population represented by an infinite set, say an interval of real numbers (or worse).
Population representing time to failure of a machine component; population could be represented by the set of all nonnegative real numbers [0, ∞).

Population representing the study of the point of maximum pollution concentration in a city; population could be represented by a two-dimensional set {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}.

Types of samples:

A simple random sample of size n: a sample of size n in which each subset of size n in the population has the same likelihood of being selected.
Problem with definition: Likelihood is undefined. For populations from continuous infinite sets, the likelihood may be zero; e.g., the probability of drawing a sample of size 2 from [0, 1] and getting {1/4, 3/4} is 0.

Stratified random sample: The population consists of sub-populations. E.g.: Take a sample of size 700 from the Cornell student population by sampling 100 from each college. (Useful to predict outcome of union vote?)

Biased sample: The samples are clearly dependent and unrepresentative. Population: residents of Florida. Determine the average income of residents of Florida. Biased sample: NBA players living in Florida.
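For a finite population, the two sampling schemes can be sketched with Python's `random` module. The population of 1,000 numbered items, the seven strata named `college_0` to `college_6`, their sizes, and the seed are all hypothetical stand-ins chosen so that 100 draws per stratum give the total of 700 mentioned in the example.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct items; every subset of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of fixed size from each sub-population."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


rng = random.Random(2700)  # fixed seed so the sketch is reproducible

# Hypothetical finite population of 1,000 numbered items.
population = list(range(1000))
srs = simple_random_sample(population, 10, rng)

# Hypothetical strata: seven "colleges" of different sizes.
strata = {
    f"college_{k}": [f"c{k}_student_{i}" for i in range(500 * (k + 1))]
    for k in range(7)
}
stratified = stratified_sample(strata, 100, rng)
total = sum(len(members) for members in stratified.values())

print(srs)
print(total)
```

Note that the stratified scheme samples the same number from each stratum regardless of its size, so small strata are over-represented relative to a simple random sample of the whole population.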

More later.

4. EDA

Population of US voters: vote to re-elect Mayor = 1, vote to elect challenger = 0.
Epidemiology: infected = 0, not infected = 1.
Failure times: population is all real numbers.
Data networks: observe packet counts per unit time.
Finance: observe a stock index like the DJ.

Exceptions: Makes of cars sold in the US {Buick Le Sabre, Volkswagen Passat, VW Golf, Honda Accord, Honda Civic, ...}. Marketing: Brands of toothpaste {Colgate, Crest, Tom's, ...}.

Conclusion:

Result of sampling often yields a set of numbers. Sometimes the set is large. We need to make sense of the set of numbers.

The computer makes handling big data sets tolerable; 30 years ago statistics was frequently done by hand (pencil & paper, abacus, slide rule) or with a mechanical calculator. (Hence elementary stat texts had examples consisting of data sets of 5 points.) Then: electronic hand calculator. (Tedious to enter data.) Then: computer.

Walmart records and stores a record of each transaction. What to do with so much data? The emerging field of data mining. Internet joke (but true): Question: What network does a research physicist use to ship his data from one facility to another? Answer: UPS; they ship the hard drive from one location to another by UPS. In Internet studies, sniffers and other automatic data collection devices give arbitrarily large data sets. In finance, there is high-resolution data (recordings at very small time intervals) and even tick-by-tick data.

Example:

Trace is packet counts per 100 milliseconds = 1/10 second for Financial Company X's wide area network link, including USA-UK traffic. Length of dataset = 288,009; 8 hours of collection from 9am to 5pm. Top plot too muddy; bottom represents subsets sized 20,000.
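The quoted numbers can be sanity-checked with a little arithmetic: 8 hours at 10 counts per second is 288,000 intervals, which is consistent with the stated length of 288,009. The sketch below also cuts a trace of that length into consecutive windows of 20,000. The slides do not say how the plotted subsets were chosen, so consecutive windows are an assumption, and the list of zeros is only a stand-in for the real trace, which is not reproduced in the slides.

```python
def expected_length(hours, counts_per_second):
    """Number of counting intervals in a collection period."""
    return hours * 3600 * counts_per_second


def split_into_windows(trace, window):
    """Cut a trace into consecutive windows; the last one may be shorter."""
    return [trace[i:i + window] for i in range(0, len(trace), window)]


# Placeholder with the stated length only; values are NOT real data.
trace = [0] * 288_009

n_expected = expected_length(hours=8, counts_per_second=10)
windows = split_into_windows(trace, 20_000)

print(n_expected)                      # intervals in 8 hours
print(len(windows), len(windows[-1]))  # number of windows, size of last one
```

Each full window of 20,000 counts covers 2,000 seconds, a little over 33 minutes of traffic.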

Conclusion:

First step: Descriptive statistics: organization and summarization of large amounts of data for the purposes of drawing conclusions. Use
Graphics
Summary statistics (mean, median, variance, ...)

Use of descriptive statistics is often a first step in an exploratory data analysis. Somewhat informal, pictorial.

Formal inference:

Draw scientific inferences about the population from the data. More formal methods. For example, we can formally test hypotheses. (Sometimes the hypotheses are suggested by the EDA.) Most famous clinical trial: Salk vaccine given to a sample of kids in the early 50s, with a control group receiving a placebo. The formal hypothesis was that the Salk vaccine was more effective than randomness.

5. Describing sample data

5.1. Stem-and-leaf plot

Older method originated for small univariate data sets when analysis was often by hand. This is just a clever arrangement of the data values to reflect the shape of the distribution. Advantage: simple, quick, easy to construct. Disadvantage: a little primitive; doesn't capitalize on graphics capabilities of packages and computers.

Procedure:

1. Split each x_i into a stem of leading digits and a leaf of the remaining digits.
2. List the stem values in the left-hand margin column and, to the right, list the leaves corresponding to each stem, in the order they are encountered in the data set.

A simple illustration: Suppose student scores on an exam are 48, 63, 67, 69, 70, 73, 76, 79, 79, 80, 80, 83, 88, 95. A stem-and-leaf plot is below:

9 | 5
8 | 0038
7 | 03699
6 | 379
5 |
4 | 8

Positive features: The entire data set can be read with ease from the display. Gives a clear indication of the shape of the distribution of data values.

R does this automatically with the following commands:

> grades <- c(48, 63, 67, 69, 70, 73, 76, 79, 79, 80, 80, 83, 88, 95)
> stem(grades)

4 | 8
5 |
6 | 379
7 | 03699
8 | 0038
9 | 5
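The two-step procedure (split each value into stem and leaf, then list the leaves next to their stems) can also be sketched in a few lines of Python. This is an illustrative sketch for nonnegative integer data, not a reimplementation of R's `stem`; the function name and the `stem_width` parameter are assumptions.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_width=10):
    """Return display lines, one per stem, smallest stem first.

    Leaves appear in the order they are encountered in `values`.
    Stems with no observations are still listed, with no leaves.
    """
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, stem_width)  # e.g. 48 -> stem 4, leaf 8
        leaves[stem].append(leaf)
    lowest, highest = min(leaves), max(leaves)
    lines = []
    for stem in range(lowest, highest + 1):
        digits = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}".rstrip())
    return lines


grades = [48, 63, 67, 69, 70, 73, 76, 79, 79, 80, 80, 83, 88, 95]
plot_lines = stem_and_leaf(grades)
print("\n".join(plot_lines))
```

Because the scores are already sorted, the leaves on each line come out in increasing order; for unsorted data one would typically sort first.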

Minitab output (check the drop-down menu under GRAPH)

Stem-and-Leaf Display: C1

Stem-and-leaf of C1  N = 14
Leaf Unit = 1.0

  1   4  8
  1   5
  4   6  379
 (5)  7  03699
  5   8  0038
  1   9  5

Minitab output: extra column on the left. Features: The number in parentheses gives the number of observations on the line that contains the median (or the middle value). The 4 in the row above that gives the total number of observations in the first three rows, i.e. there were 4 scores below 70. The 1 above that indicates that there was one score in the 50s or below. The 5 below the median line indicates that there are 5 scores in the 80s and above. The 1 on the last line indicates one score of 90 or above.
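The counts in that extra column follow a simple rule: rows before the median row show a running count from the top, rows after it show a running count from the bottom, and the median row shows its own count in parentheses. The Python sketch below reproduces the column for the exam scores under that reading; it is an illustration of the rule as described here, not Minitab's actual implementation, and the function name is hypothetical.

```python
def depth_column(values, stem_width=10):
    """Return (label, stem) pairs mimicking the left-hand count column."""
    data = sorted(values)
    n = len(data)
    stems = range(data[0] // stem_width, data[-1] // stem_width + 1)
    rows = []
    below = 0  # observations in rows already processed
    for stem in stems:
        count = sum(1 for v in data if v // stem_width == stem)
        above = n - below - count  # observations in later rows
        if below < n / 2 and above < n / 2:
            label = f"({count})"  # this row contains the median
        else:
            # cumulative count from whichever end is nearer
            label = str(min(below + count, above + count))
        rows.append((label, stem))
        below += count
    return rows


grades = [48, 63, 67, 69, 70, 73, 76, 79, 79, 80, 80, 83, 88, 95]
depths = depth_column(grades)
labels = [label for label, _ in depths]
for label, stem in depths:
    print(f"{label:>4}  {stem}")
```

If the median falls exactly between two rows, no row satisfies the parenthesis condition and every row simply gets a cumulative count.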

Note: The help file gives a detailed explanation. More extensive data sets, particularly those in which three or more digits vary, are dealt with in a variety of ways, but all are similar to the simple case shown above.
