Uploaded by Yang Yi

STAT 3022 SLIDES UMN CHAPTER3

Robustness

Resistance

Transformation

Outlier

**Chapter 3: A Closer Look at Assumptions**

STAT 3022, School of Statistics, University of Minnesota

2013 spring

Introduction

In Chapter 2, we discussed the mechanics of using t-procedures, namely t-tests and confidence intervals, to perform statistical inference. We base these procedures on certain assumptions:

- we have random samples, representative of the populations
- the data come from Normal populations
- the samples are drawn independently
- in pooled two-sample settings, we have equal variance ($\sigma_1 = \sigma_2 = \sigma$)

In practice, these assumptions are usually not strictly met. When are these procedures still “appropriate”?

Case Study: Making it Rain

Data collected in southern Florida between 1968 and 1972 to test the hypothesis that massive injection of silver iodide (AgI) into cumulus clouds can lead to increased rainfall. This process is called “cloud seeding”.

- Over 52 days, researchers either seeded a target cloud or left it unseeded (as control).
- Randomly assigned treatment: pilots flew through the cloud every day, and a mechanism in the plane either seeded the cloud or left it unseeded.
- Researchers were blind to the treatment, that is, to whether a given day was treatment or control.

Question: Did cloud seeding have an effect on rainfall? If so, how much?

Graphical Summaries

    library("Sleuth2")
    boxplot(Rainfall ~ Treatment, ylab='Rainfall (acre-feet)', data=case0301)

[Figure: side-by-side box plots of Rainfall (acre-feet) for the Unseeded and Seeded groups]

Graphical Summaries

    par(mfrow=c(2,1), mar=c(4,4,1,0.5))
    hist(case0301$Rainfall[case0301$Treatment=="Seeded"], col="gray",
         breaks=10, xlim=c(0,3000), main="Seeded - Rainfall", xlab="")
    hist(case0301$Rainfall[case0301$Treatment=="Unseeded"], col="gray",
         breaks=8, xlim=c(0,3000), main="Unseeded - Rainfall", xlab="")

[Figure: two stacked histograms of Rainfall (0 to 3000 acre-feet), titled "Seeded - Rainfall" and "Unseeded - Rainfall", with Frequency on the vertical axis]

Numerical Summaries and Interpretations

Numerical Summaries: Do it yourself (follow the R-code on page 42 of Chapter 2 slides)

Graphical and numerical summaries indicate that rainfall tended to be greater on seeded days. However, there are problems with our necessary assumptions:

- both distributions are very skewed
- both distributions have outliers
- variability is much greater in the seeded group than in the unseeded group

Can we use our usual t-tools to analyze these data? How?

Can we do this?

    > t.test(Rainfall ~ Treatment, alternative="two.sided",
    +        var.equal=TRUE, data=case0301)

            Two Sample t-test

    data:  Rainfall by Treatment
    t = -1.9982, df = 50, p-value = 0.05114
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -556.224179    1.431851
    sample estimates:
    mean in group Unseeded   mean in group Seeded
                  164.5885               441.9846

How much did the violations of our assumptions affect these results?

Robustness

Robustness: A statistical procedure is robust to departures from a particular assumption if it is valid even when the assumption is not met.

t-tools may be used even when assumptions are violated, to a certain degree, because the t-tools are robust.

Type 1: Robustness Against Departures from Normality

Recall that the Central Limit Theorem (CLT) states that sample averages have approximately Normal sampling distributions for large samples, regardless of the shape of the population distribution.

As long as samples are “large enough”, the t-ratio will follow an approximate t-distribution even if the data are non-Normal.

Type 1: Robustness Against Departures from Normality

Effects of Skewness

- If two populations have the same standard deviations and approximately the same shapes, and if $n_1 \approx n_2$, then the validity of t-tools is affected very little by skewness.
- If two populations have the same standard deviations and approximately the same shapes, but $n_1 \neq n_2$, then the validity of t-tools is affected substantially by skewness. Larger sample sizes diminish this effect.
- If skewness in the two populations differs considerably, the tools can be very misleading with small and moderate sample sizes.

See Display 3.4 in the textbook for simulation results.
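A small simulation in the spirit of the textbook display can make these statements concrete. The sketch below is illustrative only: the exponential population, the sample sizes, the number of replications, and the helper name `coverage_rate` are all assumptions chosen for this example, not taken from the slides or the textbook.

```r
# Estimate how often a nominal 95% pooled two-sample t-interval covers the
# true difference in means (zero here) when both populations are skewed.
set.seed(3022)  # arbitrary seed, used only to make the run reproducible

coverage_rate <- function(n1, n2, reps = 2000) {
  covered <- replicate(reps, {
    x1 <- rexp(n1)  # exponential population: right-skewed, mean 1, sd 1
    x2 <- rexp(n2)  # same population, so the true difference in means is 0
    ci <- t.test(x1, x2, var.equal = TRUE)$conf.int
    ci[1] <= 0 && 0 <= ci[2]  # TRUE when the interval contains 0
  })
  mean(covered)  # proportion of intervals that covered the truth
}

cov_equal_n   <- coverage_rate(n1 = 25, n2 = 25)
cov_unequal_n <- coverage_rate(n1 = 10, n2 = 40)
print(c(equal_n = cov_equal_n, unequal_n = cov_unequal_n))
```

Coverage close to 0.95 suggests the procedure is behaving as advertised for that scenario. Swapping in populations with different shapes (for example `rexp` for one group and `rnorm` for the other) is one way to explore the third case in the list.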

Type 2: Robustness Against Differing Standard Deviations

When we cannot assume $\sigma_1 = \sigma_2$, more serious problems may arise:

- $s_p$ no longer estimates any parameter
- $SE(\bar{x}_1 - \bar{x}_2)$ no longer estimates the standard deviation of the difference between averages
- the t-ratio no longer follows a t-distribution

What can we do:

- If $n_1 \approx n_2$, t-tools remain fairly valid even when $\sigma_1 \neq \sigma_2$.
- When $n_1$ and $n_2$ are very different, we need the ratio $\sigma_1 / \sigma_2$ to be between 1/2 and 2 to have reliable results.

See Display 3.5 in the textbook for simulation results.

    > t.test(x1, x2, alternative = 'two.sided', var.equal = FALSE)
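To see why the `var.equal = FALSE` call matters, here is a minimal simulation sketch comparing the pooled and unpooled (Welch) versions of `t.test` when the null hypothesis is true but the smaller sample comes from the more variable population. The sample sizes, standard deviations, and the helper name `rejection_rate` are assumptions made up for this illustration.

```r
# Compare false-rejection rates at the 5% level for pooled vs. Welch t-tests
# when the standard deviations differ and the sample sizes are unbalanced.
set.seed(3022)  # arbitrary seed, used only to make the run reproducible

rejection_rate <- function(n1, sd1, n2, sd2, var_equal, reps = 2000) {
  rejected <- replicate(reps, {
    x1 <- rnorm(n1, mean = 0, sd = sd1)
    x2 <- rnorm(n2, mean = 0, sd = sd2)  # same mean: the null is true
    t.test(x1, x2, alternative = "two.sided",
           var.equal = var_equal)$p.value < 0.05
  })
  mean(rejected)  # proportion of simulated data sets with p < 0.05
}

# Small sample from the high-variance population, sd ratio 4 (outside 1/2 to 2)
pooled_rate <- rejection_rate(10, 4, 40, 1, var_equal = TRUE)
welch_rate  <- rejection_rate(10, 4, 40, 1, var_equal = FALSE)
print(c(pooled = pooled_rate, welch = welch_rate))
```

A rejection rate far from 0.05 indicates that the nominal level is not being respected in that scenario; rerunning with equal sample sizes shows how much balance helps.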

Type 3: Robustness Against Departures from Independence

There are two types of dependence (i.e., lack of independence) that commonly arise:

1. A cluster effect occurs when the data have been collected in subgroups. Observations in the same subgroup tend to be more similar in their responses than observations in different subgroups.
2. A serial effect occurs when measurements are taken over time and observations close together in time tend to be more similar (or more different) than observations collected at distant time points.

When the assumption of independence is violated, the standard error becomes very inaccurate. t-tools are usually not recommended in such cases.

Resistance and Outliers

An outlier is an observation judged to be far from its group average. Whether or not we should simply remove such observations depends on how resistant our tools are to changes in the data.

A statistical procedure is resistant if it does not change very much when a small part of the data changes, perhaps drastically.

Question: Can you tell the difference between “Robustness” and “Resistance”?

Example of Outlier

[Figure: example plots illustrating an outlier; horizontal axis labelled x]

Example of Resistance

Consider a hypothetical sample: 10, 20, 30, 50, 70

The sample mean is 36, and the sample median is 30.

Now consider the sample: 10, 20, 30, 50, 700

What happens to the sample mean? What about the sample median?

The sample median is resistant to any change in a single observation, while the sample mean is not.
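The two questions above can be checked directly in R. This is a minimal sketch using the two samples from the slide; the variable names are made up for the example.

```r
# The two hypothetical samples from the slide: only the last value differs
original <- c(10, 20, 30, 50, 70)
modified <- c(10, 20, 30, 50, 700)

# Collect the mean and median of each sample for side-by-side comparison
summary_table <- rbind(
  original = c(mean = mean(original), median = median(original)),
  modified = c(mean = mean(modified), median = median(modified))
)
print(summary_table)
```

The mean moves from 36 to 162 while the median stays at 30, which is what resistance of the median looks like in practice.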

Resistance of t-Tools

Since t-tools are based on the mean, they are not resistant. A small portion of the data can have a major influence on the results. One or two outliers can affect a 95% CI or change a p-value enough to alter a conclusion.

Solution: When you have an outlier, it is good practice to run your analysis with and without the outlier in the data set. Compare your results to see how influential the outlier in question is.
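The with-and-without comparison might look like the sketch below. The measurements are invented purely for illustration (they are not from the course data), and the variable names are hypothetical.

```r
# Hypothetical measurements, invented for illustration only
control   <- c(18, 20, 19, 21, 17, 20, 19)
treatment <- c(12, 15, 14, 16, 13, 15, 14)

# Same treatment group with one extreme observation appended
treatment_with_outlier <- c(treatment, 60)

# Pooled two-sample t-test, run without and with the suspected outlier
p_without <- t.test(treatment, control, var.equal = TRUE)$p.value
p_with    <- t.test(treatment_with_outlier, control, var.equal = TRUE)$p.value
print(c(without_outlier = p_without, with_outlier = p_with))
```

In this made-up example a single extreme value is enough to turn a very small p-value into a large one, because it shifts the group mean and inflates the pooled standard deviation at the same time.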

Practical Strategies for the Two-Sample Problem

Our task is to size up actual conditions, using available data, and evaluate appropriateness of t-tools:

1. think about possible cluster and serial effects
2. evaluate the suitability of t-tools by examining graphical displays (side-by-side histograms or box plots)
3. consider alternatives
   a. Transform the data (Section 3.5) to see if the transformed data looks “nicer”
   b. Alternative tools that do not require model assumptions (Chapter 4)

Transformations of Data

For positive data, the most useful transformation is the logarithm (log), particularly the natural (base e) logarithm ($e = 2.71828\ldots$).

- $\log(1) = 0$
- $\log(e^x) = x$

[Figure: the log function, log(x) plotted against x for x from 0 to 10]

When Do We Use Log Transformation

- In a set of data, if the ratio $\max / \min > 10$, then a log transformation might be useful.
- If samples are skewed, with the group with the larger average having a greater spread, then a log transformation could be a good choice.

[Figure: side-by-side box plots of five groups, "before transformation" (y) and "after transformation" (log(y))]
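The max/min rule of thumb is easy to wrap in a small helper. The sketch below is only an illustration: the function names `max_min_ratio` and `suggests_log` and the example vectors are hypothetical, and the threshold of 10 is the one quoted on the slide.

```r
# Ratio of largest to smallest value; only meaningful for positive data
max_min_ratio <- function(y) {
  stopifnot(is.numeric(y), all(y > 0))
  max(y) / min(y)
}

# TRUE when the rule of thumb (ratio > 10) suggests trying a log transformation
suggests_log <- function(y, threshold = 10) {
  max_min_ratio(y) > threshold
}

y_wide   <- c(2, 15, 40, 310)   # hypothetical data spanning orders of magnitude
y_narrow <- c(12, 15, 18, 21)   # hypothetical data with a narrow range

print(c(wide = max_min_ratio(y_wide), narrow = max_min_ratio(y_narrow)))
print(c(wide = suggests_log(y_wide), narrow = suggests_log(y_narrow)))
```

The rule is a screening device, not a test, so it is worth pairing with box plots or histograms on both scales before committing to the transformation.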

Cloud Seeding - Transformation

Recall both groups are skewed, with the “seeded” days having a larger average and a greater spread.

    > max(case0301$Rainfall[case0301$Treatment=="Seeded"])/
    +   min(case0301$Rainfall[case0301$Treatment=="Seeded"])
    [1] 669.6586
    > max(case0301$Rainfall[case0301$Treatment=="Unseeded"])/
    +   min(case0301$Rainfall[case0301$Treatment=="Unseeded"])
    [1] 1202.6
    > case0301$logRain <- with(case0301, log(Rainfall))
    > head(case0301)
      Rainfall Treatment  logRain
    1   1202.6  Unseeded 7.092241
    2    830.1  Unseeded 6.721546
    3    372.4  Unseeded 5.919969
    4    345.5  Unseeded 5.844993
    5    321.2  Unseeded 5.772064
    6    244.3  Unseeded 5.498397

[Figure: side-by-side box plots for the Unseeded and Seeded groups, "before transformation" (Rainfall) and "after transformation" (log(Rainfall))]

Two-Sample t-Analysis

Before:

    > t.test(Rainfall ~ Treatment, alternative="two.sided",
    +        var.equal=TRUE, data=case0301)

            Two Sample t-test

    data:  Rainfall by Treatment
    t = -1.9982, df = 50, p-value = 0.05114
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -556.224179    1.431851
    sample estimates:
    mean in group Unseeded   mean in group Seeded
                  164.5885               441.9846

After:

    > t.test(logRain ~ Treatment, data=case0301,
    +        alternative="less", var.equal=TRUE)

            Two Sample t-test

    data:  logRain by Treatment
    t = -2.5444, df = 50, p-value = 0.007041
    alternative hypothesis: true difference in means is less than 0
    95 percent confidence interval:
           -Inf -0.3904045
    sample estimates:
    mean in group Unseeded   mean in group Seeded
                  3.990406               5.134187

There is convincing evidence that seeding increased rainfall.

Multiplicative Treatment Effect

Definition: Suppose $Z = \log Y$. It is estimated that the response of an experimental unit to treatment 2 will be $e^{\bar{Z}_2 - \bar{Z}_1}$ times as large as its response to treatment 1 (where $\bar{Z}_1$ = average of $\log(Y_1)$).

    > m1 <- mean(case0301$logRain[case0301$Treatment=='Seeded'])
    > m2 <- mean(case0301$logRain[case0301$Treatment=='Unseeded'])
    > (diffmeans <- m1 - m2)
    [1] 1.143781
    > (est.mult.effect <- exp(diffmeans))
    [1] 3.138614

We interpret this in the following way: “The volume of rainfall produced by a seeded cloud is estimated to be 3.14 times as large as the volume that would have been produced in the absence of seeding”.
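One way to see where the multiplicative interpretation comes from is sketched below. It relies on an assumption that the slide does not state explicitly: that the log-scale distributions are roughly symmetric, so that the mean and the median of $Z$ nearly coincide. Because the logarithm is monotone, the median of $\log Y$ equals the log of the median of $Y$.

```latex
% Assumption: Z_1 and Z_2 are roughly symmetric, so Mean(Z) \approx Median(Z).
\begin{align*}
\bar{Z}_2 - \bar{Z}_1
  &\approx \operatorname{Median}(Z_2) - \operatorname{Median}(Z_1) \\
  &= \log\bigl(\operatorname{Median}(Y_2)\bigr)
   - \log\bigl(\operatorname{Median}(Y_1)\bigr) \\
  &= \log\!\left(\frac{\operatorname{Median}(Y_2)}{\operatorname{Median}(Y_1)}\right),
\qquad\text{so}\qquad
e^{\bar{Z}_2 - \bar{Z}_1} \approx
  \frac{\operatorname{Median}(Y_2)}{\operatorname{Median}(Y_1)}.
\end{align*}
% Cloud seeding values from the slide:
% e^{5.134187 - 3.990406} = e^{1.143781} \approx 3.14
```

Under that symmetry assumption, the back-transformed difference is best read as an estimated ratio of medians on the original scale rather than a ratio of means.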

Confidence Interval

    > (test <- t.test(logRain ~ Treatment, data=case0301, var.equal=TRUE))

            Two Sample t-test

    data:  logRain by Treatment
    t = -2.5444, df = 50, p-value = 0.01408
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -2.0466973 -0.2408651
    sample estimates:
    mean in group Unseeded   mean in group Seeded
                  3.990406               5.134187

    > test$conf.int
    [1] -2.0466973 -0.2408651
    attr(,"conf.level")
    [1] 0.95
    > exp(test$conf.int)
    [1] 0.1291608 0.7859476
    attr(,"conf.level")
    [1] 0.95

A 95% confidence interval for the multiplicative effect of unseeding/seeding is 0.129 to 0.786 times.
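The interval above is stated for unseeded relative to seeded, because `t.test` takes the difference in the order of the factor levels. If the seeded/unseeded direction is wanted instead, one option is to negate the log-scale interval before exponentiating. The sketch below hard-codes the interval endpoints printed on the slide so that it runs without the data set; the variable names are made up for the example.

```r
# 95% CI for mean(log Unseeded) - mean(log Seeded), copied from the output above
ci_log <- c(-2.0466973, -0.2408651)

# Multiplicative effect of unseeded relative to seeded (as on the slide)
ci_unseeded_over_seeded <- exp(ci_log)

# Flip the direction: negate on the log scale, then reorder the endpoints
ci_seeded_over_unseeded <- rev(exp(-ci_log))

print(round(ci_unseeded_over_seeded, 3))
print(round(ci_seeded_over_unseeded, 2))
```

Read this way, the same interval says that seeded rainfall is estimated to be between about 1.27 and 7.74 times the unseeded rainfall, which matches the direction of the 3.14 point estimate.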

A Strategy for Dealing with Outliers

If the outlying observation resulted from measurement error or contamination from another population:

- if the right value is known, correct the observation
- if not, leave the observation out of the analysis

Often there is no way to know what caused the outlier(s). Two tools exist:

- employ a resistant statistical tool (Chapter 4)
- adopt a careful strategy; see Display 3.6 in the text

The idea is to perform the analysis with and without the suspected outliers. If both analyses give the same answer, only report results INCLUDING suspected outliers. If not, report both results.

Removing Outliers and Other Data Points

    > library(Sleuth2); ex0327[15:17, ]
            Country Life Income           Type
    15     Portugal 68.1    956 Industrialized
    16 South_Africa 68.2    NaN Industrialized
    17       Sweden 74.7   5596 Industrialized
    > range(ex0327$Income, na.rm=TRUE)
    [1]  110 5596
    > data <- ex0327; data[16, 'Income'] <- 10000 # set it as an outlier
    > data[15:17, ]
            Country Life Income           Type
    15     Portugal 68.1    956 Industrialized
    16 South_Africa 68.2  10000 Industrialized
    17       Sweden 74.7   5596 Industrialized
    >
    > d1 <- subset(data, Income < 8000); d1[15:17, ]
           Country Life Income           Type
    15    Portugal 68.1    956 Industrialized
    17      Sweden 74.7   5596 Industrialized
    18 Switzerland 72.1   2963 Industrialized
    > ### dealing with Missing data ###
    > (cc <- complete.cases(ex0327))
     [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    [16] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    > data2 <- ex0327[cc, ]; data2[15:17, ]
           Country Life Income           Type
    15    Portugal 68.1    956 Industrialized
    17      Sweden 74.7   5596 Industrialized
    18 Switzerland 72.1   2963 Industrialized

Q: How many conservative economists does it take to change a light bulb?

A: None, they’re all waiting for the unseen hand of the market to correct the lighting disequilibrium.
