
Hongwei Zhang http://www.cs.wayne.edu/~hzhang

Statistics is the art of lying by means of figures. --- Dr. Wilhelm Stekhel

Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain.

Response Variable: the variable being estimated or predicted
Predictor Variables: the variables used to predict the response (also called predictors or factors)

Regression Model: predicts a response for a given set of predictor variables
Linear Regression Models: the response is a linear function of the predictors
Simple Linear Regression Models: only one predictor

Outline

- Definition of a Good Model
- Estimation of Model Parameters
- Allocation of Variation
- Standard Deviation of Errors
- Confidence Intervals for Regression Parameters
- Confidence Intervals for Predictions
- Visual Tests for Verifying Regression Assumptions

Definition of a Good Model

Regression models attempt to minimize the vertical distance between each observation point and the model line (or curve).

The length of this line segment is called the residual, modeling error, or simply error.

The negative and positive errors should cancel out => zero overall error.

Many lines satisfy this criterion alone, so among them we choose the line that minimizes the sum of squares of the errors.

Formally, the model is

ŷ = b0 + b1·x

where ŷ is the predicted response when the predictor variable is x, and b0 and b1 are regression parameters to be determined from the data. Given n observation pairs {(x1, y1), ..., (xn, yn)}, the estimated response for the i-th observation is:

ŷi = b0 + b1·xi

The error (residual) for the i-th observation is ei = yi - ŷi. The best linear model minimizes the sum of squared errors (SSE):

SSE = Σ ei² = Σ (yi - b0 - b1·xi)²

subject to the constraint of zero overall error. This is equivalent to the unconstrained minimization of the variance of errors (Exercise 14.1).

Estimation of Model Parameters

Regression parameters that give minimum error variance are:

b1 = (Σ xi·yi - n·x̄·ȳ) / (Σ xi² - n·x̄²)

b0 = ȳ - b1·x̄

where x̄ = (1/n) Σ xi and ȳ = (1/n) Σ yi.
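These estimates can be computed directly from the data. Below is a minimal sketch in Python; the data and the function name `fit_least_squares` are hypothetical, chosen only for illustration.

```python
# Least-squares estimates of the simple linear model y^ = b0 + b1*x.
# A sketch with hypothetical data; any (x, y) pairs would do.

def fit_least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b1 = (sum(x*y) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_least_squares([1, 2, 3, 4, 5], [2.1, 4.0, 6.2, 7.9, 10.1])
```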

Example 14.1

Derivation

Least Squares Regression:
- Not very robust to outliers
- Simple analytical solution

Least Absolute Deviations Regression:
- Robust to outliers
- No analytical solution (have to use iterative, computation-intensive methods)

The method of least absolute deviations is unstable: for a small horizontal adjustment of a data point, the regression line may jump a large amount. In contrast, the least squares solution is stable: for a small horizontal adjustment of a data point, the regression line always moves only slightly, i.e., continuously.
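Because least absolute deviations has no closed form, it is computed iteratively. One common scheme (a sketch, not a method prescribed in the text) is iteratively reweighted least squares, where weights 1/max(|ei|, ε) turn the L1 objective into a sequence of weighted L2 problems; the data and function name below are hypothetical.

```python
# Least absolute deviations (LAD) via iteratively reweighted least squares.
# One common iterative scheme; illustrative sketch with hypothetical data.

def fit_lad(xs, ys, iters=100, eps=1e-8):
    """Approximate the line minimizing sum(|y - b0 - b1*x|)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # weight each point by the inverse of its current absolute residual
        w = [1.0 / max(abs(y - (b0 + b1 * x)), eps) for x, y in zip(xs, ys)]
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, xs))
        swy = sum(wi * y for wi, y in zip(w, ys))
        swxx = sum(wi * x * x for wi, x in zip(w, xs))
        swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        # solve the weighted least-squares normal equations
        b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        b0 = (swy - b1 * swx) / sw
    return b0, b1

# four points on y = x plus one large outlier: LAD all but ignores the outlier
b0, b1 = fit_lad([1, 2, 3, 4, 5], [1.0, 2.0, 3.0, 4.0, 100.0])
```

Running the same data through ordinary least squares would tilt the line sharply toward the outlier, illustrating the robustness contrast above.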


Allocation of Variation

Without regression, one would use the mean ȳ as the predicted value for every observation. The sum of squared errors without regression would then be the total variation:

SST = Σ (yi - ȳ)² = Σ yi² - n·ȳ² = SSY - SS0

where SSY = Σ yi² is the sum of squares of y (or y²), and SS0 = n·ȳ² is the sum of squares of ȳ. The total variation is allocated as SST = SSR + SSE, where SSE is the variation left unexplained by the regression and SSR = SST - SSE is the variation explained by it. The fraction of variation explained, R² = SSR/SST, is the coefficient of determination.

Example

For the disk I/O-CPU time data of Example 14.1:
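Since the example's numbers are not reproduced above, the decomposition can be illustrated with hypothetical data; the function name `allocate_variation` is made up for this sketch, and b0, b1 are assumed already fitted by least squares.

```python
# Allocation of variation: SSY = sum(y^2), SS0 = n*ybar^2, SST = SSY - SS0,
# SSE from the fitted line, SSR = SST - SSE, and R^2 = SSR/SST.
# Hypothetical data; b0 and b1 are assumed already fitted.

def allocate_variation(xs, ys, b0, b1):
    n = len(ys)
    y_bar = sum(ys) / n
    ssy = sum(y * y for y in ys)
    ss0 = n * y_bar ** 2
    sst = ssy - ss0
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return {"SSY": ssy, "SS0": ss0, "SST": sst,
            "SSE": sse, "SSR": sst - sse, "R2": (sst - sse) / sst}

# perfectly linear data: all variation is explained, so R^2 = 1
stats = allocate_variation([1, 2, 3], [2.0, 4.0, 6.0], 0.0, 2.0)
```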

Standard Deviation of Errors

Since the errors are obtained after estimating two regression parameters from the data, the errors have n-2 degrees of freedom. SSE/(n-2) is called the mean squared error (MSE), and the standard deviation of errors is se = sqrt(MSE) = sqrt(SSE/(n-2)).

Note:

- SSY has n degrees of freedom, since it is obtained from n independent observations without estimating any parameters
- SS0 has just one degree of freedom, since it can be computed from ȳ alone
- SST has n-1 degrees of freedom, since one parameter (ȳ) must be calculated from the data before SST can be computed

SSR, which is the difference between SST and SSE, has the remaining one degree of freedom.

Overall,

SSY = SS0 + SSR + SSE, with degrees of freedom n = 1 + 1 + (n-2)

Notice that the degrees of freedom add just the way the sums of squares do.

Example

Confidence Intervals for Regression Parameters

Regression coefficients b0 and b1 are estimates from a single sample of size n => 1) they are random; 2) using another sample, the estimates may differ. If β0 and β1 are the true parameters of the population (i.e., y = β0 + β1·x), then the computed coefficients b0 and b1 are estimates of β0 and β1, respectively. The sample standard deviations of b0 and b1 are:

s_b0 = se · sqrt(1/n + x̄² / (Σ xi² - n·x̄²))

s_b1 = se / sqrt(Σ xi² - n·x̄²)

The 100(1-α)% confidence intervals for b0 and b1 can be computed using t[1-α/2; n-2] --- the (1-α/2)-quantile of a t variate with n-2 degrees of freedom. The confidence intervals are:

b0 ∓ t[1-α/2; n-2]·s_b0 and b1 ∓ t[1-α/2; n-2]·s_b1

If a confidence interval includes zero, then the regression parameter cannot be considered different from zero at the 100(1-α)% confidence level.
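A minimal sketch of these intervals in Python. The Python standard library has no t distribution, so the quantile t[1-α/2; n-2] is passed in by the caller (e.g. from a t-table); the data, the fitted coefficients, and the function name are hypothetical.

```python
from math import sqrt

# Confidence intervals for b0 and b1; t_quantile = t[1-alpha/2; n-2].

def param_conf_intervals(xs, ys, b0, b1, t_quantile):
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = sqrt(sse / (n - 2))                      # std deviation of errors
    s_b1 = se / sqrt(sxx)                         # std deviation of b1
    s_b0 = se * sqrt(1.0 / n + x_bar ** 2 / sxx)  # std deviation of b0
    return ((b0 - t_quantile * s_b0, b0 + t_quantile * s_b0),
            (b1 - t_quantile * s_b1, b1 + t_quantile * s_b1))

# hypothetical data; b0 = 0.15, b1 = 0.94 are the least-squares estimates
# for these points, and 2.920 is t[0.95; 2] for a 90% interval with n = 4
xs, ys = [1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]
ci_b0, ci_b1 = param_conf_intervals(xs, ys, 0.15, 0.94, 2.920)
```

For this toy data the interval for b0 includes zero (intercept not significant) while the interval for b1 does not, the same pattern the example below exhibits.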

Example

Example (contd.)

The 0.95-quantile of a t-variate with 5 degrees of freedom is 2.015 => 90% confidence interval for b0 is:

Since the confidence interval includes zero, the hypothesis that this parameter is zero cannot be rejected at the 0.10 significance level => b0 is essentially zero.

Since the confidence interval does not include zero, the slope b1 is significantly different from zero at this confidence level.

Best linear models are:

The regressions explain 81% and 75% of the variation, respectively. Does ARGUS take a larger time per byte, as well as a larger set-up time per call, than UNIX?

The intervals for the intercepts overlap, while those of the slopes do not. => Set-up times are not significantly different in the two systems, while the per-byte times (slopes) are different.

Confidence Intervals for Predictions

The standard deviation of the prediction is minimal at the center of the measured range (i.e., when xp = x̄); the goodness of the prediction decreases as we move away from the center.
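This behavior follows from the standard deviation of the predicted mean of m future observations, s_ŷ = se·sqrt(1/m + 1/n + (xp - x̄)²/(Σ xi² - n·x̄²)). The sketch below uses hypothetical data and, as before, takes the t quantile from the caller since the standard library has no t distribution.

```python
from math import sqrt

# Prediction interval at x = xp; t_quantile = t[1-alpha/2; n-2].

def prediction_interval(xs, ys, b0, b1, xp, t_quantile, m=1):
    """CI for the predicted mean of m future observations at xp."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = sqrt(sse / (n - 2))
    y_hat = b0 + b1 * xp
    # widens as xp moves away from x_bar
    s_y = se * sqrt(1.0 / m + 1.0 / n + (xp - x_bar) ** 2 / sxx)
    return y_hat - t_quantile * s_y, y_hat + t_quantile * s_y

# hypothetical data with its least-squares fit b0 = 0.15, b1 = 0.94
xs, ys = [1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]
center = prediction_interval(xs, ys, 0.15, 0.94, 2.5, 2.920)   # at x_bar
far = prediction_interval(xs, ys, 0.15, 0.94, 10.0, 2.920)     # far from x_bar
```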

Example

Visual Tests for Verifying Regression Assumptions

Regression assumptions:

1. The true relationship between the response variable y and the predictor variable x is linear.
2. The predictor variable x is non-stochastic and is measured without error.
3. The model errors are statistically independent.
4. The errors are normally distributed with zero mean and a constant standard deviation.
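Assumption 4 is usually checked visually with a normal quantile-quantile plot of the residuals. The sketch below only computes the point coordinates (via `statistics.NormalDist` from the standard library); the residual values are hypothetical and the plotting itself is omitted.

```python
from statistics import NormalDist

# Coordinates for a normal quantile-quantile check of the residuals:
# if the errors are normal, the points fall roughly on a straight line.

def qq_points(residuals):
    n = len(residuals)
    nd = NormalDist()
    # plotting positions (i + 0.5)/n -> theoretical standard-normal quantiles
    theoretical = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(residuals)))

pts = qq_points([-1.2, 0.3, 0.5, -0.4, 1.1])
```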

Any trend in the residuals would imply dependence of the errors on the predictor variable => use a curvilinear model or a transformation.

Any trend would imply that other factors (such as environmental conditions or side effects) should be considered in the modeling

Example

Summary

- Definition of a Good Model
- Estimation of Model Parameters
- Allocation of Variation
- Standard Deviation of Errors
- Confidence Intervals for Regression Parameters & Predictions
- Visual Tests for Verifying Regression Assumptions

Homework #5

1. (100 points)
