Project on Regression


By

HEMANT GANDHI

2014B4A4PS763H

ACKNOWLEDGEMENT

Any work, irrespective of its magnitude or complexity, is a group effort and is never complete until due gratitude is extended to all who contributed to its success. I would like to take this opportunity to thank Prof. Addepalli Ramu, Associate Professor, Birla Institute of Technology and Science Pilani, Hyderabad Campus, for giving me the chance to work under his guidance.

I am grateful to the administration of BITS Pilani, Hyderabad Campus, for providing students with opportunities to develop their academic skills and logical thinking through open-ended, study-oriented activities.

CERTIFICATE

This is to certify that the project report entitled Least Square Regression submitted by Mr.

Hemant Gandhi (2014B4A4763H), in partial fulfilment of the requirements of the course MATH

F266 (Study Oriented Project), embodies the work done by him under my supervision and guidance.

Curve Fitting

Curve fitting is the process of constructing a curve, or mathematical function, that best fits a series of data points, possibly subject to constraints.

It is frequently used in engineering. For example, the empirical relations used in heat transfer and fluid mechanics are functions fitted to experimental data.

Regression: Mainly used with experimental data, which may contain a significant amount of error (noise). The fitted function does not need to pass through every data point.

Interpolation: Used when the data are known to be very precise. The aim is to find a function (or a series of functions) that passes through every data point. Two forms are polynomial interpolation and spline interpolation.
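The contrast between regression and interpolation can be illustrated with a minimal Python sketch. The data values and the helper names `fit_line` and `lagrange_interpolate` are hypothetical and chosen only for illustration; the interpolant shown is the Lagrange form of polynomial interpolation, which is one of several ways to build it.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x (closed-form solution)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n
    return a0, a1


def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial passing through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical noisy measurements of a roughly linear trend (made-up data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

a0, a1 = fit_line(xs, ys)
line_at_nodes = [a0 + a1 * x for x in xs]
interp_at_nodes = [lagrange_interpolate(xs, ys, x) for x in xs]

# Regression smooths over the noise, so the line misses individual points;
# interpolation reproduces every data point, noise included.
largest_line_miss = max(abs(y - f) for y, f in zip(ys, line_at_nodes))
largest_interp_miss = max(abs(y - f) for y, f in zip(ys, interp_at_nodes))
```

With noisy data, the interpolating polynomial follows the noise, which is why regression is usually the better choice for experimental measurements.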

Least Square Regression

The method of least squares is a standard approach in regression analysis for finding an approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the residuals of the individual equations.

The most important application is data fitting. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model.

Least squares problems fall into two categories, linear (or ordinary) least squares and non-linear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. The non-linear problem is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, so the core calculation is similar in both cases. Polynomial least squares describes the variance in a prediction of the dependent variable as a function of the independent variable and the deviations from the fitted curve.

Linear Regression

Linear least squares regression is by far the most widely used modeling method. It is what most

people mean when they say they have used "regression", "linear regression" or "least squares" to fit

a model to their data. Not only is linear least squares regression the most widely used modeling

method, but it has been adapted to a broad range of situations that are outside its direct scope.

Mathematically, linear least squares is the problem of approximately solving an overdetermined system of linear equations, where the best approximation is defined as the one that minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called linear least squares because the assumed function is linear in the parameters to be estimated.

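For a straight-line model $y = a_0 + a_1 x + e$ fitted to $N$ data points, the error (deviation) at each point $(x_i, y_i)$ is $e_i = y_i - a_0 - a_1 x_i$. Of the several possible criteria for choosing the best-fit line (that is, for finding $a_0$ and $a_1$), the preferred strategy is to minimize the sum of the squares of the individual errors.

Sum of squares of the residuals:

$$S_r = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - a_0 - a_1 x_i)^2$$

To minimize $S_r$, set its derivatives with respect to $a_0$ and $a_1$ to zero:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{N} (y_i - a_0 - a_1 x_i) = 0, \qquad \frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{N} (y_i - a_0 - a_1 x_i)\, x_i = 0$$

Or,

$$N a_0 + \left(\sum x_i\right) a_1 = \sum y_i, \qquad \left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$

Solve these for $a_0$ and $a_1$. The results are:

$$a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$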

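The improvement obtained by using a regression line instead of the mean gives a measure of how good the regression fit is. With $S_t = \sum (y_i - \bar{y})^2$ denoting the sum of squares around the mean and $S_r$ the sum of squares of the residuals around the regression line, the coefficient of determination is

$$r^2 = \frac{S_t - S_r}{S_t}$$

and $r$ is the correlation coefficient.

Two extreme cases are:

$S_r = 0$ -> $r = 1$ describes a perfect fit (straight line passing through all points).

$S_r = S_t$ -> $r = 0$ describes a case with no improvement.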

Usually an r value close to 1 represents a good fit. But be careful and always plot the data points

and the regression line together to see what is going on.
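The closed-form solution and the goodness-of-fit measure described above can be sketched in a few lines of Python. The function name `linear_regression` and the sample data are hypothetical; the formulas are the standard least-squares results for a straight line with an intercept.

```python
def linear_regression(xs, ys):
    """Return (a0, a1, r2) for the least-squares line y = a0 + a1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Solution of the two normal equations.
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n

    # Sr: squared error around the line; St: squared error around the mean.
    y_mean = sy / n
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    s_t = sum((y - y_mean) ** 2 for y in ys)
    r2 = (s_t - s_r) / s_t
    return a0, a1, r2


# Made-up sample data close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
a0, a1, r2 = linear_regression(xs, ys)

# Points lying exactly on a line give Sr = 0 and therefore r2 = 1.
_, _, r2_perfect = linear_regression(xs, [2.0 * x + 1.0 for x in xs])
```

For the sample data the fitted line is approximately y = 0.22 + 1.96x. The value of r2 is close to 1, but as noted above, a plot of the points and the line is still needed to judge the fit.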

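Linear regression is useful for representing a linear relationship. If the relation is nonlinear, either another technique can be used or the data can be transformed so that linear regression can still be used. The latter technique is frequently used to fit the following nonlinear equations to a set of data.

1. Exponential equation: $y = A e^{Bx}$, which becomes linear after taking logarithms: $\ln y = \ln A + B x$.

2. Power equation: $y = A x^{B}$, which becomes linear after taking logarithms: $\ln y = \ln A + B \ln x$.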
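A minimal sketch of the transformation approach follows. It assumes the usual forms y = A*exp(B*x) for the exponential equation and y = A*x**B for the power equation; the symbols A and B and the function names are illustrative and not taken from the report. Note that the fit minimizes squared error in the logarithm of y rather than in y itself, so it is not identical to a direct nonlinear least-squares fit.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n
    return a0, a1


def fit_exponential(xs, ys):
    """Fit y = A*exp(B*x) by regressing ln(y) on x. Requires every y > 0."""
    if any(y <= 0 for y in ys):
        raise ValueError("log transform needs strictly positive y values")
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope  # A = exp(intercept), B = slope


def fit_power(xs, ys):
    """Fit y = A*x**B by regressing ln(y) on ln(x). Requires x > 0 and y > 0."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("log transform needs strictly positive x and y values")
    intercept, slope = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope


# Noise-free synthetic data, so the known parameters should be recovered.
x_exp = [0.0, 1.0, 2.0, 3.0, 4.0]
A_exp, B_exp = fit_exponential(x_exp, [2.0 * math.exp(0.5 * x) for x in x_exp])

x_pow = [1.0, 2.0, 3.0, 4.0]
A_pow, B_pow = fit_power(x_pow, [3.0 * x ** 1.5 for x in x_pow])
```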

Polynomial regression

Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x), and has been used to describe nonlinear phenomena such as the growth rate of tissues, the distribution of carbon isotopes in lake sediments, and the progression of disease epidemics. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered a special case of multiple linear regression.

We can model the expected value of y as an nth-degree polynomial, yielding the general polynomial regression model

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + e$$
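Because the model is linear in its coefficients, a polynomial fit reduces to solving a small linear system. The sketch below builds the normal equations and solves them by Gaussian elimination; the function names and test data are hypothetical. Normal equations become ill-conditioned for high degrees, so this is an illustration for low-degree fits rather than a robust implementation.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree]."""
    k = degree + 1
    # Normal equations (X^T X) a = X^T y, where X[i][j] = xs[i]**j.
    xtx = [[sum(x ** (r + c) for x in xs) for c in range(k)] for r in range(k)]
    xty = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(k)]
    return solve_linear_system(xtx, xty)


def evaluate_polynomial(coeffs, x):
    """Evaluate a0 + a1*x + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Noise-free data from y = 1 + 2x + 3x^2; a quadratic fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
```

Calling `fit_polynomial(xs, ys, degree=1)` on the same data gives the best straight line, which shows that linear regression is the degree-one case of the same procedure.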

Table 2. Polynomial regression results for direction a

Polynomial model   Linear    Quadratic   Cubic
RMSE               7.876     3.011       1.295
MAPE               14.9473   4.8526      1.5763
R2                 0.9233    0.9902      0.9984
Adjusted R2        0.9137    0.9874      0.9977

Polynomial regression results for direction b

Polynomial model   Linear    Quadratic   Cubic
RMSE               5.357     2.542       2.732
MAPE               13.5912   3.0394      2.6997
R2                 0.9638    0.9929      0.9929
Adjusted R2        0.9593    0.9908      0.9894

Polynomial regression results for direction c

Polynomial model   Linear    Quadratic   Cubic     Quartic
RMSE               1.501     1.516       1.319     0.656
MAPE               26.2045   24.0227     19.7495   8.1552
R2                 0.9467    0.9524      0.9691    0.9936
Adjusted R2        0.94      0.9388      0.9537    0.9885

Direction a: The cubic polynomial regression model outperforms the other two models, with the lowest error statistics and the highest coefficient of determination. Least squares parameter estimates for this model are (9.2000, 56.9503, 12.3007, 1.0521)^T.

Direction b: The quadratic polynomial regression model appears to fit the data best. Least squares parameter estimates for this model are (5.8667, 30.2242, 2.3636)^T.

Direction c: Least squares parameter estimates for the selected model, a quartic polynomial with five coefficients, are (0.5000, 20.9751, 17.0268, 4.2906, 0.3590)^T.
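The report does not state how the statistics in the tables were computed. The sketch below uses common textbook definitions of RMSE, MAPE, R2 and adjusted R2 as an assumption; in particular, RMSE is taken here as the square root of the mean squared error, while some authors divide by the residual degrees of freedom instead. The function names and the small data set are hypothetical.

```python
import math


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2, which penalizes R2 for the number of fitted terms."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def fit_metrics(observed, predicted, n_predictors):
    """Return RMSE, MAPE (in percent), R2 and adjusted R2 for a fitted model."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    s_r = sum(e * e for e in errors)                  # residual sum of squares
    mean_obs = sum(observed) / n
    s_t = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - s_r / s_t
    return {
        "rmse": math.sqrt(s_r / n),
        # MAPE is undefined if any observed value is zero.
        "mape": 100.0 / n * sum(abs(e / o) for e, o in zip(errors, observed)),
        "r2": r2,
        "adj_r2": adjusted_r2(r2, n, n_predictors),
    }


# Made-up observed and predicted values for a one-predictor model.
metrics = fit_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5], n_predictors=1)

# Consistency check against the direction a table, assuming 10 observations.
linear_check = adjusted_r2(0.9233, 10, 1)
```

The last line gives about 0.9137, which matches the second R2 row of the direction a table for the linear model. This suggests that row is the adjusted value and that ten observations were used, although the report states neither.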

There are several possible uses of a regression model. One is to understand the relationship between two or more variables. A more common use of regression analysis is prediction: providing estimates of values of the dependent variable (or variables) by using the prediction equation. Point predictions are not perfect and are subject to error. The error is due to the uncertainty in estimation as well as the natural variation of points about the regression line.

We can compute, for example, a 95% prediction interval for the strains in the directions marked a, b and c. Figures 1(b), 2(b) and 3(b) show the 95% prediction intervals for the strains in these directions, obtained with the best polynomial regression model.
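To show how a prediction interval combines the two sources of error named above, here is a minimal sketch for the simpler straight-line case (the report itself uses polynomial models). The function name, the data, and the choice to pass the Student t critical value in as an argument are all assumptions made to keep the example free of external libraries; 3.182 is the two-sided 95% value for 3 degrees of freedom.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, predicted, upper) for a new observation at x0.

    t_crit is the Student t critical value for n - 2 degrees of freedom,
    supplied by the caller (for example from a statistical table).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean

    # Residual standard error: natural scatter of points about the line.
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(s_r / (n - 2))

    # The "1" covers scatter of a new point; the other two terms cover
    # uncertainty in the estimated line, which grows away from x_mean.
    half_width = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x0 - x_mean) ** 2 / sxx)
    y0 = a0 + a1 * x0
    return y0 - half_width, y0, y0 + half_width


# Made-up data; n = 5 gives 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
low_mid, pred_mid, high_mid = prediction_interval(xs, ys, 3.0, t_crit=3.182)
low_far, pred_far, high_far = prediction_interval(xs, ys, 6.0, t_crit=3.182)
```

The interval at x0 = 6, outside the range of the data, is wider than the interval at the mean of x, which is why extrapolated predictions deserve extra caution.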

R-squared

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model. Or:

R-squared = Explained variation / Total variation

R-squared is always between 0 and 100%:

0% indicates that the model explains none of the variability of the response data around its mean.

100% indicates that the model explains all the variability of the response data around its mean.

Plotting fitted values by observed values graphically illustrates different R-squared values for

regression models.

The regression model on the left accounts for 38.0% of the variance while the one on the right

accounts for 87.4%. The more variance that is accounted for by the regression model the closer the

data points will fall to the fitted regression line. Theoretically, if a model could explain 100% of the

variance, the fitted values would always equal the observed values and, therefore, all the data points

would fall on the fitted regression line.

R-squared cannot determine whether the coefficient estimates and predictions are biased, which is

why you must assess the residual plots.

R-squared does not indicate whether a regression model is adequate. You can have a low R-squared

value for a good model, or a high R-squared value for a model that does not fit the data!

Are Low R-squared Values Inherently Bad?

No! There are two major reasons why it can be just fine to have low R-squared values.

In some fields, it is entirely expected that your R-squared values will be low. For example, any field

that attempts to predict human behavior, such as psychology, typically has R-squared values lower

than 50%. Humans are simply harder to predict than, say, physical processes.

Furthermore, if your R-squared value is low but you have statistically significant predictors, you can

still draw important conclusions about how changes in the predictor values are associated with

changes in the response value. Regardless of the R-squared, the significant coefficients still represent

the mean change in the response for one unit of change in the predictor while holding other predictors

in the model constant. Obviously, this type of information can be extremely valuable.

A low R-squared is most problematic when you want to produce predictions that are reasonably

precise (have a small enough prediction interval). How high should the R-squared be for prediction?

Well, that depends on your requirements for the width of a prediction interval and how much

variability is present in your data. While a high R-squared is required for precise predictions, it's not sufficient by itself, as we shall see.

Are High R-squared Values Inherently Good?

No! A high R-squared does not necessarily indicate that the model has a good fit. That might be a

surprise, but look at the fitted line plot and residual plot below. The fitted line plot displays the

relationship between semiconductor electron mobility and the natural log of the density for real

experimental data.

The fitted line plot shows that these data follow a nice tight function and the R-squared is 98.5%,

which sounds great. However, look closer to see how the regression line systematically over- and

under-predicts the data (bias) at different points along the curve. You can also see patterns in the

Residuals versus Fits plot, rather than the randomness that you want to see. This indicates a bad fit,

and serves as a reminder as to why you should always check the residual plots.
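The same effect can be reproduced with synthetic data. The sketch below is not the semiconductor data set described above; it fits a straight line to made-up points that follow y = x^2 exactly, then inspects R-squared and the signs of the residuals. All names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n
    return a0, a1


# Curved data: a straight line is the wrong model by construction.
xs = [float(x) for x in range(11)]
ys = [x * x for x in xs]

a0, a1 = fit_line(xs, ys)
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]

y_mean = sum(ys) / len(ys)
s_r = sum(r * r for r in residuals)
s_t = sum((y - y_mean) ** 2 for y in ys)
r_squared = 1.0 - s_r / s_t

# Random residuals would flip sign often. Here they are positive, then
# negative, then positive again, which is the pattern that signals bias.
signs = [r > 0 for r in residuals]
sign_changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
```

R-squared comes out above 0.92 even though the line under-predicts at both ends and over-predicts in the middle, so the residuals change sign only twice across eleven points.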

Residuals

The difference between the observed value of the dependent variable ($y$) and the predicted value ($\hat{y}$) is called the residual ($e$). Each data point has one residual.

$$e = y - \hat{y}$$

For a least squares fit that includes an intercept term, both the sum and the mean of the residuals are equal to zero. That is, $\sum e = 0$ and $\bar{e} = 0$.
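The zero-sum property can be checked numerically. The sketch below uses made-up data and hypothetical names; it also fits a line forced through the origin to show that the property depends on the model having an intercept term.

```python
# Made-up sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
n = len(xs)

# Least-squares line with an intercept: y = a0 + a1*x.
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a0 = sy / n - a1 * sx / n
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)
residual_mean = residual_sum / n

# Least-squares line through the origin: y = b*x (no intercept).
b = sxy / sxx
residuals_origin = [y - b * x for x, y in zip(xs, ys)]
residual_sum_origin = sum(residuals_origin)
```

With the intercept, the residuals sum to zero up to rounding error. Without it, the sum for this data is about 0.2.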

Residual Plots

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on

the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a

linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.

Non-Linear Regression

While simple and multiple linear regression functions are adequate for modeling a wide variety of

relationships between response variables and predictor variables, many situations require nonlinear

functions. Nonlinear regression is a form of regression analysis in which observational data are

modeled by a function which is a nonlinear combination of the model parameters and depends on one

or more independent variables. The data are fitted by a method of successive approximations.
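One common method of successive approximations is the Gauss-Newton iteration, shown here as a minimal sketch; the report does not say which method it has in mind. The model y = A*exp(B*x), the function name, the step-halving safeguard and the data are all assumptions made for illustration.

```python
import math


def gauss_newton_exponential(xs, ys, a, b, max_iter=50, tol=1e-14):
    """Fit y = a*exp(b*x) by Gauss-Newton iteration with step halving."""

    def sum_sq(a_, b_):
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    current = sum_sq(a, b)
    for _ in range(max_iter):
        # Linearize the model about (a, b) and form the 2x2 normal equations.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e    # residual at the current estimate
            da = e           # d(model)/da
            db = a * x * e   # d(model)/db
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        det = jaa * jbb - jab * jab
        if abs(det) < 1e-300:
            break  # linearized system is singular; stop
        step_a = (ga * jbb - jab * gb) / det
        step_b = (jaa * gb - jab * ga) / det

        # Halve the step until the sum of squares no longer increases.
        scale = 1.0
        trial = sum_sq(a + step_a, b + step_b)
        while trial > current and scale > 1e-8:
            scale *= 0.5
            trial = sum_sq(a + scale * step_a, b + scale * step_b)
        if trial > current:
            break  # no improving step found
        a += scale * step_a
        b += scale * step_b
        converged = abs(current - trial) <= tol * (1.0 + current)
        current = trial
        if converged:
            break
    return a, b


# Noise-free synthetic data from y = 2*exp(0.5*x), rough starting guess.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_fit, b_fit = gauss_newton_exponential(xs, ys, a=1.5, b=0.4)
```

Each pass solves a linear least-squares problem for the correction to the current estimate, which is the sense in which the core calculation matches the linear case. A poor starting guess can still lead to slow convergence or a local minimum.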

Conclusion

Regression analysis is a statistical tool for investigating relationships between variables. Multiple regression analysis is a useful method for generating mathematical models when several (more than two) variables are involved. A polynomial regression model consists of successive power terms: each model includes the highest-order term plus all lower-order terms, whether significant or not. Polynomial regression can be viewed as a particular case of multiple linear regression, and polynomial models are an effective and flexible curve-fitting technique. The most widely used method of regression analysis is ordinary least squares. This method fits a best-fit line to the available data points, with parameter estimates chosen to minimize the error sum of squares. Fitting a regression model requires several assumptions. Estimation of the model parameters requires the assumption that the errors are uncorrelated random variables with mean zero and constant variance. Tests of hypotheses and interval estimation require that the errors be normally distributed. A number of statistical tests can be used to examine whether these assumptions hold for a given regression equation.

Bibliography

http://users.metu.edu.tr/csert/me310/me310_5_regression.pdf

https://en.wikipedia.org/wiki/Least_squares

http://www.sciencedirect.com/science/article/pii/S1877705812046085

https://en.wikipedia.org/wiki/Linear_regression

https://en.wikipedia.org/wiki/Polynomial_regression

http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd141.htm

http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit
