# Regression Analysis

Scatter plots
• Regression analysis requires interval- or ratio-level data.
• To see whether your data fit the regression model, it is wise to examine a scatter plot first.
• The reason?
– Regression analysis assumes a linear relationship. If the relationship is curvilinear, or if there is no relationship, linear regression is of little use.

Types of Lines

Scatter plot
[Figure: scatter plot, "Percent of Population with Bachelor's Degree by Personal Income Per Capita". x-axis: Percent of Population 25 Years and Over with Bachelor's Degree or More, March 2000 estimates (15.0 to 35.0). y-axis: Personal Income Per Capita, current dollars, 1999 (20,000 to 40,000).]
• This is a linear relationship.
• It is a positive relationship.
• As the percentage of the population with a bachelor's degree increases, so does personal income per capita.

Regression Line
[Figure: the same scatter plot, "Percent of Population with Bachelor's Degree by Personal Income Per Capita", with a fitted regression line. x-axis: Percent of Population 25 Years and Over with Bachelor's Degree or More, March 2000 estimates. y-axis: Personal Income Per Capita, current dollars, 1999. R Sq Linear = 0.542.]
• The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables.
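As a rough illustration of what "best straight line" means, the sketch below fits a least-squares line by hand. The data points are hypothetical values chosen for illustration, not the data behind the slide, and the function name `fit_line` is an assumption of this example.

```python
# Hypothetical (x, y) pairs for illustration only; NOT the data plotted on the slide.
education_pct = [18.0, 21.0, 24.0, 27.0, 30.0]
income = [23000.0, 25500.0, 26500.0, 29500.0, 31000.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = fit_line(education_pct, income)
print(f"income = {a:.1f} + {b:.1f} * education_pct")
```

The least-squares line is the one that minimizes the sum of squared vertical distances between the points and the line, which is why the residuals from this fit sum to zero.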

Things to remember
• Regression still focuses on association, not causation.
• Association is a necessary prerequisite for inferring causation, but in addition:
1. The independent variable must precede the dependent variable in time.
2. The two variables must be plausibly linked by a theory.
3. Competing independent variables must be eliminated.

Regression Table
• The regression coefficient is not a good indicator of the strength of the relationship.
• Two scatter plots with very different dispersions could produce the same regression line.
[Figure: scatter plot with fitted line. x-axis: Percent of Population 25 Years and Over with Bachelor's Degree or More, March 2000 estimates. y-axis: Personal Income Per Capita, current dollars, 1999. R Sq Linear = 0.542.]
[Figure: scatter plot with fitted line. x-axis: Population Per Square Mile. y-axis: Personal Income Per Capita, current dollars, 1999. R Sq Linear = 0.463.]
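The point that the same regression line can come from very different dispersions can be checked with a small sketch. Both data sets below are hypothetical and constructed so that the noise does not change the fitted line. The helper `fit_with_spread` is an assumption of this example. It reports the slope, the intercept, and the standard error of the estimate, computed as the square root of SSE / (n - 2).

```python
from math import sqrt


def fit_with_spread(xs, ys):
    """Return (intercept, slope, standard error of the estimate) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sqrt(sse / (n - 2))


# Hypothetical data: points on the line y = 2 + 3x plus noise that sums to zero
# and is uncorrelated with x, so the fitted line itself is unchanged.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
tight = [2 + 3 * x + 1 * e for x, e in zip(xs, noise)]  # small scatter
loose = [2 + 3 * x + 5 * e for x, e in zip(xs, noise)]  # large scatter

a_tight, b_tight, se_tight = fit_with_spread(xs, tight)
a_loose, b_loose, se_loose = fit_with_spread(xs, loose)
print(a_tight, b_tight, se_tight)
print(a_loose, b_loose, se_loose)
```

Both fits recover essentially the same intercept and slope, while the standard error of the estimate is five times larger for the second data set. The slope alone therefore says nothing about how tightly the points cluster around the line.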

Regression coefficient
• The regression coefficient is the slope of the regression line and tells you the nature (direction and size) of the relationship between the variables.
• It shows how much change in the dependent variable is associated with a one-unit change in the independent variable.
• The larger the absolute value of the regression coefficient, the more the dependent variable changes per unit of the independent variable.

Pearson's r
• To determine strength, look at how closely the dots are clustered around the line. The more tightly the cases are clustered, the stronger the relationship; the more scattered they are, the weaker.
• Pearson's r ranges from -1 to +1, with 0 indicating no linear relationship at all.
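The range of Pearson's r can be illustrated with a minimal sketch on three hypothetical data sets: a perfect positive line, a perfect negative line, and a symmetric curve. The function name `pearson_r` and all values are assumptions of this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2 * x + 1 for x in xs]      # perfect positive linear relationship
falling = [10 - 2 * x for x in xs]    # perfect negative linear relationship
curved = [(x - 3) ** 2 for x in xs]   # symmetric curve: related, but not linearly

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)
print(r_rising, r_falling, r_curved)
```

The curved case is a reminder that r near 0 means no linear relationship, not no relationship. This is one reason to inspect a scatter plot before running a regression.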

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .736 (a) | .542 | .532 | 2760.003 |

a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

• When you run a regression analysis in SPSS you get three tables. Each tells you something about the relationship.
• The first is the model summary.
• R is the Pearson product-moment correlation coefficient.
• In this case R is .736.
• R is the square root of R-Square and is the correlation between the observed and predicted values of the dependent variable.

R-Square
• R-Square is the proportion of variance in the dependent variable (income per capita) that can be predicted from the independent variable (level of education).
• This value indicates that 54.2% of the variance in income can be predicted from the education variable.
• R-Square is also called the coefficient of determination.

• Adjusted R-square attempts to give a more honest estimate of R-square for the population. Here R-square is .542 and adjusted R-square is .532. There isn't much difference because we are dealing with only one independent variable.
• When the number of observations is small relative to the number of independent variables, there will be a much greater difference between R-square and adjusted R-square.
• By contrast, when the number of observations is very large, the values of R-square and adjusted R-square will be much closer.
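The adjustment can be reproduced with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below assumes n = 50, inferred from the total of 49 degrees of freedom in the ANOVA output. The function name is an assumption of this example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-square: penalizes R-square for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square = .542 with one predictor; n = 50 is inferred from total df = 49.
adj_50 = adjusted_r_squared(0.542, n_obs=50, n_predictors=1)
print(round(adj_50, 3))

# Same R-square with a hypothetical sample of only 10 observations:
adj_10 = adjusted_r_squared(0.542, n_obs=10, n_predictors=1)
print(round(adj_10, 3))
```

With 50 observations the result matches the .532 in the model summary. With only 10 observations the same R-square is adjusted down noticeably further, which is the small-sample effect described above.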

ANOVA
ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4.32E+08 | 1 | 432493775.8 | 56.775 | .000 (a) |
| Residual | 3.66E+08 | 48 | 7617618.586 | | |
| Total | 7.98E+08 | 49 | | | |

a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

• The p-value associated with this F value is very small (reported as .000).
• These values are used to answer the question "Do the independent variables reliably predict the dependent variable?"
• The p-value is compared to your alpha level (typically 0.05). If it is smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable."
• If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable.
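The F statistic and R-Square can both be recovered from the ANOVA table with simple arithmetic. The sketch below uses the mean squares and degrees of freedom from the table above. It does not compute the p-value, which would require the F distribution.

```python
# Values taken from the ANOVA table for the one-predictor model.
ms_regression = 432493775.8
ms_residual = 7617618.586
df_regression = 1
df_residual = 48

# F is the ratio of the regression mean square to the residual mean square.
f_stat = ms_regression / ms_residual

# Sums of squares are mean squares times their degrees of freedom.
ss_regression = ms_regression * df_regression
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual

# R-Square is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

print(round(f_stat, 3), round(r_squared, 3))
```

The results agree with the F of 56.775 in the ANOVA table and the R Square of .542 in the model summary, which shows how the two tables are tied together.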

Coefficients
Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 10078.565 | 2312.771 | | 4.358 | .000 |
| Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates | 688.939 | 91.433 | .736 | 7.535 | .000 |

a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

• B: These are the values for the regression equation for predicting the dependent variable from the independent variable.
• These are called unstandardized coefficients because they are measured in their natural units. As such, the coefficients cannot be compared with one another to determine which one is more influential in the model, because they can be measured on different scales.
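To make the B values concrete, the sketch below plugs them into the regression equation Y = a + bX. The coefficients come from the table above. The function name and the input value of 25 percent are assumptions chosen for illustration.

```python
# Unstandardized coefficients from the one-predictor Coefficients table.
INTERCEPT = 10078.565  # B for (Constant)
SLOPE = 688.939        # B for percent of population with a bachelor's degree or more


def predict_income(pct_bachelors):
    """Predicted personal income per capita (1999 dollars) for a given percentage."""
    return INTERCEPT + SLOPE * pct_bachelors


predicted = predict_income(25.0)
# A one-point increase in the percentage changes the prediction by exactly the slope.
one_point_change = predict_income(26.0) - predict_income(25.0)
print(round(predicted, 2), round(one_point_change, 3))
```

Read this way, each additional percentage point of the population holding a bachelor's degree is associated with about 689 dollars more in predicted income per capita. The figure describes an association in this data set, not a causal effect.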

Coefficients
Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 13032.847 | 1902.700 | | 6.850 | .000 |
| Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates | 517.628 | 78.613 | .553 | 6.584 | .000 |
| Population Per Square Mile | 7.953 | 1.450 | .461 | 5.486 | .000 |

a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

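• This table includes two independent variables and shows how their different units of measurement affect the B values: 517.628 per percentage point versus 7.953 per person per square mile. That is why you need to look at the standardized Beta (.553 versus .461) to compare them.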

Coefficients
• Beta: These are the standardized coefficients.
• These are the coefficients that you would obtain if you standardized all of the variables in the regression, including the dependent variable and all of the independent variables, and then ran the regression.
• By standardizing the variables before running the regression, you put all of the variables on the same scale, so you can compare the magnitudes of the coefficients to see which one has more of an effect.
• You will also notice that the larger betas are associated with the larger t-values.
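The standardization described above can be sketched for the one-predictor case. The data are hypothetical, and the helper names `slope` and `standardize` are assumptions of this example. It computes Beta two ways: by refitting on z-scores, and by rescaling the unstandardized slope with b * sd(x) / sd(y).

```python
from statistics import mean, stdev


def slope(xs, ys):
    """Least-squares slope of y on x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data for illustration only; NOT the data behind the SPSS output.
x = [18.0, 21.0, 24.0, 27.0, 30.0]
y = [23000.0, 25500.0, 26500.0, 29500.0, 31000.0]

b = slope(x, y)                                   # unstandardized B, in y-units per x-unit
beta_refit = slope(standardize(x), standardize(y))  # slope after standardizing both variables
beta_rescaled = b * stdev(x) / stdev(y)           # same quantity obtained by rescaling B
print(b, beta_refit, beta_rescaled)
```

The two Beta values agree, and with a single predictor Beta equals Pearson's r. This is consistent with the SPSS output, where Beta and R are both .736 in the one-predictor model.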

The Basic Regression Model
Simple Linear Regression
$Y = a + bX$

Multiple Linear Regression
$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots$

Multiple Regression
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .849 (a) | .721 | .709 | 2177.791 |

a. Predictors: (Constant), Population Per Square Mile, Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 5.75E+08 | 2 | 287614518.2 | 60.643 | .000 (a) |
| Residual | 2.23E+08 | 47 | 4742775.141 | | |
| Total | 7.98E+08 | 49 | | | |

a. Predictors: (Constant), Population Per Square Mile, Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Multiple Regression
Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 13032.847 | 1902.700 | | 6.850 | .000 |
| Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates | 517.628 | 78.613 | .553 | 6.584 | .000 |
| Population Per Square Mile | 7.953 | 1.450 | .461 | 5.486 | .000 |

a. Dependent Variable: Personal Income Per Capita, current dollars, 1999
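The two-predictor output corresponds to the multiple regression form Y = a + b1X1 + b2X2. A minimal sketch of using it for prediction follows. The coefficients come from the table above. The dictionary keys, the function name, and the input values (25 percent, 100 people per square mile) are assumptions chosen for illustration.

```python
# Unstandardized coefficients from the two-predictor Coefficients table.
INTERCEPT = 13032.847
COEFFICIENTS = {
    "pct_bachelors": 517.628,   # per percentage point
    "pop_per_sq_mile": 7.953,   # per person per square mile
}


def predict_income(values):
    """Predicted income per capita: intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


# Hypothetical state: 25% with a bachelor's degree, 100 people per square mile.
example = {"pct_bachelors": 25.0, "pop_per_sq_mile": 100.0}
predicted = predict_income(example)
print(round(predicted, 3))
```

Each coefficient is interpreted with the other predictor held constant. This is why the education coefficient here (517.628) differs from the one in the single-predictor model (688.939).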

Normal P-P Plot of Regression Standardized Residual
[Figure: normal P-P plot. Dependent variable: Jumlah Penjualan (sales volume). x-axis: Observed Cum Prob (0.0 to 1.0). y-axis: Expected Cum Prob (0.0 to 1.0).]
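A normal P-P plot compares the observed cumulative proportion of the standardized residuals with the cumulative probability expected under a normal distribution. Points close to the diagonal suggest approximately normal residuals. The sketch below computes the plotted points for a hypothetical list of residuals. It uses the plotting position (i - 0.5)/n, which is one common convention and an assumption here, since statistical packages may use a different formula.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    m, s = mean(residuals), stdev(residuals)
    z_scores = sorted((r - m) / s for r in residuals)  # standardized, in rank order
    n = len(z_scores)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, z in enumerate(z_scores, start=1):
        observed = (rank - 0.5) / n     # empirical cumulative proportion (assumed convention)
        expected = std_normal.cdf(z)    # cumulative probability under normality
        points.append((observed, expected))
    return points


# Hypothetical residuals for illustration only.
residuals = [-1.8, -0.9, -0.4, -0.1, 0.2, 0.5, 1.0, 1.5]
points = pp_plot_points(residuals)
for observed, expected in points:
    print(f"{observed:.3f}  {expected:.3f}")
```

Plotting `observed` on the x-axis against `expected` on the y-axis gives the same layout as the figure above.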

Assignment
• Choose a case from any field that can be analyzed with multiple regression.
• Process your data using a suitable software package.
• Interpret the results of the data processing.