Estimation of Causal Relationships I
Regression Analysis & Diagnosis

• How are salary and expenditure related?
• How does the GDP of a country depend on various economic parameters?
• Salary vs. years of experience
• Salary vs. levels of educational attainment
Illustration 1
The product manager in charge of a particular brand of children's breakfast cereal would like to predict the demand for the cereal during the next year. To use a forecasting technique, she and her staff list the following variables as likely to affect sales:

Illustration 2
A gold speculator is considering a major purchase of gold bullion. He would like to forecast the price of gold 2 years from now (his planning horizon), using a forecasting technique. In preparation, he produces the following list of variables:
Simple Linear Regression
• Simple linear regression involves one independent variable and one dependent variable.
• The relationship between the two variables is approximated by a straight line.
• Regression analysis involving two or more independent variables is called multiple regression.

Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\varepsilon$ is a random variable called the error term.
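To make the roles of the parameters and the error term concrete, here is a minimal simulation sketch. The function name `simulate_y`, the parameter values, and the choice of a normally distributed error with standard deviation `sigma` are assumptions made for illustration only.

```python
import random

def simulate_y(x_values, beta0, beta1, sigma, seed=0):
    """Draw y = beta0 + beta1*x + e, with e ~ Normal(0, sigma) (assumed error law)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

x = [1, 2, 3, 4, 5]
expected = [10 + 5 * xi for xi in x]                      # straight-line part: beta0 + beta1*x
observed = simulate_y(x, beta0=10, beta1=5, sigma=2.0)    # line plus random error
errors = [yi - ei for yi, ei in zip(observed, expected)]  # the realized error terms
```

With `sigma=0` the observed values fall exactly on the line, which is one way to see that the error term is the only source of scatter in this model.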
Regression Equation
$$E(y) = \beta_0 + \beta_1 x$$

[Figures: regression lines plotted as E(y) against x, each marking the intercept $\beta_0$; the surviving panel labels are "Slope $\beta_1$ is negative" and "Slope $\beta_1$ is 0".]
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation:
$$\hat{y} = b_0 + b_1 x$$
• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.

Estimation Process
• Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$
• Regression Equation: $E(y) = \beta_0 + \beta_1 x$
• Unknown Parameters: $\beta_0$, $\beta_1$
• Sample Data: $(x_1, y_1), \ldots, (x_n, y_n)$
• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
• Sample Statistics: $b_0$, $b_1$
• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.
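Least squares criterion:
$$\min \sum (y_i - \hat{y}_i)^2$$
where $y_i$ is the observed value of the dependent variable for the $i$th observation and $\hat{y}_i$ is its estimated value.

Slope for the Estimated Regression Equation
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$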
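As a sketch of where the slope formula comes from (a standard calculus argument, not taken from the slides), write the least squares criterion as a function $S(b_0, b_1)$ and set both partial derivatives to zero:

```latex
\begin{aligned}
S(b_0, b_1) &= \sum (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial S}{\partial b_1} &= -2 \sum x_i (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; \sum x_i (y_i - \bar{y}) = b_1 \sum x_i (x_i - \bar{x}) \\
b_1 &= \frac{\sum x_i (y_i - \bar{y})}{\sum x_i (x_i - \bar{x})}
     = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
\end{aligned}
```

The last equality uses $\sum \bar{x}\,(y_i - \bar{y}) = 0$ and $\sum \bar{x}\,(x_i - \bar{x}) = 0$, since deviations from a mean sum to zero.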
y-Intercept for the Estimated Regression Equation
$$b_0 = \bar{y} - b_1 \bar{x}$$

Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
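Simple Linear Regression
Example: Reed Auto Sales
[Table: data from the sample of 5 previous sales, in two columns each headed "Number of ..."]

Estimated Regression Equation
Slope for the Estimated Regression Equation
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$$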
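The slope and intercept formulas can be sketched in a few lines of Python. The five (x, y) pairs below are assumed values, chosen so that the sums match the slide's 20/4 = 5; the original data table is not reproduced in this copy, and the column meanings in the comments are likewise assumptions. `fit_simple_regression` is a hypothetical helper name.

```python
# ASSUMED data: five (x, y) pairs consistent with the slide's 20/4 = 5.
x = [1, 3, 2, 1, 3]        # assumed: number of TV commercials
y = [14, 24, 18, 17, 27]   # assumed: number of cars sold

def fit_simple_regression(x, y):
    """Return (b0, b1) from the least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # y-intercept
    return b0, b1

b0, b1 = fit_simple_regression(x, y)
print(f"y_hat = {b0:g} + {b1:g} x")   # y_hat = 10 + 5 x
```

Under these assumed values the numerator is 20 and the denominator is 4, so the slope is 5 and the intercept is 20 - 5(2) = 10.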
Regression diagnosis helps to justify the regression model from every aspect of the study.
The diagnostic measures to be checked are:
• Coefficient of determination (R-square value)
• Sign of the correlation coefficient
• Significance of $\beta_1$ (F test)
• Whether $\beta_1 = 0$ or not, and the confidence interval of $\beta_1$ (t test)
• p-values for the explanatory variable(s)

Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
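The decomposition SST = SSR + SSE can be checked numerically, and the first two diagnostics follow from it. This sketch assumes the standard definitions R-square = SSR/SST and correlation = (sign of the slope) times the square root of R-square, and reuses assumed illustrative data (not taken from the slides).

```python
import math

# ASSUMED illustrative data (not from the slides).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # due to error

r_squared = ssr / sst                            # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)      # correlation takes the sign of the slope
print(sst, ssr, sse, round(r_squared, 4), round(r, 4))
```

For these assumed values the sums are 114, 100, and 14, so about 88% of the variation in y is explained by the fitted line.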
Sample Correlation Coefficient
1. The error $\varepsilon$ is a random variable with mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and F test require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
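An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
$$s^2 = \text{MSE} = \frac{\text{SSE}}{n - 2}$$
where:
$$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$

An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting $s$ is called the standard error of the estimate.
$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$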
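A short sketch of the MSE and standard error calculations follows. The data and the fitted coefficients `b0 = 10`, `b1 = 5` are assumed illustrative values, and `standard_error_of_estimate` is a hypothetical helper name.

```python
import math

def standard_error_of_estimate(x, y, b0, b1):
    """Return (mse, s) where mse = SSE/(n - 2) and s = sqrt(mse)."""
    n = len(x)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)     # two degrees of freedom are used up by b0 and b1
    return mse, math.sqrt(mse)

# ASSUMED illustrative data and fitted line (not from the slides).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
mse, s = standard_error_of_estimate(x, y, b0=10.0, b1=5.0)
print(round(mse, 3), round(s, 3))
```

With these assumed values SSE is 14 and n - 2 is 3, so MSE is about 4.667 and s is about 2.160.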
Testing for Significance: t Test
We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

The form of a confidence interval for $\beta_1$ is:
$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$
• $b_1$ is the point estimator.
• $t_{\alpha/2}\, s_{b_1}$ is the margin of error.
where $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom.
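The interval can be sketched numerically as follows. Two things here are assumptions rather than content from the slides: the data are illustrative values, and the standard error of the slope is computed with the standard formula $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$. The critical value 3.182 is the tabulated t value for an upper-tail area of .025 with 3 degrees of freedom.

```python
import math

# ASSUMED illustrative data (not from the slides).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
T_CRIT = 3.182   # t value for alpha/2 = .025 with n - 2 = 3 degrees of freedom

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))       # standard error of the estimate
s_b1 = s / math.sqrt(sxx)          # standard error of the slope (standard formula)

margin = T_CRIT * s_b1             # margin of error
lower, upper = b1 - margin, b1 + margin
t_stat = b1 / s_b1                 # t statistic for H0: beta1 = 0
reject_h0 = not (lower <= 0.0 <= upper)   # reject if 0 lies outside the interval
print(round(lower, 2), round(upper, 2), round(t_stat, 2), reject_h0)
```

For these assumed values the interval is roughly 1.56 to 8.44; it excludes 0, so the hypothesis that the slope is zero would be rejected at the .05 level.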
Confidence Interval for $\beta_1$

Testing for Significance: F Test
Rejection Rule
Reject $H_0$ if p-value < $\alpha$ or $F > F_\alpha$
where:
$F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and $n - 2$ degrees of freedom in the denominator.

1. Determine the hypotheses. $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
2. Specify the level of significance. $\alpha = .05$
3. Select the test statistic. $F = \text{MSR}/\text{MSE}$
4. State the rejection rule. Reject $H_0$ if p-value < .05 or $F > 10.13$ (with 1 d.f. in numerator and 3 d.f. in denominator)
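The four steps can be sketched with the critical-value form of the rejection rule. The data are assumed illustrative values, and MSR is taken as SSR divided by its 1 degree of freedom (the standard definition for simple linear regression); the critical value 10.13 is the one quoted in step 4.

```python
# ASSUMED illustrative data (not from the slides).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
F_CRIT = 10.13   # F value for alpha = .05 with 1 and n - 2 = 3 degrees of freedom

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1          # mean square due to regression (1 independent variable)
mse = sse / (n - 2)    # mean square error
f_stat = msr / mse
reject_h0 = f_stat > F_CRIT
print(round(f_stat, 2), reject_h0)
```

For these assumed values F is about 21.43, which exceeds 10.13, so $H_0$ would be rejected. In simple linear regression the F statistic equals the square of the t statistic for the slope, so the two tests lead to the same conclusion.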
Illustration 3
Car dealers across North America use the so-called Blue Book to help
them determine the value of used cars that their customers trade in
when purchasing new cars. The book, which is published monthly,
lists the trade-in values for all basic models of cars. It provides
alternative values for each car model according to its condition and
optional features. The values are determined on the basis of the
average paid at recent used-car auctions, the source of supply for
many used-car dealers. However, the Blue Book does not indicate the
value determined by the odometer reading and color, despite the fact
that critical factors for used-car buyers are how far the car has been
driven and its color. To examine this issue, a used-car dealer
randomly selected 100 three-year-old Toyota Camrys that were sold at
auction during the past month. Each car was in top condition and
equipped with all the features that come standard with this car. The
dealer recorded the price (in thousands of dollars), the number of miles
(in thousands) on the odometer, and the color. The dealer wants to establish
the relationship through a regression model. #DATA
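Because color is a categorical variable, one common way to bring it into a regression model is through 0/1 indicator (dummy) variables, with one category left out as the baseline. The sketch below only prepares the inputs; the three records, the color names, and the helper `design_row` are hypothetical and are not the dealer's data.

```python
# HYPOTHETICAL records: (price in $1,000s, odometer in 1,000 miles, color).
# Values and color names are invented for illustration only.
cars = [
    (14.6, 37.4, "white"),
    (14.1, 44.8, "silver"),
    (14.0, 45.8, "other"),
]

def design_row(odometer, color, baseline="other",
               levels=("white", "silver", "other")):
    """Return [1, odometer, indicators...] with one 0/1 column per non-baseline color."""
    indicators = [1 if color == level else 0
                  for level in levels if level != baseline]
    return [1.0, odometer] + indicators

X = [design_row(odo, color) for _, odo, color in cars]   # explanatory variables
y = [price for price, _, _ in cars]                      # dependent variable
```

Each row of `X` then corresponds to a model of the form price = $\beta_0 + \beta_1$(odometer) + $\beta_2$(white) + $\beta_3$(silver) + $\varepsilon$, which is a multiple regression because it has more than one independent variable.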