ANALYSIS
….
regressions.
SIMPLE LINEAR REGRESSION
form of Y = α + βX + ε
...
where Y is the dependent variable
Since we cannot observe the true line exactly, as is usual in inferential statistics, we estimate
the relationship by Y = a + bX
….
a estimates α
b estimates β
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \frac{\sum Y - b\sum X}{n}$$
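The two formulas can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `fit_line` and the sample data are hypothetical, chosen so that the points lie exactly on Y = 1 + 2X.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n ΣXY - ΣX ΣY) / (n ΣX² - (ΣX)²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = (ΣY - b ΣX) / n
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical illustrative data lying exactly on Y = 1 + 2X
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

The denominator is zero when all X values are equal, so the sketch assumes at least two distinct X values.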
INTERPRETATION OF THE REGRESSION COEFFICIENTS
‘a’ and ‘b’ are sometimes called regression coefficients and have the
following interpretations:
‘a’ (the Y-intercept) shows the value that the dependent variable will take when X is zero.
‘b’ (the slope) has two interpretations.
First, it shows the direction of the relationship. If its value is positive, then we say
the two variables are positively related; if the value of the slope is negative, then we
understand that the two variables are negatively related.
The second interpretation is that it shows the amount by which Y will change when X changes by
one unit.
●
Regression equation: The equation
of the regression line.
Example.....
●
Distance and transport cost from
the industry to the market are
presented in the table below;
….
Distance (km), x                 2   4   6   8
Transport cost (million Tsh), y  3   7   5   10
…...
●
a) Find the linear regression equation for the
following two sets of data.
●
b) Graph the regression equation using the data
points.
●
c) Describe the apparent relationship between
distance and transport cost.
●
d) Interpret the slope of the regression line in
terms of transport cost.
….
●
e) Use the regression equation to
predict the transport cost for the
market which is 3km away from the
industry.
SOLUTION
●
a) from,
●
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \frac{\sum Y - b\sum X}{n}$$
…..
x    2   4    6    8
y    3   7    5    10
xy   6   28   30   80
x²   4   16   36   64
…..
●
b= 0.95 ,
●
a= 1.5
●
From Y = a + bX
●
The linear regression equation will be;
●
Y = 1.5 + 0.95X
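For reference, the arithmetic behind these two values can be reconstructed from the table. The column totals below are not printed in the slides; they are computed here from the four data points (n = 4).

$$\sum X = 20, \quad \sum Y = 25, \quad \sum XY = 144, \quad \sum X^2 = 120$$

$$b = \frac{4(144) - (20)(25)}{4(120) - (20)^2} = \frac{576 - 500}{480 - 400} = \frac{76}{80} = 0.95$$

$$a = \frac{25 - 0.95(20)}{4} = \frac{6}{4} = 1.5$$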
….
●
b) Plot the graph.......
●
c) Because the slope of the
regression line is positive, transport
cost tends to increase as the distance
increases, which is no particular
surprise.
…..
●
d) Because x represents distance in km and y
represents transport cost in million Tsh, the
slope of 0.95 indicates that for each additional
kilometre of distance the predicted cost
increases by Tsh 0.95 million.
●
e) For a distance of 3 km, x = 3, and the regression equation
yields the predicted cost of
●
Y = 1.5 + 0.95 × 3 = 4.35
...
●
Interpretation: The estimated
transport cost for a market that
is 3 km away is Tsh 4.35 million.
Individual Activity....
●
Given the following two sets of
data;
…..
x 1 2 3 4
y 2 4 5 7
…....
●
a) Find the least-squares
regression line.
●
b) Estimate the value of y when
x=10.
SIMPLE CORRELATION ANALYSIS
●
Definition........
●
The linear correlation coefficient is
a descriptive measure of the
strength and direction of the linear
(straight-line) relationship between
two variables.
…........
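$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$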
-1 ≤ r ≤ +1
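The computational formula for r can be sketched in Python as follows. The function name `pearson_r` is an illustrative choice, not something from the slides; the sample call reuses the distance and transport-cost data from the regression example.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient from the sums-based formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is the square root of the product of the two bracketed terms
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Distance (km) and transport cost (million Tsh) from the regression example
r = pearson_r([2, 4, 6, 8], [3, 7, 5, 10])
print(round(r, 3))  # 0.821
```

The result lies between −1 and +1, as the inequality requires, and its sign matches the sign of the slope b because the two formulas share the same numerator.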
INTERPRETATION OF THE CORRELATION
COEFFICIENT
- When ‘r’ is 1 or very close to it, we say that there is a very strong positive linear relationship between the two variables.
Age (X)  5     4      6     5     5     5     6     6     2      7     7
xy       425   412    420   410   445   490   396   570   338    490   336
x²       25    16     36    25    25    25    36    36    4      49    49
y²       7225  10609  4900  6724  7921  9604  4356  9025  28561  4900  2304
….
●
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \frac{\sum Y - b\sum X}{n}$$
….
●
Y= a + bX,
●
Y = 195.47 − 20.26 X
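As a check, the coefficients can be worked out from the column totals used in this example (n = 11, ΣX = 58, ΣY = 975, ΣXY = 4732, ΣX² = 326):

$$b = \frac{11(4732) - (58)(975)}{11(326) - (58)^2} = \frac{52052 - 56550}{3586 - 3364} = \frac{-4498}{222} \approx -20.26$$

$$a = \frac{975 - \left(\frac{-4498}{222}\right)(58)}{11} \approx \frac{975 + 1175.15}{11} \approx 195.47$$

Carrying the unrounded slope through gives a ≈ 195.47; rounding b to −20.26 before substituting gives 195.46 instead.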
●
b) Draw the graph..... of price Vs Age.
….
●
c) Since the slope of the regression line is
negative, price tends to decrease as age
increases, which is no particular surprise.
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
…..
●
$$r = \frac{11(4732) - (58)(975)}{\sqrt{\left[11(326) - (58)^2\right]\left[11(96129) - (975)^2\right]}}$$
●
r = −0.924
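The substitution can be checked numerically. The short sketch below uses only the column totals that appear in the substitution itself (the variable names are illustrative), and it gives about −0.924, the value quoted in the interpretation.

```python
import math

# Column totals for the age/price data (n = 11 observations)
n = 11
sum_x, sum_y = 58, 975
sum_xy, sum_x2, sum_y2 = 4732, 326, 96129

numerator = n * sum_xy - sum_x * sum_y  # 52052 - 56550 = -4498
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator
print(round(r, 3))  # -0.924
```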
...
●
g) Interpretation: The linear correlation
coefficient, r = −0.924, suggests a strong
negative linear correlation between age and
price of car. In particular, it indicates that as age
increases, there is a strong tendency for price
to decrease, which is not surprising. It also
implies that the regression equation,
ŷ = 195.47 − 20.26x, is extremely useful for
making predictions.
...
●
h) Because the correlation coefficient,
r = −0.924, is quite close to −1, the data points
should be clustered closely about the
regression line.
SCATTER DIAGRAMS
●
More specifically, R-squared gives you the
percentage of the variation in y explained by the x-variables.
It ranges from 0 to 1 (i.e. 0% to 100% of the variation
in y can be explained by the x-variables).
….
●
The coefficient of determination, R², is closely related to
the correlation coefficient, r. The correlation
coefficient tells you how strong a
linear relationship there is between two
variables. R squared is the square of the
correlation coefficient, r (hence the term r
squared).
Finding R Squared / The
Coefficient of Determination
●
Step 1: Find the correlation coefficient, r (it may
be given to you in the question). Example, r =
0.543.
●
Step 2: Square the correlation coefficient.
0.543² ≈ 0.295
●
Step 3: Convert the squared value (the coefficient of
determination) to a percentage.
0.295 = 29.5%
Meaning of the Coefficient of
Determination
●
The coefficient of determination can be thought of as a
percent. It gives you an idea of how much of the variation
in y is accounted for by the regression equation. The
higher the coefficient, the more closely the data points
cluster around the line when the data points and line are
plotted. If the coefficient is 0.80, then 80% of the variation
in y is explained by the regression on x; it does not mean
that 80% of the points fall on the line. A value of 1
indicates that every data point lies exactly on the
regression line, and a value of 0 indicates that the line
explains none of the variation in y. A higher coefficient is
an indicator of a better goodness of fit for the observations.
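To make the "variation explained" reading concrete, here is a minimal sketch that computes R² as 1 − SSE/SST for the distance and transport-cost example, using the fitted line Y = 1.5 + 0.95X from that example. The variable names are illustrative. It also counts how many data points lie exactly on the line, to show that R² is not a count of points on the line.

```python
xs = [2, 4, 6, 8]    # distance (km)
ys = [3, 7, 5, 10]   # transport cost (million Tsh)
a, b = 1.5, 0.95     # fitted coefficients from the worked example

mean_y = sum(ys) / len(ys)
# Total variation of y about its mean
sst = sum((y - mean_y) ** 2 for y in ys)
# Variation left unexplained by the regression line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.675

# Number of data points that lie exactly on the fitted line
on_line = sum(1 for x, y in zip(xs, ys) if abs(y - (a + b * x)) < 1e-9)
print(on_line)  # 0
```

Here about 67.5% of the variation in cost is explained by distance, even though none of the four points falls exactly on the line.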