
Chapter Two

Bivariate Regression Model


OR
Simple Linear Regression Model (LRM)

Misgana D. (BSc, MSc)
The classical simple regression model
• After completing this unit, the student will be able to:
– differentiate regression analysis from correlation
analysis
– apply the ordinary least squares (OLS) method in a
two-variable regression analysis and interpret the
results
– conduct a measure of goodness of fit of regression
estimates
– construct hypothesis-testing procedures for regression
coefficients
– apply the regression results to forecasting (prediction)
2.1 REGRESSION Vs CORRELATION
• Correlation Analysis: Measures the strength/degree and direction of
linear association between two variables (both are assumed to be
random)
• Pearson's correlation coefficient:
   r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}
• Pearson's correlation coefficient (r) is a number between −1 and 1
which measures the degree to which two variables are linearly related.
• It also tells you three things about the relationship:
1. Strength? 2. Direction? 3. Significant?
• How strong is the relationship? How big is the number?
 1.0 (-1.0) = Perfect Correlation
 0.60 to 0.99 (-0.60 to -0.99) = Strong
 0.30 to 0.59 (-0.30 to -0.59) = Moderate
 0.01 to 0.29 (-0.01 to -0.29) = Weak
 0 = No Correlation
• When P-value is below 0.05, the correlation is statistically significant.
Con’t...
• How to interpret correlation value
• If correlation is < 0.3: Weak correlation
• If correlation is between 0.3 and 0.7: Moderate correlation
• If correlation is > 0.7: Strong correlation
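The computation of r from the raw-sums formula above can be illustrated with a short Python sketch; the paired data below are hypothetical, purely for illustration.

import math

# Hypothetical paired observations (for illustration only)
X = [2, 4, 6, 8, 10]
Y = [3, 7, 5, 11, 14]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Pearson's r from the raw-sums formula given above
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 3))  # about 0.92: a strong positive linear association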
• Regression Analysis: The process of estimating or predicting the
average value of one Dependent variable (assumed to be stochastic) on the
basis of the other independent variables (assumed to be non-stochastic).
• Regression is an equation that allows us to express the relationship
between two or more variables algebraically.
• In regression the independent variables help us to predict the values of the
dependent variable.
• Correlation shows magnitude and direction of relationship only and
prediction of one variable based on the other is not possible.
• Simple linear regression analysis helps you find the relationship
between two variables, together with its strength and direction:
 the relationship is summarized by the Pearson correlation coefficient (r),
 the strength is measured by the coefficient of determination, R² = r², and
 the direction of the relationship is given by the sign of the slope coefficient “b”.
2.2 Simple Linear Regression Model
• The simple linear regression model (two-variable model) is the
single most useful tool in the econometrician's kit.
• The model has one input and one output variable only.
• It is the most elementary type of regression model that can
be expressed by the following equation:
 Yi = a + bXi + ui
where
 i = index of the sample observations (i = 1, 2, 3, ..., n)
 Yi = dependent variable
 Xi = explanatory (independent) variable
 a = constant (intercept) and
 b = slope of the relationship
 ui = disturbance term or error term
 With simple regression analysis, we can predict the future value
based on historical data.
Dependent Variable Y; Explanatory Variable Xs
1. Y = Son’s Height; X = Father’s Height
2. Y = Height of boys; X = Age of boys
3. Y = Personal Consumption Expenditure
X = Personal Disposable Income
4. Y = Demand; X = Price
5. Y = Rate of Change of Wages
X = Unemployment Rate
6. Y = Money/Income; X = Inflation Rate
7. Y = % Change in Demand; X = % Change in the
advertising budget
8. Y = Crop yield; Xs = temperature, rainfall, sunshine,
fertilizer
Terminology and Notation
Dependent Variable     ↔   Independent Variable(s)
Explained Variable     ↔   Explanatory Variable(s)
Predictand             ↔   Predictor(s)
Regressand             ↔   Regressor(s)
Response               ↔   Stimulus or control variable(s)
Endogenous             ↔   Exogenous variable(s)
Example of Simple linear regression model…
Given: Salary = a + b Edu + Ui
where
o Salary is measured in birr per year
o Edu is measured in years of schooling
Q1. What are the factors included in the error term (Ui)?
Answer: work experience, age, gender, marital status
(married or single), race (white or non-white), etc.
Two-variable Regression Model…
o The observed value of Yi is the sum of two parts: a systematic
(deterministic) part and a random disturbance.
Why should we add the error term (stochastic component) to the
econometric model?
o The term Ui is called a random disturbance because it “disturbs” an
otherwise stable relationship. It is added to the model because it
serves three main purposes:
 Captures the unobserved effect of all other influences on Y.
 Captures any approximation error that arises.
 Captures any elements of random behavior present in each
individual.
Two-variable Regression Model…
• The line represents the exact part of the relationship
and the deviation of the observation from the line
represents the random component of the
relationship.
 The deviation of the observations from the line may
be attributed to several factors:
 Omission of variables from the function
 Random behavior of human beings
 Imperfect specification of the mathematical form
of the model
 Error of measurement
 Error of aggregation or collective error
2.4 Methods of Parameter Estimation
• The parameters of the simple linear regression model
can be estimated by various methods.
• Three of the most commonly used methods are:
1.Ordinary least square method (OLS)
2.Maximum likelihood method (MLM)
3.Method of moments (MM)
 All three methods listed above use different techniques
(i.e., different routes), but all of them lead to the same
destination (i.e., the same conclusion): fitting the best
(or most parsimonious) model to the data under investigation.
• The most common method used to fit a line to the data is
known as Ordinary Least Squares (OLS) method.
• Hence, here we will deal only with the OLS.
The ordinary least square (OLS) method
• The model Yi = α + βXi + Ui is called the true
relationship between Y and X because Y and X
represent their respective population values, and α and β are
called the true parameters since they are estimated from
the population values of Y and X. But it is difficult to
obtain the population values of Y and X, so we are
forced to use the sample values of Y and X.
• Hence, the model Yi = αˆ + βˆXi + ei is called the
estimated relationship between Y and X.
• The parameters estimated from the sample values of Y
and X are called the estimators of the true parameters
and are symbolized as αˆ and βˆ.
2.5 Estimation: Deriving the OLS estimates
• How does OLS work?
• The starting point:
   PRF: Yi = α + βXi + Ui
   SRF: Yi = αˆ + βˆXi + ei
• Estimation of α and β by OLS involves
finding values for the estimates αˆ and βˆ which
will minimize the sum of the squared
residuals (∑ei²).
Estimation: Deriving the OLS estimates…
 To find the values of αˆ and βˆ that minimize this
sum (∑ei²), we partially differentiate ∑ei²
with respect to αˆ and βˆ and set the partial
derivatives equal to zero.
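As a sketch of this minimization step, the first-order conditions (normal equations) can be obtained symbolically in Python, assuming the sympy library is available; the data used are the three-household figures from the worked example later in this chapter.

import sympy as sp

a, b = sp.symbols('a b')          # intercept and slope
X = [30, 50, 60]                  # income (thousand birr)
Y = [10, 20, 30]                  # consumption (thousand birr)

# Sum of squared residuals as a function of a and b
S = sum((y - a - b * x) ** 2 for x, y in zip(X, Y))

# First-order conditions: set both partial derivatives to zero and solve
normal_eqs = [sp.Eq(sp.diff(S, a), 0), sp.Eq(sp.diff(S, b), 0)]
sol = sp.solve(normal_eqs, [a, b])
print(sol)   # a = -10, b = 9/14 (about 0.64), matching the worked example up to rounding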
Estimation: Deriving the OLS estimates…
Derived Formula for the OLS estimates…
 Therefore, the estimated simple regression model is
   Ŷi = αˆ + βˆXi
Con't…
• The derived formulas for obtaining the values of αˆ and βˆ are:
   βˆ = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²
   αˆ = Ȳ − βˆX̄
 The estimated simple regression model is
   Ŷi = αˆ + βˆXi
Method 2: Deriving the shortcut (deviation) formula
Let Yi = α + βXi + Ui ……………(1)  Population regression function
• Applying summation to both sides of equation (1), we
have ∑Yi = nα + β∑Xi + ∑Ui
• Dividing both sides by n, we have
   ∑Yi/n = α + β∑Xi/n + ∑Ui/n
   Ȳ = α + βX̄ + Ū …….(2)
• Subtracting equation (2) from equation (1) gives
   Yi − Ȳ = β(Xi − X̄) + (Ui − Ū)
   i.e., yi = βxi + ei ----------------------(3)
• The OLS estimator of β from equation (3) is given by
   βˆ = ∑xiyi / ∑xi²,  where xi = Xi − X̄ and yi = Yi − Ȳ,
and αˆ = Ȳ − βˆX̄.
Worked Example
Hypothetical data on weekly family consumption expenditure Y and weekly
income X in thousands of birr for a sample of 3 HHs is given below.
Yi Xi
10 30
20 50
30 60
a. Develop the simple regression model using OLS method.
b. Interpret your result.
c. If the value of weekly income (X) is 45, predict the value of weekly
consumption(Y).
d. Find the correlation coefficient r and interpret your result
Solution
To fit the regression equation we do the following computations.
     Yi    Xi    YiXi    Xi²
     10    30     300     900
     20    50    1000    2500
     30    60    1800    3600
Sum  60   140    3100    7000
Mean: Ȳ = 20,  X̄ = 140/3 ≈ 46.67

βˆ = [n∑YiXi − (∑Xi)(∑Yi)] / [n∑Xi² − (∑Xi)²]
   = [3(3100) − (140)(60)] / [3(7000) − (140)²] = 900/1400 ≈ 0.64
αˆ = Ȳ − βˆX̄ = 20 − 0.64(140/3) = 20 − 29.87 = −9.87
Therefore, the fitted regression model is given by: Ŷi = −9.87 + 0.64Xi
b) Interpretation:
 The intercept, αˆ = −9.87 (thousand birr), means that when the household's
disposable income is zero, predicted weekly consumption expenditure is −9.87
thousand birr, i.e., the household dissaves by roughly ten thousand birr.
 The slope coefficient, βˆ = 0.64, means that when the household's disposable
income increases by 1 birr, consumption increases by 0.64 birr (64 cents).
c) At X = 45 thousand birr: Ŷi = −9.87 + 0.64Xi = −9.87 + (0.64)(45) = 18.93
thousand birr = 18,930 birr.
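For reference, a minimal Python sketch that reproduces parts (a) and (c) of this example from the raw sums (the values match the slide up to rounding):

Y = [10, 20, 30]   # weekly consumption (thousand birr)
X = [30, 50, 60]   # weekly income (thousand birr)
n = len(Y)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # about 0.64
alpha_hat = sum_y / n - beta_hat * (sum_x / n)                       # about -10 (slide: -9.87, rounding beta first)
print(round(beta_hat, 2), round(alpha_hat, 2))

# Part (c): predicted weekly consumption at X = 45 thousand birr
print(round(alpha_hat + beta_hat * 45, 2))    # about 18.93 thousand birr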
Example for Derived formula
Q2. Find the Regression equation for the data under example 3.1,
using the shortcut formula.
To solve this problem we proceed as follows.
     Y     X    yi = Y − Ȳ   xi = X − X̄    xiyi      xi²
    10    30      −10         −16.67      166.67   277.78
    20    50        0           3.33        0.00    11.11
    30    60       10          13.33      133.33   177.78
Sum 60   140        0              0      300.00   466.67
Mean 20  46.67

βˆ = ∑xiyi / ∑xi² = 300 / 466.67 ≈ 0.64
αˆ = Ȳ − βˆX̄ = 20 − 0.64(140/3) = 20 − 29.87 = −9.87

Therefore, the fitted regression model is given by:
   Ŷi = −9.87 + 0.64Xi
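The deviation (shortcut) formula gives the same estimates; a brief Python sketch:

Y = [10, 20, 30]
X = [30, 50, 60]
y_bar, x_bar = sum(Y) / len(Y), sum(X) / len(X)

x_dev = [x - x_bar for x in X]      # xi = Xi - X-bar
y_dev = [y - y_bar for y in Y]      # yi = Yi - Y-bar

beta_hat = sum(xd * yd for xd, yd in zip(x_dev, y_dev)) / sum(xd ** 2 for xd in x_dev)
alpha_hat = y_bar - beta_hat * x_bar
print(round(beta_hat, 2), round(alpha_hat, 2))   # same 0.64 and about -10 as before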
Individual Assignment(10%)
1. The following table gives the advertisement cost and sales volume
in thousands of birr for a sample of 10 HHs.
Adv. cost       2    6    8    8   12   16   20   20   22   26
Sales volume   58  105   88  188  117  137  157  169  149  202
a) Develop the regression model, compute the values of the
parameters α and β, and interpret them.
b) If the level of advertisement cost be 27 thousand birr, what will
be the predicted sales volume?
c) Compute the Pearson correlation coefficient r and coefficient
of determination (R2) and interpret their results.
d) Use the deviation formula (Method 2) and calculate the value
of error term in the model.
2. Suppose you examined the effect of Education on salary and
formulated the econometric model as follows.
Salary=20+2.1edu+e
What kinds of variables does “e” in the above model represent?
Individual Assignment(10%)
1. Suppose a manager has been spending money year after year on
advertisement to promote the sales of his firm’s product. The annual sales figures
are in thousands of birr and ad-expenditure is in millions of birr, as presented
below.
Ad-Exp : 5 8 10 12 10 15 18 20 21 25
Sales : 45 50 55 58 58 72 70 85 72 85
Required: By OLS method,
a) Develop the simple Regression model and interpret the value of a and b
b) If the manager decides to spend 28 million birr in the year 2000, predict the
approximate sales volume for that year.
c) Compute the Pearson correlation coefficient r and coefficient of determination
(R2) and interpret their results.
d) Use the deviation formula (Method 2) and calculate the value of error term in
the model.

2. Suppose you examined the effect of Education on salary and formulated the
econometric model as follows. Salary=20+2.1edu+e
What kinds of variables does “e” in the above model represent?
2.6 Assumptions of the CLRM
o The classical linear regression model (CLRM) consists
of a set of assumptions (commonly known as the
Gauss-Markov assumptions) that describe:
o the form of the model, the relationships among its parts,
and the appropriate estimation and inference procedures.
A1: The regression model is linear in the parameters.
i.e., it may or may not be linear in Y and X.
Example: Which of the following models satisfy the assumption?
a) Yi = α+ βXi + Ui
b) LnYi = α+ β lnXi + Ui
c) Yi = α+ β Xi2+ Ui
d) Yi = α+ β2Xi+ Ui
A2: The mean value of the error term Ui is zero.
E(Ui)=0
Assumptions ….
A3: The variance of the error term Ui is constant, which is called
Homoscedasticity, i.e., Var(Ui) = σ².
 A violation of this assumption, widely known as
Heteroskedasticity (non-constant variance), leads to biased
standard errors and inefficient estimates, which may produce
misleading (e.g., wider) confidence intervals.
A4: There is no correlation between any two error terms,
i.e., Cov(Ui, Uj) = 0 for i ≠ j.
 If Cov(Ui, Uj) ≠ 0 for i ≠ j, there is an autocorrelation problem,
which occurs where successive disturbance terms are associated
with each other.
 This leads to misleading standard errors and hypothesis-testing
problems, and the F-value could be meaningless.
Assumptions ….
A5: There is no perfect or exact linear relationship among the
X variables, i.e., no perfect multicollinearity problem.
 However, low correlations among the X variables do not lead to
any problem with the parameter estimates.
 A violation of this assumption indicates that there is a
multicollinearity problem among the explanatory variables,
which leads to a very high coefficient of determination together
with imprecise (high-variance) parameter estimates.
A6: The error term Ui follows the normal distribution,
i.e., Ui ~ N(0, σ²).
 A violation of this assumption occurs when there are
outliers in data set, and leads to problems of wider
confidence intervals and wrong hypothesis testing.
Assumptions ….
A7: Non-endogeneity: none of the independent variables
should be correlated with the error term,
that is, Cov(Xi, Ui) = 0.
 A departure from this assumption, known as the
endogeneity problem, occurs, for example, when relevant
variables are omitted or lagged dependent variable(s) are
introduced as independent variable(s) in a model. This
leads to biased and inconsistent parameter
estimates.
Properties of OLS estimators
 If assumptions A1–A6 hold true, αˆ and βˆ determined by OLS
are BLUE.
 What does BLUE stand for?
B = Best
L = Linear
U = Unbiased
E = Estimator
 An estimator is called BLUE if:
A. Linear: Estimators are a linear function of the dependent
variable Y.
B. Unbiased: on average, the estimators equal the true
population parameters.
C. Best: OLS estimators have minimum variance under the
class of linear and unbiased estimators.
D. Efficient: An unbiased estimator with the least variance is
known as an efficient estimator.
2.7 Model Validity Test
• How do you test whether the fit (or estimates) is good?
• The adequacy or validity of a regression model can be
checked using:
1) Coefficient of determination (R2) as Goodness of fit
2) ANOVA-Test (or F-statistic test ) as over all significance
3) T-statistic test as individual coeff. significance test
F-statistic is an overall test of the explanatory (or independent)
variables, while t-statistic is a test of significance for each (or
individual) explanatory variable, including the slope coefficient
and the constant term in a model.
a) R²-Test of the 'Goodness of Fit'
 This method determines whether a regression model is valid, i.e.,
adequately fits the data under investigation.
 The total variation in Y, called TSS (Total Sum of Squares), is
decomposed into RSS (Regression Sum of Squares) and ESS
(Error Sum of Squares).
• Mathematically, it is formulated as:
   ∑yi² = ∑ŷi² + ∑ei²
   Total variation = Explained variation + Unexplained variation
   i.e., TSS = RSS + ESS
 Since the OLS method estimates the parameters by minimizing
the ESS, the better the fit, the smaller the ESS is relative to the TSS.
Computation of R²
1. R² = RSS/TSS = 1 − ESS/TSS
   where TSS = ∑yi², RSS = βˆ²∑xi², and ESS = TSS − RSS
2. R² = (∑xiyi)² / (∑xi² · ∑yi²),
   where xi = Xi − X̄ and yi = Yi − Ȳ
3. R² = r², where r is the correlation coefficient
• The coefficient of determination ranges between 0 and 1
inclusive, while correlation coefficient ranges between -1
and 1 inclusive.
–When R² = 1 → the model fits perfectly, i.e., ESS = 0
–When R² = 0 → the model explains nothing, i.e., RSS = 0
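A short Python sketch of the three equivalent computations, using the deviation sums from the three-household example (∑xiyi = 300, ∑xi² = 466.67, and ∑yi² = (−10)² + 0² + 10² = 200 = TSS):

sum_xy, sum_x2, sum_y2 = 300.0, 466.67, 200.0   # deviation sums; sum_y2 is the TSS

beta_hat = sum_xy / sum_x2
tss = sum_y2
rss = beta_hat ** 2 * sum_x2        # explained (regression) sum of squares
ess = tss - rss                     # residual (error) sum of squares

r2_formula1 = rss / tss                           # formula 1
r2_formula2 = sum_xy ** 2 / (sum_x2 * sum_y2)     # formula 2 (deviation form)
r = sum_xy / (sum_x2 * sum_y2) ** 0.5             # Pearson r
r2_formula3 = r ** 2                              # formula 3
print(round(r2_formula1, 3), round(r2_formula2, 3), round(r2_formula3, 3))   # all about 0.964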
The domain of R2
• The largest value that R² can assume is 1
(in which case all observations fall exactly on the regression line,
which rarely happens in empirical work).
 R2 Closer to one indicates that the model is strong.
 A low value of R2 (R2 Closer to zero) indicate that:
 X is a poor explanatory variable in the sense that variation in
X leaves Y unaffected, or
 While X is a relevant variable, its influence on Y is weak as
compared to some other variables that are omitted from the
regression equation, or
 the regression equation is mis-specified (for example, an
exponential relationship might be more appropriate).
Interpretation of R2
 R2 measures the percentage of total variation of the
dependent variable that can be explained by the
changes in the explanatory variable(s) included in the
model.
What do we mean when R² = 0.9?
About 90% of the variation in the dependent variable Y is
explained by the regression line, while the remaining 10% of the variation in Y is
due to other factors captured by the error term.
Notice:
 The proportion of total variation in the dependent
variable (Y) that is explained by X or by the regression
line is equal to: R2 x100%.
 The proportion of total variation in the dependent
variable (Y) that is not explained by X or due to
factors other than X is equal to: (1– R2) x 100%.
B) The Analysis of Variance (ANOVA)-F test
• A small value of R2 casts doubt about the usefulness of the
regression equation. We do not, however, pass final judgment on
the equation until it has been subjected to an objective statistical
test.
• Such a test is accomplished by means of analysis of variance
(ANOVA) which enables us to test the significance of R2 (i.e., the
adequacy of the linear regression model).
 To test for the significance of R², we compare the variance ratio
with the critical value of the F distribution with (k − 1) numerator
and (n − 2) denominator degrees of freedom, for a given
significance level α.
 Decision: If the calculated variance ratio exceeds the tabulated
value, that is, if Fcal > Fα (k-1,n-2), we conclude that R2 is
significant (or that the linear regression model is adequate).
 The F-test is designed to test the significance of all variables in a
regression model. In the two-variable model, however, it is used to
test the explanatory power of a single variable (X), and at the same
time, is equivalent to the test of significance of R2.
(ANOVA)-Table
• The ANOVA table for simple linear regression takes the following
standard form:

Source of variation       Sum of squares   df       Mean square         F
Regression (explained)    RSS              k − 1    MSR = RSS/(k − 1)   F = MSR/MSE
Residual (error)          ESS              n − k    MSE = ESS/(n − k)
Total                     TSS              n − 1
Analysis of Variance (ANOVA)-F test….
• F-value: the ratio of two independent mean-square estimates,
namely the regression (explained) mean square and the
residual (error) mean square.
• This ratio, known as the F-value, assesses the significance of the
overall effect of the variables involved in the regression model,
thereby testing whether the model adequately represents or explains
the data.
   F = [RSS/(k − 1)] / [ESS/(n − k)]
where n = number of observations and
k = number of parameters estimated.
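A minimal sketch of this variance-ratio test in Python, assuming the scipy library is available; the sums of squares below are hypothetical, purely for illustration.

from scipy import stats

rss, ess = 650.0, 350.0     # hypothetical regression and error sums of squares
n, k = 20, 2                # observations and estimated parameters

f_cal = (rss / (k - 1)) / (ess / (n - k))           # mean-square ratio
f_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)    # F(k-1, n-k) critical value at the 5% level
print(round(f_cal, 2), round(f_crit, 2))
print("R^2 significant (model adequate)" if f_cal > f_crit else "not significant")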
Self test Exercise
Consider the following data on the percentage rate of change in
electricity consumption (millions KWH) (Y) and the rate of
change in the price of electricity (Birr/KWH) (X) for the years
1979 – 1994.
Summarized data:
• n = 16, X̄ = 1.281, Ȳ = 23.427, ∑xi² = 92.201
• ∑yi² = 13228.70, ∑xiyi = −779.235
where xi = Xi − X̄ and yi = Yi − Ȳ
Required
a. Estimation of the regression coefficient βˆ and the intercept αˆ
b. Test of model adequacy using R2. Is the regression model
adequate and useful for prediction ? Why? Justify your answer.
c. Test the overall significance of the estimated regression line using
F-test. What can you conclude?
Solution
a) Estimation of the regression coefficients
• The slope βˆ and the intercept αˆ are computed as:
   βˆ = ∑xiyi / ∑xi² = −779.235 / 92.201 = −8.451
   αˆ = Ȳ − βˆX̄ = 23.427 − (−8.451)(1.281) = 34.25

The Model: Ŷ = 34.25 − 8.451X

b) Test of model adequacy using R²
 TSS = ∑yi² = 13228.70
 RSS = βˆ²∑xi² = (−8.451)²(92.201) = 6585.679
 ESS = TSS − RSS = 13228.70 − 6585.679 = 6643.021

Therefore:
R² = RSS/TSS = Explained variation / Total variation = 6585.679 / 13228.70 = 0.4978
 R² = 0.4978 ≈ 50%
•Interpretation
• About 50% of the variation in electricity consumption is due to changes in the
price of electricity. The remaining 50% of the variation in electricity
consumption is not due to changes in the price of electricity, but instead due to
chance and other factors not included in the model.
Solution…
c. Test of the overall significance (F-test)
We want to test whether the independent variable has any impact on Y.
Step 1: State H0 and H1 as follows:
   H0: β = 0 (there is no linear relationship between X and Y)
   H1: β ≠ 0 (X has a significant impact on Y)
Step 2: Read the F-table value at α = 5% with numerator df = 1 and
denominator df = n − 2.
 For α = 0.05, the critical value from the F-distribution is:
   Fα(1, n − 2) = F0.05(1, 14) = 4.60
Step 3: Calculate the F-test statistic (variance ratio):
   Fcal = [RSS/(k − 1)] / [ESS/(n − 2)] = 6585.679 / 474.50 = 13.88
Decision: Since the calculated variance ratio exceeds the critical value,
we reject the null hypothesis of no linear relationship between price
and consumption of electricity at the 5% level of significance. Thus, we
then conclude that R2 is significant, that is, the linear regression model
is adequate and is useful for prediction purposes
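The whole solution can be verified with a few lines of Python from the summarized deviation sums given in the exercise:

n = 16
sum_xy, sum_x2, sum_y2 = -779.235, 92.201, 13228.70   # deviation sums
x_bar, y_bar = 1.281, 23.427

beta_hat = sum_xy / sum_x2                 # about -8.451
alpha_hat = y_bar - beta_hat * x_bar       # about 34.25

tss = sum_y2
rss = beta_hat ** 2 * sum_x2               # about 6585.7
ess = tss - rss                            # about 6643.0
r2 = rss / tss                             # about 0.498

f_cal = rss / (ess / (n - 2))              # about 13.88, versus F0.05(1, 14) = 4.60
print(round(beta_hat, 3), round(alpha_hat, 2), round(r2, 3), round(f_cal, 2))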
Measuring the Standard Errors (SE) of σˆ, αˆ and βˆ
 Since the population variance of the error term (the homoscedastic
variance) σ² is not known, it has to be estimated by the sample variance
of the residuals, which is given by:
   σˆ² = ∑ei² / (n − 2)
• The SE of Ui (σˆ) is the square root of σˆ².
 Moreover, since the population variance of the error term σ² is rarely
known, in practice it is replaced by the unbiased estimator σˆ².
For this reason, we use the t-test instead of the Z-test.
 Thus, the standard error of an estimator is often described as a measure
of the precision of the estimator (i.e., how precisely the estimator
measures the true population value).
 The larger the standard error of the estimator, the greater is the
uncertainty in estimating the true value of the unknown parameters α
and β.
Calculation of SE of 𝜶 and 𝜷
• The Standard error of OLS estimates 𝛼 and 𝛽 are computed as
follows given the standard error of an estimator (σˆ).
• Case 1: For non-deviated (raw) data:
   SE(αˆ) = σˆ · √[ ∑Xi² / (n∑Xi² − (∑Xi)²) ]
   SE(βˆ) = σˆ · √[ n / (n∑Xi² − (∑Xi)²) ]

• Case 2: For data in deviation (from the mean) form, with xi = Xi − X̄:
   SE(αˆ) = σˆ · √[ ∑Xi² / (n∑xi²) ]
   SE(βˆ) = σˆ / √(∑xi²)
 The SE(𝛼 ) and SE(𝛽 ) are often described as a measure of the
precision of the estimator (i.e., how precisely the estimator
measures the true population value).
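A short Python sketch of both cases, using the three-household data from the earlier worked example; since n∑xi² = n∑Xi² − (∑Xi)², the raw and deviation forms give identical standard errors.

import math

Y = [10, 20, 30]
X = [30, 50, 60]
n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
x_dev = [x - x_bar for x in X]

# OLS estimates and the residual-based estimate of sigma
beta_hat = sum(xd * (y - y_bar) for xd, y in zip(x_dev, Y)) / sum(xd ** 2 for xd in x_dev)
alpha_hat = y_bar - beta_hat * x_bar
resid = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]
sigma_hat = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

# Case 1: raw (non-deviated) data
sum_x2 = sum(x * x for x in X)
d = n * sum_x2 - sum(X) ** 2
se_alpha_raw = sigma_hat * math.sqrt(sum_x2 / d)
se_beta_raw = sigma_hat * math.sqrt(n / d)

# Case 2: deviation form
sum_xdev2 = sum(xd ** 2 for xd in x_dev)
se_alpha_dev = sigma_hat * math.sqrt(sum_x2 / (n * sum_xdev2))
se_beta_dev = sigma_hat / math.sqrt(sum_xdev2)

print(round(se_alpha_raw, 3), round(se_alpha_dev, 3))   # identical
print(round(se_beta_raw, 3), round(se_beta_dev, 3))     # identical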
Calculation of (SE) of 𝜶 and 𝜷
Example: Assume we have the following data
calculated from a regression of Y on a single variable X
and a constant over 22 observations.
∑XY =830,102 ∑X/n= 416.5 ∑Y/n= 86.65
∑X2=3,919,654 ESS= ∑e2 = 130.6 n=22
Compute the following
a) Find estimates 𝛼 and 𝛽 and set the regression model.
b) Compute the standard errors (SE) of 𝛼 and 𝛽
c) Test of model adequacy using R2. Based on R2, is the regression
model adequate and useful for prediction purposes? Why? Justify
your answer.
Solution:
a)
   βˆ = [n∑XY − (∑X)(∑Y)] / [n∑X² − (∑X)²]
      = [22(830,102) − (9163)(1906.3)] / [22(3,919,654) − (9163)²] ≈ 0.35
   αˆ = Ȳ − βˆX̄ = 86.65 − 0.35(416.5) = −59.12
Hence,
   Ŷ = −59.12 + 0.35X
b)
1. σˆ = √[ ∑ûi² / (n − 2) ] = √(130.6 / 20) = √6.53 ≈ 2.55
2. The SEs of the OLS estimators are calculated as follows:
   SE(αˆ) = 2.55 · √[ 3,919,654 / (22(3,919,654) − (9163)²) ] ≈ 3.35
   SE(βˆ) = 2.55 · √[ 22 / (22(3,919,654) − (9163)²) ] ≈ 0.0079
Solution …
The fitted model with standard errors:
   Ŷ = −59.12 + 0.35X
   SE     (3.35)    (0.0079)
   t         ?          ?
Required: calculate the t-values for αˆ and βˆ from the above model.
Solution:
We can calculate the t-values for αˆ and βˆ as follows:
• tcal for αˆ = αˆ / SE(αˆ) = −59.12 / 3.35 ≈ −17.6
• tcal for βˆ = βˆ / SE(βˆ) = 0.35 / 0.0079 ≈ 44.3
The model summary report:
   Ŷ = −59.12 + 0.35X
   SE     (3.35)    (0.0079)
   t      (−17.6)   (44.3)
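These summary figures can be reproduced with a short Python sketch from the sums given for this example (values agree up to rounding):

import math

n = 22
sum_xy = 830_102
sum_x2 = 3_919_654
sum_x = 416.5 * n        # 9163
sum_y = 86.65 * n        # 1906.3
ess = 130.6              # sum of squared residuals

beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # about 0.35
alpha_hat = sum_y / n - beta_hat * (sum_x / n)                        # about -59.1

sigma_hat = math.sqrt(ess / (n - 2))                                  # about 2.55
d = n * sum_x2 - sum_x ** 2
se_alpha = sigma_hat * math.sqrt(sum_x2 / d)                          # about 3.35
se_beta = sigma_hat * math.sqrt(n / d)                                # about 0.0079

print(round(alpha_hat / se_alpha, 1), round(beta_hat / se_beta, 1))   # t-values, about -17.6 and 44.0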
Case 3: An individual Significance t-test Approach
The econometrician's interest is not only in obtaining the
estimators αˆ and βˆ but also in using them to make inferences
about the true parameters α and β.
• For this purpose, the error term is assumed to be normally
distributed; since the estimators are linear functions of the
error term, they are also normally distributed.
 The theory of estimation consists of two parts: Point estimation
and Interval estimation.
 However, instead of relying on the point estimate alone, we may
construct an interval around the point estimator that has, say, a 95 percent
probability of including the true parameter value; this is called interval
estimation, i.e., β lies in βˆ ± tα/2 · SE(βˆ).
NB: In statistics the reliability of point estimator is measured by its
standard error (SE).
Example 1
Suppose that we have the following regression results from
a sample size n=20 of consumption function.
 βˆ = 0.7 with SE(βˆ) = 0.21. How “reliable” is this estimate?
 Required:
1) Using both the t-test of significance and confidence interval test
approaches, check the reliability of the model.
2) Check that both test approaches give the same answer.
NB: In measuring precision, estimation is half the battle; hypothesis
testing is the other half.
Significance test Approach
Case one: Point estimation and hypothesis testing
Step 1: State H0 and H1 for βi:
• H0: βi = 0 (there is no relationship between X and Y)
• H1: βi ≠ 0 (there is a significant relationship between X and Y)
Test the hypothesis βi = 0 at the 5% significance level with df = n − k.
Step 2: Compute
   tcal = (βˆ − 0) / SE(βˆ) = 0.7 / 0.21 ≈ 3.3
 Since H1 is βi ≠ 0, this is a two-tailed test, so we use α/2 = 0.05/2 =
0.025, and the tabled value t0.025 with df = 18 from the t-table is 2.10.
Step3: Make a decision to accept Ho or to reject Ho
Since tcal > ttab, we reject H0 and accept H1, and conclude that:
 βi is significant (βi ≠ 0).
 There is a significant relationship between X and Y.
Confidence interval test Approach
Case 2: Interval estimation and hypothesis testing
Step 1: State H0 and H1 for β:
• H0: β = 0 (no relationship between X and Y)
• H1: β ≠ 0 (there is a relationship between X and Y)
Step 2: Construct a 95% CI for β using the point estimate βˆ:
   βˆ ± tα/2 · SE(βˆ)
   0.7 ± 2.10(0.21)
   0.7 ± 0.441
   (0.259, 1.141)
Step 3: Test the significance of the slope parameter using the constructed
confidence interval at the 5% significance level.
Decision: Since β = 0 lies outside the confidence interval, we reject H0
and conclude that β ≠ 0, i.e., the slope is significant (X and Y are related).
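Both approaches can be sketched in a few lines of Python (assuming scipy is available) for βˆ = 0.7, SE(βˆ) = 0.21 and df = 18:

from scipy import stats

beta_hat, se_beta, df = 0.7, 0.21, 18
alpha = 0.05

# (1) Significance-test approach
t_cal = beta_hat / se_beta                  # about 3.33
t_crit = stats.t.ppf(1 - alpha / 2, df)     # about 2.10
print("reject H0" if abs(t_cal) > t_crit else "fail to reject H0")

# (2) Confidence-interval approach: reject H0 if 0 lies outside the interval
lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta
print(round(lower, 3), round(upper, 3))     # about (0.259, 1.141)
print("reject H0" if not (lower <= 0 <= upper) else "fail to reject H0")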
Self-Test Exercise
• Suppose we have estimated the following
regression line from a sample of 20 observations.
(the estimated regression line is not reproduced here; the values in
brackets are standard errors)
 Since βˆ = 2.88 is a single (point) estimate of the
unknown population parameter, how “reliable” is this
estimate?
a) Construct a 95% confidence interval for the slope
parameter βˆ = 2.88.
b) Test the significance of the slope parameter using the
constructed confidence interval at 5% level.
STATA/SPSS -Software significance test approach
• In the significance test procedure, the test statistic usually follows
a well-defined probability distribution such as the normal, t, F, or
chi-square.
• Once a test statistic (e.g., the t statistic) is computed from the data
at hand, its p value can be easily obtained. The p value gives the
exact probability of obtaining the estimated test statistic under the
null hypothesis. It is the lowest significance level at which a null
hypothesis can be rejected.
• In statistics, when we reject the null hypothesis, we say that our
finding is statistically significant. Some readers may want to fix α at
some level 1% or 5% or 10% and reject the null hypothesis if the p
value is less than α. That is their choice.
• In choosing the p value the investigator has to bear in mind the
probabilities of committing Type I and Type II errors.
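For example, the exact two-tailed p value for the earlier t statistic (t ≈ 3.33 with df = 18) can be obtained in Python with scipy:

from scipy import stats

t_cal, df = 3.33, 18
p_value = 2 * (1 - stats.t.cdf(abs(t_cal), df))   # two-tailed p value
print(round(p_value, 4))   # about 0.004, well below 0.05, so H0 is rejected at the 5% level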
Thank you!