207 Ch14

F-207: CHAPTER 14
If you find any mistake, rebuke KMAT
If you are confused, contact SSD
If you are benefited, pray for MAL

Multiple Linear Regression Analysis (Cross-sectional data)
Types of data used for analysis

There are 3 types of data that econometricians might use for analysis:
1. Time series data
– Data on one or more variables of a single entity (e.g. an individual, company or country)
over multiple time periods
– Ex: Ashik's daily record of cigarettes smoked.
2. Cross-sectional data
– Data on one or more variables of multiple entities at one point in time
– Ex: Tanim's and Aminul's results for this semester.
3. Panel/Pooled/Longitudinal data
– Combination of time series and cross-sectional data
– Data on one or more variables of multiple entities for multiple time periods
– Ex: Aminul's and Tanim's results for all semesters.
What is regression analysis?

Population Multiple Linear Regression Model:
Yi = β0 + β1x1i + β2x2i + β3x3i + … + βkxki + ei (k = number of independent variables)
Sample Multiple Regression Model: (the error term drops out, and the betas and y get hats)
ŷi = β̂0 + β̂1x1i + β̂2x2i + β̂3x3i + … + β̂kxki
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships
between a dependent variable and one or more independent variables.
Regression analysis is a form of predictive modelling technique which investigates the relationship between
a dependent variable (target/regressed/explained) and one or more independent variables (predictor/regressor/explanatory).
⚫ One independent variable: Simple linear regression
⚫ More than one independent variable: Multiple linear regression

⚫ Regression analysis is used for:
✓ forecasting,
✓ time series modelling and
✓ finding cause-and-effect relationships between the variables
In regression analysis we use the independent variable(s) (X) to estimate the dependent variable (Y).
⚫ The relationship between the variables is linear.
⚫ Both variables must be at least interval scale.
⚫ The least squares criterion is used to determine the equation.
⚫ Regression Equation: An equation that expresses the linear relationship between two variables.
⚫ Least Squares Principle: Determining a regression equation by minimizing the sum of the squares
of the vertical distances between the actual Y values and the predicted values of Y. The result obtained by
plugging values into the regression equation differs from the actual result; the regression equation is built so
that the sum of the squares of these differences is as small as possible. Each difference is called the Residual/Error.
OLS (Ordinary Least Squares) estimator minimizes the sum (equivalently, the average) of the squared differences
between the actual values of Yi and the values predicted by the estimated line.
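The least squares principle can be sketched in a few lines of Python. This is only an illustration: the data are synthetic, the names (`X`, `y`, `beta_hat`, `sse`) are assumptions made for the example, and `numpy.linalg.lstsq` is used as one convenient way of minimizing the sum of squared residuals.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 5 + 2*x1 - 3*x2 + noise
rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 5 + 2 * x1 - 3 * x2 + rng.normal(0, 1, n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# OLS: choose beta_hat so that the sum of squared residuals is as small as possible
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat            # predicted values
residuals = y - y_hat           # vertical distances between actual and predicted y
sse = float(np.sum(residuals ** 2))

# Any other coefficient vector gives a larger sum of squared residuals
other = beta_hat + np.array([0.5, 0.0, 0.0])
sse_other = float(np.sum((y - X @ other) ** 2))

print("estimated coefficients:", np.round(beta_hat, 3))
print("OLS sum of squares is smaller:", sse < sse_other)
```

Shifting the intercept by 0.5 is an arbitrary choice; any change away from `beta_hat` raises the sum of squared residuals.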
The residual/error/disturbance term can capture a number of features:
– Model misspecification (i.e., We always omit some determinants of 𝑦 )
– There may be errors in the measurement of 𝑦 that cannot be modelled
– Random outside influences on 𝑦 which we cannot model
Assumptions of OLS (or linear regression model)

Assumption                          If it does not hold / is broken
Linear parameters                   OLS can’t be used
Zero mean of errors                 Intercept might be biased
Homoskedasticity                    Heteroscedasticity
Independent errors                  Autocorrelation
No relation between error and X     Autoregressive models needed
Normally distributed errors         There may be outliers in the dataset
No multicollinearity                Multicollinearity
Assumption 1: Linear parameters (every parameter enters with power 1; squares, cubes, etc. are not allowed.)
- Fundamental issue: If this linearity assumption does not hold, OLS can’t be used! (That is, without it
this model cannot be used at all.)
Assumption 2: Zero Mean of Errors
- Given the value of X, the mean, or expected value, of the random disturbance term ui is zero.
- Meaning that if we took a large number of samples, the mean disturbance/error would be zero.
- If this assumption breaks (i.e., if errors DON’T have zero mean), the regression intercept might be
biased. (That is, if the average of the differences between the actual values and the model's values is not zero,
we have to assume that something is wrong with the intercept: it is biased.)
Assumption 3: Homoskedasticity
- Given the values of X, the variance of the errors is the same for all observations (i.e., error variance is
constant and finite).
- If this assumption breaks (i.e., if the variance of errors is NOT constant), the problem is called
Heteroscedasticity.
- Suppose tea sales at Mol Chattar are related to temperature: when the temperature is high less tea is sold, and
when it is low more is sold. If we regress tea sales on temperature and take the differences between the predicted
sales and the tea actually sold, the variance of those differences does not change much; it can be called constant.
- But suppose we instead regress the number of students a teacher scolds in class on temperature. The variance of the
differences between the model's predictions and the actual amount of scolding will not be stable: on some days the
difference is very large, on others very small. This is Heteroscedasticity. When it occurs the model is of little
use, because its predictions cannot be relied on.
Assumption 4: Independent Errors
- The errors are linearly independent of each other.
- We require that the error/disturbance terms are independently distributed (i.e., errors are not
correlated with each other). This is also called serial independence.
- If this assumption breaks (i.e., if errors are correlated), the problem is called Autocorrelation.
- The differences must not be related to one another. For example, one day the temperature is low, yet tea sales fall
because students did not come; another day Ashik, having quit cigarettes, tries to quench his craving with cup after
cup of tea, so sales rise even though the temperature is high. Events like these are allowed to happen. But if people,
fed up with Ashik's teasing, stay away from the tea stall when he comes and show up when he does not, then whether
Ashik comes is itself a variable rather than part of the error.
Assumption 5: No relation between Error and X
- No relationship between the error term and the independent variables (X).
- If this assumption breaks (i.e., if errors and X are related), there might be an autoregression problem
and we need to use autoregressive models (necessary in time series modelling).
- This is the case just described, where Ashik stops being a cause of the error and becomes a variable himself.
Regression is then no longer needed: the tea seller sees him and knows that today will be a bad day.
Assumption 6: Normally distributed Errors
- Errors (ui) are normally distributed with mean zero and variance σ², i.e. the familiar bell-shaped distribution.
- This assumption is needed to be able to make valid inferences about the population parameters
from the sample estimates β̂0, β̂1. That is, it is needed to judge the behaviour of the population from the sample.
- If this assumption breaks (i.e., if errors are NOT normally distributed), there might be an outlier
problem in the dataset. In that case we have to assume that the sample was not collected properly and something was
left out, which is why such odd values appear.
- i.i.d. errors = independent and identically distributed errors. By assuming that the errors are
independent and identically distributed, we can estimate the parameters of the model (such as the
coefficients) using maximum likelihood estimation or other statistical techniques. However, if the
assumption of i.i.d. errors is violated, the model may not be accurate and the estimated parameters
may be biased or inconsistent.
Assumption 7: No Multicollinearity (applicable for multiple linear regression)
- No (perfect/high) multicollinearity between two or more independent variables.

- When there are multiple independent variables in the regression model, those independent
variables should not be highly correlated with each other.
- If this assumption breaks (i.e., if the Xs are highly correlated), there might be a multicollinearity problem
in the dataset.
- Suppose tea sales depend on temperature and on the number of students who come to campus. Then campus attendance
and temperature must not be related to each other, for instance students staying home on winter days and coming in on
hot days to enjoy the air conditioning. Some relationship is acceptable, but a very strong one is a problem.
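A rough way to screen for multicollinearity is to look at the correlations among the independent variables and at each variable's variance inflation factor (VIF). The sketch below is an illustration only: the data are synthetic, the variable names (`temperature`, `attendance`, `rainfall`) are hypothetical, and the helper `vif` is written here for the example rather than taken from any library. A VIF above about 10 is a commonly quoted rule of thumb for trouble, not a strict test.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (columns = independent variables)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on all the other columns plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid.var() / target.var()
        factors.append(1 / (1 - r2))
    return np.array(factors)

rng = np.random.default_rng(1)
temperature = rng.normal(25, 5, 200)
attendance = 50 + 10 * temperature + rng.normal(0, 5, 200)   # strongly tied to temperature
rainfall = rng.normal(10, 3, 200)                             # unrelated to the others

X = np.column_stack([temperature, attendance, rainfall])
vifs = vif(X)

print(np.round(np.corrcoef(X, rowvar=False), 2))   # pairwise correlations
print(np.round(vifs, 1))                           # large for temperature and attendance
```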
Why are these OLS assumptions important

These assumptions are extremely important because the violation of any of these assumptions would make
OLS estimates unreliable and incorrect.
Specifically, a violation would result in-

– incorrect signs of OLS estimates, or
– the variance of OLS estimates would be unreliable, leading to confidence intervals that are
too wide or too narrow.
Properties of OLS estimator (OLS is BLUE!)

OLS Property# 1: Linear
OLS estimators are linear only with respect to the dependent variable and not necessarily with respect to
the independent variables.
OLS Property# 2: Unbiased
The estimator should ideally be an unbiased estimator of the true parameter/population values. Suppose we want to
know how many hours a day 1,000 students study, and we do the research by taking samples of 50 students at a time.
The dependent variable is hours of study; the independent variables are time spent commuting, time spent on social
media and time spent on tuition. If we repeat the research several times with samples of 50, we can learn how much
time all 1,000 students spend on average on commuting, social media and tuition. The values the OLS estimator
produces can then be said to reflect the population.
OLS Property# 3: Best (i.e., Minimum Variance)
An estimator is best when it has the smallest variance among all linear unbiased estimators.
Continuing the earlier example: the first 50 students give one model, a fresh 50 give another, and another 50 give
yet another. The models differ, but their values are close to each other; there is little variance among them.
OLS is what makes this possible.
Efficient OLS estimator
OLS estimator is called “efficient” when it is both unbiased and best (minimum variance).
1. If the estimator is unbiased but doesn’t have the least variance – it’s not the best!
2. If the estimator has the least variance but is biased – it’s again not the best!
3. If the estimator is both unbiased and has the least variance – it’s the best estimator.
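The repeated-sampling idea behind "unbiased" and "minimum variance" can be illustrated with a small simulation. This is a sketch under assumed values: the population parameters (`true_intercept`, `true_slope`), the sample size of 50 and the 500 repetitions are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
true_intercept, true_slope = 1.0, 2.0    # assumed population parameters
n, repetitions = 50, 500

slopes = []
for _ in range(repetitions):
    # Draw a fresh sample of 50 observations and fit OLS to it
    x = rng.uniform(0, 10, n)
    y = true_intercept + true_slope * x + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    slopes.append(beta_hat[1])

slopes = np.array(slopes)
# Unbiased: the estimates average out close to the true slope
print("mean of estimated slopes:", round(float(slopes.mean()), 3))
# Low variance: the estimates from different samples stay close to each other
print("std. dev. of estimated slopes:", round(float(slopes.std()), 3))
```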
Regression Statistics
Multiple R           0.896755299
R Square             0.804170066
Adjusted R Square    0.767451954
Standard Error       51.04855358
Observations         20

ANOVA
              df    SS             MS          F           Significance F
Regression     3    171220.4728    57073.49    21.90118    6.56178E-06
Residual      16    41695.27717    2605.955
Total         19    212915.75

                     Coefficients     Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept            427.1938033      59.60142931       7.167509    2.24E-06    300.8444175     553.5431892
Temp (DF)            -4.582662626     0.772319353       -5.93364    2.1E-05     -6.219906516    -2.945418736
Insulation (inch)    -14.83086269     4.754412281       -3.11939    0.006606    -24.90976648    -4.751958899
Age (Yr)             6.101032061      4.012120166       1.52065     0.147862    -2.404282741    14.60634686
Question 1: Make a Population Multiple Regression Model.
Yi = β0 + β1x1i + β2x2i + β3x3i + ei
Heating Costi = β0 + β1 Temperature(DF)i + β2 Insulation(Inch)i + β3 Age(Year)i + ei
Question 2: Make a Sample Multiple Regression Line.
ŷi = β̂0 + β̂1x1i + β̂2x2i + β̂3x3i
Predicted Heating Costi = β̂0 + β̂1 Temperature(DF)i + β̂2 Insulation(Inch)i + β̂3 Age(Year)i
Or, Predicted Heating Costi = 427.19 - 4.58*Temperature(DF)i - 14.83*Insulation(Inch)i + 6.10*Age(Year)i
Question 3: Interpret the intercept/constant, the coefficients/slopes. (Describe the Economic Significance
of the intercept/constant, the coefficients/slopes.)
Intercept, β̂0 = 427.19 means that when the values of all independent variables are 0, the estimated heating cost
is 427.19 dollars.
Slope/Coefficient of ___ (name of the independent variable) = ___ (slope of that independent variable) means that if ___
(that independent variable) increases by 1 ___ (unit of that independent variable; here Fahrenheit, inch or year),
the heating cost will on average (this phrase is important) increase/decrease by ___ dollars, keeping all other
variables unchanged.
Ex: Slope/Coefficient of Temperature = -4.58 means that if temperature increases by 1 DF, the heating cost
will on average decrease by 4.58 dollars, keeping all other variables unchanged.
Question 4: What is the heating cost of a house that has a temperature of 35 DF, 3 inches of insulation
and is 6 years old? What is the error for this house?
Predicted Heating Costi = 427.19 - 4.58*35 - 14.83*3 + 6.10*6 = 259
Error/residual, êi = yi - ŷi = 250 - 259 = -9 (taking the actual heating cost yi of this house as 250 dollars)
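The arithmetic of Question 4 can be checked with a short Python sketch. The function name `predict_heating_cost` is made up for this example; the coefficients are the rounded values used in the notes, and the actual cost of 250 dollars is the figure the notes use for this house.

```python
# Rounded coefficients of the sample regression line from the notes
b0, b_temp, b_insul, b_age = 427.19, -4.58, -14.83, 6.10

def predict_heating_cost(temp_f, insulation_inch, age_years):
    """Predicted heating cost (dollars) from the estimated regression line."""
    return b0 + b_temp * temp_f + b_insul * insulation_inch + b_age * age_years

predicted = predict_heating_cost(35, 3, 6)
actual = 250                       # actual heating cost used in the notes
residual = actual - predicted      # error = actual - predicted

print(round(predicted, 2))   # 259.0
print(round(residual, 2))    # -9.0
```

A negative residual means the regression line over-predicts the cost of this house.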
Question 5: Statistical Significance.
⚫ This test is used to determine which independent variables have nonzero regression coefficients.
(It checks whether any beta equals 0.)
⚫ The variables that have zero regression coefficients are usually dropped from the analysis.
⚫ The test statistic follows the t distribution with n-(k+1) degrees of freedom.
⚫ The hypothesis test is as follows:
H0: βi = 0
H1: βi ≠ 0
Reject H0 if t > t(α/2, n-k-1) or t < -t(α/2, n-k-1) (here k is the number of independent variables)
t = (Coefficient - 0) / Standard Error
*The teacher's question may also say: check whether bi = 1, -1 or any other value. Then put the given value in
place of 0, compute the t value and carry out the test.
If this question comes up, a step-by-step description is not needed.
And if the question already supplies the regression output (with its t Stat column), the t value does not need to be
computed either; compare the given t Stat directly with the table value and state the result.
If the null hypothesis is rejected, then the coefficient is statistically significant at the 5% significance level.
If the question gives the P-value, statistical significance can be shown with that as well.
If P ≤ alpha (the significance level), the null hypothesis is rejected. A coefficient can be significant at three
levels: 1%, 5% and 10%. In this example Age is not significant even at the 10% level (P-value 0.1479 > 0.10). So if
asked to test at the 10% significance level, the null cannot be rejected, and we have to say that there is no
statistically significant relationship between Age and heating cost.
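Both decision rules (t Stat against the table value, and P-value against alpha) can be sketched in Python using the numbers from the regression output. The critical value 2.120 is the two-tailed t-table value for alpha = 0.05 with 16 degrees of freedom; it is typed in by hand here, which is an assumption of this sketch rather than something the notes compute.

```python
# (coefficient, standard error, P-value) copied from the regression output in the notes
coefficients = {
    "Temp (DF)":         (-4.582662626, 0.772319353, 2.1e-05),
    "Insulation (inch)": (-14.83086269, 4.754412281, 0.006606),
    "Age (Yr)":          (6.101032061, 4.012120166, 0.147862),
}
n, k = 20, 3
df = n - (k + 1)        # 16 degrees of freedom
alpha = 0.05
t_critical = 2.120      # t-table value for alpha/2 = 0.025, df = 16

results = {}
for name, (coef, se, p_value) in coefficients.items():
    t_stat = (coef - 0) / se                 # hypothesized value is 0
    reject_by_t = abs(t_stat) > t_critical   # two-tailed rejection rule
    reject_by_p = p_value <= alpha           # equivalent rule using the P-value
    results[name] = (round(t_stat, 3), reject_by_t, reject_by_p)
    print(name, results[name])
```

To test a different hypothesized value (for example bi = 1), replace the 0 in `coef - 0` with that value; the P-values in the output then no longer apply, because they are computed for the hypothesis of zero.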
Question 6: Using global test, comment on the validity of the whole regression model/equation. / Interpret
the global test. / Is the regression model valid? / Does any of the independent variable have any predicting
power/explaining power on the dependent variable? / Do the global test and comment.
The global test is used to investigate whether any of the independent variables have significant
coefficients.
The hypotheses are:
H0: β1 = β2 = β3 = 0
H1: Not all the βi are 0
The test statistic follows the F distribution with k (number of independent variables) and
n-(k+1) degrees of freedom, where n is the sample size.
Decision Rule:
Reject H0 if F > F(α, k, n-k-1)
In the exam the value of F will be given in the ANOVA table; if it is not, it has to be computed with the formula
F = (SSR / k) / (SSE / (n-(k+1))). The information needed to compute it will be in the question.
⚫ The computed value of F is 21.90, which is in the rejection region.
⚫ The null hypothesis that all the multiple regression coefficients are
zero is therefore rejected.
⚫ Interpretation: some of the independent variables (amount of insulation, etc.) do have the ability
to explain the variation in the dependent variable (heating cost).
Or, at least one of the regression coefficients is not equal to zero. That means the regression is valid/statistically
significant.
And if the probability value is given (Significance F in the ANOVA table), reject the null if this value is lower
than the significance level. Here,
since the probability (Significance F = 6.56E-06) is smaller than 5%, the null is rejected.
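The global test can be reproduced from the sums of squares in the ANOVA table. In this sketch the critical value 3.24 is the F-table value for alpha = 0.05 with (3, 16) degrees of freedom, entered by hand as an assumption of the example.

```python
# Sums of squares from the ANOVA table in the notes
ss_regression = 171220.4728
ss_residual = 41695.27717
n, k = 20, 3

ms_regression = ss_regression / k               # mean square, df = k
ms_residual = ss_residual / (n - (k + 1))       # mean square, df = n - (k + 1)
f_stat = ms_regression / ms_residual

f_critical = 3.24             # F-table value for alpha = 0.05, df = (3, 16)
significance_f = 6.56178e-06  # "Significance F" from the ANOVA table

print(round(f_stat, 2))          # 21.9
print(f_stat > f_critical)       # True: reject H0 by the F rule
print(significance_f < 0.05)     # True: reject H0 by the probability rule
```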
Question 7: Explain the stepwise regression (shown in the given picture). Is it good or bad?
Temperature is selected first. This variable explains more of the variation in heating cost than any of the
other three proposed independent variables. The coefficient of this variable was significant. Garage is selected
next, followed by Insulation. The coefficients of these variables also came out to be significant in those steps.
As the standard error (S) is falling and R-sq is rising, we can comment that the regression model is
good/getting better at predicting heating cost.
First a regression is run with temperature alone, and the coefficient of temperature is found to be significant.
Then a multiple regression is run with garage added, and the garage coefficient is also significant; the same happens
when insulation is added. If at some step a newly added variable had turned out not to be significant, we would say
that it has no explaining power, and we would then see hardly any change in the standard error or R-sq because of it.
Question 8: Standard Error./ Comment on the effectiveness of the regression equation.
Standard Error = USD 51.05
A single value is of little use on its own. When there are several such values (for example, one per step of a
stepwise regression), it is good if the standard error falls each time and bad if it rises.
Question 9: Interpret Multiple R/ Correlation coefficient.
Multiple R denotes the type and strength of the relationship between the independent and dependent
variables.
The value of the correlation coefficient, Multiple R = 0.8968, means that there is a strong positive
relationship between the dependent variable and the independent variables.
* A correlation coefficient can be any number between -1 and +1 (Multiple R itself, being the square root of R²,
lies between 0 and +1).
- 0 means No Relationship
- +1/-1 means Perfect Positive/Negative Relationship
- Below +0.50/-0.50 (in absolute value) means Weak Positive/Negative Relationship
- Around +0.50/-0.50 means Moderate Positive/Negative Relationship
- Above +0.75/-0.75 (in absolute value) means Strong Positive/Negative Relationship
Question 10: What percentage of the variation in the dependent variable can be explained by the
combined variation in the independent variables? Coefficient of determination/Goodness of fit. Comment
on the coefficient of determination. Calculate the Adjusted R-sq.
R² = 0.8042 means that the combined variation in the independent variables can explain about 80.42% of
the variation in the dependent variable.
Or, R² = SSR / SS total = 171220.47 / 212915.75 = 0.8042
Characteristics of the coefficient of multiple determination:
1. It is symbolized by a capital R squared. In other words, it is written as R² because it behaves like the
square of a correlation coefficient.
2. It can range from 0 to 1. A value near 0 indicates little association between the set of independent
variables and the dependent variable. A value near 1 means a strong association.
3. It cannot assume negative values. Any number that is squared or raised to the second power cannot
be negative.
4. It is easy to interpret. Because R² is a value between 0 and 1 it is easy to interpret, compare, and
understand.
This may also be given directly in the question; if not, it has to be computed with the formula.
⚫ Adjusted Coefficient of Determination (Adj. R-squared)
Adding independent variables to a multiple regression equation makes the coefficient of
determination larger. Each new independent variable causes the in-sample predictions to be more accurate.
If the number of variables, k, and the sample size, n, are equal, the coefficient of determination is
1.0. In practice, this situation is rare and would also be ethically questionable.
To balance the effect that the number of independent variables has on the coefficient of multiple
determination, statistical software packages use an adjusted coefficient of multiple determination.
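The figures in the Regression Statistics panel can be recomputed from the ANOVA table. The sketch below assumes the standard textbook formulas, R² = SSR / SS total, Adjusted R² = 1 - (1 - R²)(n - 1) / (n - (k + 1)), Multiple R = √R² and Standard Error = √(SSE / (n - (k + 1))); the variable names are chosen for the example.

```python
import math

# Sums of squares from the ANOVA table in the notes
ss_regression = 171220.4728
ss_residual = 41695.27717
ss_total = 212915.75
n, k = 20, 3

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - (k + 1))
multiple_r = math.sqrt(r_squared)
standard_error = math.sqrt(ss_residual / (n - (k + 1)))

print(round(r_squared, 4))        # 0.8042
print(round(adj_r_squared, 4))    # 0.7675
print(round(multiple_r, 4))       # 0.8968
print(round(standard_error, 2))   # 51.05
```

The adjusted value is lower than R² because it penalizes the three independent variables relative to the 20 observations.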
Interpretation of regression coefficients (Levels & Logs)
⚫ Ŷi = β̂0 + β̂1Xi; neither y nor x is logged.
– “If x changes by 1 unit, y changes by β̂1 units”
That is, if the independent variable changes by 1 unit (taka/degree/litre/kg/any unit), the dependent variable changes
by the value of the coefficient, measured in the dependent variable's own units. This is the case we were taught in class.
⚫ log(Ŷi) = β̂0 + β̂1 log(Xi); both are logged.
– “If x changes by 1%, y changes by β̂1%”. This is analogous to the elasticity interpretation (e.g.
price elasticity of demand).
Logged variables and ratio variables are interpreted in percentages: if the independent variable changes by so many
percent, the dependent variable changes by so many percent. For example, “Coefficient of age = -0.07729 means that if
age increases by 1%, the heating cost will on average decrease by 0.07729%, keeping all other variables unchanged.”
⚫ Ŷi = β̂0 + β̂1 log(Xi); only x is logged.
– “If x changes by 1%, y changes by (β̂1/100) units”
If the dependent variable is in normal units and the independent variable is in log form, the coefficient looks very
large. The coefficient then has to be divided by 100 to state the effect of a 1% change. For example: “Coefficient of
age = 14.50958 means that if age increases by 1%, the heating cost will on average increase by (14.50958/100) or
0.1451 dollars, keeping all other variables unchanged.”
⚫ log(Ŷi) = β̂0 + β̂1Xi; only y is logged.
– “If x changes by 1 unit, y changes by (100*β̂1)%”
If the dependent variable is logged, the coefficients of the independent variables look very small. This can be seen
in the picture above. Originally the coefficient of temperature was -4.58; with the dependent variable logged it has
become -0.0321. In this situation, saying “Coefficient of Temperature = -0.0321 means that if temperature increases
by 1 DF, the heating cost will on average decrease by 0.0321 dollars, keeping all other variables unchanged.” would
be wrong. The coefficient of temperature has to be multiplied by 100, and we should say,
“Coefficient of Temperature = -0.0321 means that if temperature increases by 1 DF, the heating cost will on
average decrease by (0.0321*100) or 3.21%, keeping all other variables unchanged.”
In the exam, check whether these variables are in units or in logs before answering.
(Normally the values in absolute form need to be converted into log form. -MIH)
Given                                                  Interpretation
Dependent Variable (y)    Independent Variable (x)     Change of X will be in    Change of Y will be in
Unit                      Unit                         1 Unit                    β Unit
Log                       Log                          1 Percent                 β Percent
Unit                      Log                          1 Percent                 (β÷100) Unit
Log                       Unit                         1 Unit                    (β×100) Percent
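The four rows of the table can be written as a small helper. The function `interpret` and its argument names are hypothetical, introduced only for this sketch; it returns the size of the change in x and the implied change in y.

```python
def interpret(coef, y_logged, x_logged):
    """Return (change in x, change in y) following the levels/logs table."""
    if not y_logged and not x_logged:
        return "1 unit", f"{coef:g} units"            # level-level
    if y_logged and x_logged:
        return "1 percent", f"{coef:g} percent"       # log-log (elasticity)
    if not y_logged and x_logged:
        return "1 percent", f"{coef / 100:g} units"   # only x logged: divide by 100
    return "1 unit", f"{coef * 100:g} percent"        # only y logged: multiply by 100

print(interpret(-4.58, y_logged=False, x_logged=False))     # ('1 unit', '-4.58 units')
print(interpret(-0.07729, y_logged=True, x_logged=True))    # ('1 percent', '-0.07729 percent')
print(interpret(14.50958, y_logged=False, x_logged=True))   # ('1 percent', '0.145096 units')
print(interpret(-0.0321, y_logged=True, x_logged=False))    # ('1 unit', '-3.21 percent')
```

The sign of the returned change tells whether to write "increase" or "decrease" in the interpretation sentence.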
Regression Models with Interaction
So far we have interpreted the coefficient of one variable at a time. What happens if two independent variables change
together? Then we simply multiply the two variables (effects). For example, to form the interaction of temperature and
insulation here: 3×35 = 105. A new variable is created by multiplying like this for every observation, and the
regression is run again with it.
The regression equation is:
Is the interaction variable significant at 0.05 significance level?
That is, the t-test on this new variable shows that the null hypothesis cannot be rejected. Therefore the newly
created variable is not significant at the 5% significance level.
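Building the interaction variable is a row-by-row multiplication. In the sketch below only the first house (35 DF, 3 inches) comes from the notes; the other two rows are hypothetical values added so that the column has more than one entry.

```python
# (temperature in DF, insulation in inches); rows 2 and 3 are hypothetical
houses = [(35, 3), (29, 4), (36, 7)]

# Interaction variable = product of the two independent variables for each house
interaction = [temp * insul for temp, insul in houses]
print(interaction)   # [105, 116, 252]

# Each row of the new design matrix: temperature, insulation, interaction term
design_rows = [(temp, insul, temp * insul) for temp, insul in houses]
print(design_rows[0])   # (35, 3, 105)
```

The regression is then re-estimated with the interaction column included, and its coefficient is tested with the usual t-test.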
