
QM (UM20MB502) - Unit 4: Hypothesis Testing and Linear Regression – Notes

1. Define regression analysis and list its characteristics.
In the words of Ya Lun Chou, “Regression analysis attempts to establish the
nature of the relationship between variables, that is, to study the functional
relationship between the variables and thereby provide a mechanism for prediction
or forecasting.”

In the words of M. M. Blair, “Regression analysis is a mathematical measure of
the average relationship between two or more variables in terms of the original
units of the data.”

Regression analysis, in the general sense, means the estimation or prediction
of the unknown value of one variable from the known value of another variable.

The essential characteristics of regression analysis are:

1. It consists of mathematical devices that are used to measure the average
relationship between two or more related variables.
2. It is used for estimating the unknown values of a dependent variable
from the known values of its related independent variables.
3. It provides a mechanism for predicting or forecasting the values of one
variable in terms of the values of another variable.
4. It involves two regression lines (equations), viz., (i) the regression equation
of X on Y and (ii) the regression equation of Y on X.
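As a minimal sketch, assuming the usual least-squares formulas $b_{yx} = \sum(x-\bar{x})(y-\bar{y}) / \sum(x-\bar{x})^2$ and $b_{xy} = \sum(x-\bar{x})(y-\bar{y}) / \sum(y-\bar{y})^2$, both regression lines can be computed as follows. The function name `regression_lines` and the data are hypothetical, for illustration only.

```python
from statistics import mean

def regression_lines(x, y):
    """Return (b_yx, a_yx, b_xy, a_xy) for the least-squares lines
    Y = a_yx + b_yx * X  (Y on X)  and  X = a_xy + b_xy * Y  (X on Y)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squared deviations and of cross-products about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b_yx = s_xy / s_xx              # slope of the line of Y on X
    b_xy = s_xy / s_yy              # slope of the line of X on Y
    a_yx = y_bar - b_yx * x_bar     # both lines pass through (x_bar, y_bar)
    a_xy = x_bar - b_xy * y_bar
    return b_yx, a_yx, b_xy, a_xy

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, a_yx, b_xy, a_xy = regression_lines(x, y)
print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f} X")  # Y on X: Y = 2.20 + 0.60 X
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f} Y")  # X on Y: X = -1.00 + 1.00 Y
```

For this data the two slopes are 0.6 and 1.0, so their product is 0.6, which equals $r^2$, in line with the properties of regression coefficients given in question 3.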
2. What are the utilities of Regression Analysis?
1) It provides a functional relationship between two or more related variables with
the help of which one can easily estimate or predict the unknown values of one
variable from the known values of another variable.
2) It provides a measure of the errors of estimates made through the regression lines.
Little scatter of the observed (actual) values around the relevant regression line
indicates good estimates of the values of a variable and a low degree of error.
On the other hand, a great deal of scatter of the observed values around the
relevant regression line indicates inaccurate estimates of the values of a variable
and a high degree of error.
3) It provides a measure of the coefficient of correlation between two variables, which
can be calculated by taking the square root of the product of the two regression
coefficients.
4) It provides a measure of the coefficient of determination, which indicates the effect of
the independent variable (explanatory or regressor variable) on the dependent
variable (explained or regressed variable) and in turn gives us an idea of
the predictive value of the regression analysis. The coefficient of determination
is computed by taking the product of the two regression coefficients. The greater the
value of the coefficient of determination, the better the fit and the more useful the
regression equations are as estimating devices.
5) It provides a formidable tool of statistical analysis in the field of business and
commerce, where people are interested in predicting future events, viz.,
consumption, production, investment, prices, sales, profits, etc., and the success of
a businessman depends very much on the degree of accuracy of the various
estimates.
6) It provides a valuable tool for measuring and estimating the cause-and-effect
relationships among the economic variables that constitute the essence of
economic theory and economic life. It is widely used in estimating
demand curves, supply curves, production functions, cost functions, consumption
functions, etc.
7) The technique is widely used in day-to-day life and in sociological studies as well,
to estimate various factors, viz., birth rate, death rate, tax rate, yield rate, etc.
8) Regression analysis techniques give us an idea about the relative variation of a
series.
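Utilities 3) and 4) can be illustrated with a short sketch that recovers r and the coefficient of determination from the two regression coefficients. The function `correlation_from_slopes` and the input values are hypothetical; the sign rule it applies is the one stated among the properties of regression coefficients in question 3.

```python
import math

def correlation_from_slopes(b_yx, b_xy):
    """Return (r, r_squared) computed from the two regression coefficients."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    r_squared = b_yx * b_xy                        # coefficient of determination
    r = math.copysign(math.sqrt(r_squared), b_yx)  # r shares their common sign
    return r, r_squared

r, r2 = correlation_from_slopes(b_yx=0.6, b_xy=1.0)   # hypothetical values
print(f"r = {r:.4f}, r^2 = {r2:.2f}")                 # r = 0.7746, r^2 = 0.60
```

Here $r^2 = 0.60$, i.e., about 60% of the variation in Y is accounted for by its linear relationship with X.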
3. What are the properties of regression coefficients?

1. The correlation coefficient is the geometric mean of the two regression
coefficients:
$r^2 = b_{yx} \cdot b_{xy}$, i.e., $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$
2. If one of the regression coefficients is numerically greater than unity, the other must be
numerically less than unity, since their product, $r^2$, cannot exceed 1.
3. The arithmetic mean of the regression coefficients is greater than or equal to the correlation
coefficient r, provided r > 0.
4. Regression coefficients are independent of a change of origin but not of a change of
scale.
5. Each regression coefficient is the slope of the corresponding regression line.
6. Both regression coefficients must have the same sign; one cannot be positive while
the other is negative.
7. The correlation coefficient has the same sign as the two regression
coefficients. For example, if $b_{yx} = -0.664$ and $b_{xy} = -0.234$, then
$r = -\sqrt{0.664 \times 0.234} \approx -0.394$.
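The worked example in property 7 can be checked numerically. This is only a sketch: it re-applies properties 1, 2, 6 and 7 to the two coefficients given in the example.

```python
import math

b_yx, b_xy = -0.664, -0.234   # the coefficients from the example in property 7

# Property 6: both regression coefficients share the same sign
assert (b_yx > 0) == (b_xy > 0)

# Properties 1 and 7: |r| is the geometric mean, and r takes their common sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 2 (in magnitude): the product r^2 cannot exceed 1
assert abs(b_yx * b_xy) <= 1

print(round(r, 3))   # -0.394
```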

4. What are the assumptions of Student’s t test?


- The parent population from which the sample is drawn is normal.
- The sample observations are independent, i.e., the sample is random.
- The population standard deviation is unknown.
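Because the population standard deviation is unknown, the t statistic replaces σ by the sample standard deviation s, giving $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom. A minimal sketch, with a hypothetical function name and a hypothetical sample:

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t, degrees_of_freedom) for testing H0: mu = mu0.

    The unknown population SD is estimated by the sample SD s,
    computed with divisor n - 1 (which statistics.stdev uses).
    """
    n = len(sample)
    s = stdev(sample)
    t = (mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical sample, for illustration only
sample = [12, 14, 11, 15, 13, 16, 12, 14]
t, df = one_sample_t(sample, mu0=12)
print(f"t = {t:.3f} with {df} d.f.")
```

The computed t (about 2.31 with 7 d.f.) would then be compared with the tabulated critical value, about 2.365 for a two-tailed test at α = 0.05, so $H_0$ would not be rejected for this sample.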

5. What is a statistical hypothesis?


It is an assumption or statement, which may or may not be true, about a
population (or, equivalently, about the probability distribution characterising the
population), which we want to test on the basis of evidence from a random
sample. If the hypothesis completely specifies the population, it is known as a
simple hypothesis; otherwise, it is known as a composite hypothesis.

6. What is a null hypothesis?



It is a statement proposing that no statistically significant effect exists in a
given set of observations, i.e., no difference or relationship exists between the variables.
It is the hypothesis tested for possible rejection under the assumption that it is
true. It is denoted by $H_0$.
Example: If we want to test the effectiveness of a drug, we shall take a neutral
attitude and set up the hypothesis that it is not effective.
To test whether one of two foodstuffs is better than the other, the null
hypothesis would be that there is no difference between them.
To test whether the population mean has a specified value $\mu_0$, the null
hypothesis is $H_0: \mu = \mu_0$.

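Any hypothesis that is complementary to the null hypothesis is called an
alternative hypothesis. It is denoted by $H_1$. For example, against $H_0: \mu = \mu_0$
the alternative may be $H_1: \mu \ne \mu_0$ (two-tailed), or $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$ (one-tailed).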

7. What are the types of errors in testing a hypothesis?


The four possible mutually disjoint and exhaustive decisions in any test
procedure are:
- Reject $H_0$ when it is true
- Accept $H_0$ when it is false
- Reject $H_0$ when it is false
- Accept $H_0$ when it is true
The first two cases are wrong decisions. Thus, we can commit two types of
errors. The error of rejecting $H_0$ when $H_0$ is true is known as a Type I error, and
the error of accepting $H_0$ when $H_0$ is false is known as a Type II error.

They can be expressed in terms of probabilities as follows.

P[Reject $H_0$ when it is true] = P[Type I error] = α
P[Accept $H_0$ when it is false] = P[Type II error] = β
α and β are also called the sizes of the Type I and Type II errors, respectively.
α: Producer’s risk
β: Consumer’s risk
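α and β can be made concrete for one simple case. The sketch below assumes a right-tailed z test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ with known σ; since β depends on which alternative value $\mu_1$ is actually true, one hypothetical $\mu_1$ is supplied. The function name and all numbers are illustrative only.

```python
import math
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, n, alpha=0.05):
    """Type I and Type II error probabilities for a right-tailed z test
    of H0: mu = mu0 against H1: mu > mu0, with sigma known.

    beta is evaluated at one specific alternative value mu1 > mu0.
    """
    z = NormalDist()                              # standard normal
    z_crit = z.inv_cdf(1 - alpha)                 # reject H0 if Z > z_crit
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # mean of Z when mu = mu1
    # When mu = mu1, Z ~ N(shift, 1); accepting H0 means Z <= z_crit
    beta = z.cdf(z_crit - shift)
    return alpha, beta

# Hypothetical numbers, for illustration only
a, b = error_probabilities(mu0=50, mu1=52, sigma=5, n=25)
print(f"alpha = {a:.2f}, beta = {b:.3f}")
```

Here α is 0.05 by design while β comes out at roughly 0.36; for a fixed sample size, lowering α raises the critical value and therefore raises β.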

8. Write a note on i) level of significance ii) critical region iii) P value


i) The probability with which we reject a null hypothesis when it is true
is the level of significance. It is denoted by α:
P[Rejecting $H_0$ when $H_0$ is true] = α
It is fixed before collecting the sample information.
Commonly used significance levels are 5% (0.05) and 1% (0.01). A significance
level of 0.05 indicates a 5% risk of concluding that a difference exists when there
is no actual difference; i.e., in the long run, about 5 samples out of 100 would lead
us to reject a correct null hypothesis.
The probability with which we accept a null hypothesis when it is true is the
confidence level. It is denoted by 1 − α. The confidence levels commonly used are 95% and 99%.
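The long-run meaning of a 5% level of significance can be checked by simulation: draw many samples from a population for which $H_0$ is true and count how often the test rejects it. This sketch assumes a two-tailed z test with known σ; the population values, sample size, number of trials and seed are arbitrary illustrative choices, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=10, trials=20_000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis.

    Every sample is drawn from N(mu0, sigma), so H0: mu = mu0 really holds.
    """
    rng = random.Random(seed)               # seeded for reproducibility
    mu0, sigma = 100.0, 15.0                # arbitrary illustrative population
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:                 # statistic falls in the critical region
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Rejected a true H0 in {rate:.1%} of samples")  # expected to be near 5%
```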

ii) A critical region, also known as the rejection region, is the set of values of
the test statistic for which the null hypothesis is rejected, i.e., if the
observed test statistic lies in the critical region, we reject the null
hypothesis and accept the alternative hypothesis.

The value of the test statistic that separates the critical region from the acceptance
region is called the critical value or significant value. It depends on:
- The level of significance
- The alternative hypothesis, i.e., whether the test is two-tailed or one-tailed.
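For a test statistic that is standard normal under $H_0$ (a z test), the critical values can be computed rather than read from tables. A sketch, with `z_critical_value` as a hypothetical helper built on Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

def z_critical_value(alpha, tails):
    """Critical value of a standard-normal test statistic.

    tails=2: reject H0 if |Z| > value (alpha split between both tails)
    tails=1: reject H0 if Z > value (right-tailed; use -value for left)
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    two = z_critical_value(alpha, tails=2)
    one = z_critical_value(alpha, tails=1)
    print(f"alpha = {alpha}: two-tailed {two:.3f}, one-tailed {one:.3f}")
```

This reproduces the familiar table values 1.96 and 1.645 at α = 0.05, and 2.576 and 2.326 at α = 0.01, showing that the same α gives a smaller critical value for a one-tailed test. A t test would need t-distribution tables instead.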

iii) The p-value is the probability of obtaining results at least as extreme as
the observed results of a statistical hypothesis test, assuming that the null
hypothesis is correct. A smaller p-value means stronger
evidence in favour of the alternative hypothesis.
It can take any value from 0 to 1.
A p-value of 0.1 means that, if the null hypothesis were true, results at least as
extreme as those observed would occur in about 10 out of 100 samples. It is not
the probability that the null hypothesis is true.
If p-value > level of significance, the null hypothesis is not rejected (accepted).
If p-value ≤ level of significance, the null hypothesis is rejected.
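A sketch of the p-value rule for a z test, assuming the observed statistic is standard normal under $H_0$; the helper name and the observed value z = 2.1 are hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="two-sided"):
    """p-value of an observed standard-normal test statistic z."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))   # both tails beyond |z|
    if alternative == "greater":
        return 1 - std_normal.cdf(z)              # right tail only
    if alternative == "less":
        return std_normal.cdf(z)                  # left tail only
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

alpha = 0.05
z = 2.1                                           # hypothetical observed statistic
p = z_test_p_value(z)
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"p = {p:.4f}: {decision}")
```

Here p ≈ 0.036 < 0.05, so $H_0$ is rejected at the 5% level, although it would not be rejected at the 1% level.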

9. Elaborate on the procedure for testing a hypothesis.


- Set up the null hypothesis.
- Set up the alternative hypothesis (this decides whether the test is one-tailed or
two-tailed).
- Level of significance: choose an appropriate α value depending on the
permissible risk, before drawing the sample.
- Identify the sample statistic to be used and its sampling distribution.
- Define and compute the test statistic under $H_0$.
- Obtain the critical values and the critical region of the test statistic from the
appropriate tables.
- If the computed value of the test statistic lies outside the critical (rejection)
region, we fail to reject $H_0$ at level of significance α; otherwise, we reject $H_0$.
- Write the conclusion of the test in simple language.
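The steps above can be sketched for a two-tailed one-sample z test, assuming σ is known (or the sample is large) so that the test statistic is standard normal under $H_0$. The function name and the figures (x̄ = 506, μ0 = 500, σ = 20, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu != mu0 at level alpha.

    Assumes sigma is known (or n is large), so Z is standard normal
    under H0; for small samples with sigma unknown, a t test applies.
    """
    # Steps 1-3: hypotheses and alpha are fixed before looking at the data
    # Steps 4-5: test statistic computed under H0
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Step 6: critical value for a two-tailed test at level alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 7: reject H0 only if |z| falls in the critical region
    reject = abs(z) > z_crit
    return z, z_crit, reject

z, z_crit, reject = one_sample_z_test(x_bar=506, mu0=500, sigma=20, n=36)
conclusion = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, critical value = {z_crit:.2f}: {conclusion}")
```

Since |z| = 1.80 < 1.96, the conclusion in simple language would be that, at the 5% level, the sample does not provide sufficient evidence that the population mean differs from 500.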

Note: 1. The mean values $(\bar{x}, \bar{y})$ are given by the point of intersection of the two
regression lines.

2. In case of perfect correlation, positive or negative, the two lines of regression
coincide.

3. If the variables are uncorrelated, the two lines of regression become
perpendicular to each other (the line of Y on X is $y = \bar{y}$ and the line of X on Y is $x = \bar{x}$).

4. Independent variables are also called regressors, predictors, or explanatory variables.
Dependent variables are also called regressed or explained variables.

5. Acceptance of a statistical hypothesis means only that the sample provides insufficient
evidence to reject it; it does not necessarily imply that the hypothesis is true.
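Notes 1 and 2 can be checked by solving the two regression equations simultaneously. This sketch assumes least-squares lines; the helper names `least_squares_lines` and `intersection` and the data are hypothetical.

```python
from statistics import mean

def least_squares_lines(x, y):
    """Return (a_yx, b_yx, a_xy, b_xy) for Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_yx = s_xy / sum((xi - x_bar) ** 2 for xi in x)
    b_xy = s_xy / sum((yi - y_bar) ** 2 for yi in y)
    return y_bar - b_yx * x_bar, b_yx, x_bar - b_xy * y_bar, b_xy

def intersection(a_yx, b_yx, a_xy, b_xy):
    """Solve Y = a_yx + b_yx*X and X = a_xy + b_xy*Y simultaneously."""
    denom = 1 - b_yx * b_xy      # equals 1 - r^2 for least-squares lines
    if abs(denom) < 1e-12:       # |r| = 1: the two lines coincide (note 2)
        raise ValueError("perfect correlation: the lines coincide")
    # Substituting X into the first equation: Y * (1 - b_yx*b_xy) = a_yx + b_yx*a_xy
    y = (a_yx + b_yx * a_xy) / denom
    x = a_xy + b_xy * y
    return x, y

x_data = [1, 2, 3, 4, 5]         # hypothetical data, for illustration only
y_data = [2, 4, 5, 4, 5]
x_star, y_star = intersection(*least_squares_lines(x_data, y_data))
print(round(x_star, 6), round(y_star, 6), mean(x_data), mean(y_data))
```

The intersection comes out at (3, 4), which is $(\bar{x}, \bar{y})$ for this data, as note 1 states.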
