
BASIC COMPUTATIONAL TECHNIQUES FOR DATA ANALYSIS

CHAPTER 12

Hypothesis Testing in Regression Analysis

LEARNING OBJECTIVES
After reading this chapter, the readers will be able to:
• Understand the concept of hypothesis testing
• Perform hypothesis testing to test the significance of the regression coefficients using:
  o The t-statistics approach
  o The p-value approach
  o The confidence interval approach
• Interpret the statistical significance of the regression model coefficients with the help of regression results obtained in Excel

In the last chapter, we have observed that the population refers to the aggregate of all the items under study, but at times, it may not be possible to collect the data for the entire population. In that case, we select a sample from the population. The sample is the set of items taken from the population and is representative of it. The findings from this sample are generalized to the population as a whole, and on its basis, we can infer about the population. Population parameters describe the characteristics of the entire population, while sample statistics describe the characteristics of the sample. The regression coefficients estimated from the regression model are based on the sample data and are the estimates for the population parameters (true but unknown values). Therefore, we use hypothesis testing to test the significance of the regression coefficients or to test whether the sample belongs to the population referred to.

The regression output in Excel exhibits the following types of statistics: (a) t-statistics, (b) p-values and (c) confidence intervals. The following sections explain the meaning of these statistics so that they can be interpreted aptly.

12.1. HYPOTHESIS TESTING USING THE t-STATISTICS
In hypothesis testing, we make assumptions or statements about the population parameters. However, in reality, the assumptions made may not be true. Based on these assumptions, a hypothesis is formed, which can be tested using statistical tests wherein the validity of the belief, claim or assumption made about the population is tested based on the data. The first approach is associated with the calculation of the t-statistic, which tests the individual significance of the parameters estimated from the regression model. Hypothesis testing involves the following steps:
Step 1. Formulation of the Hypothesis
The first step involved is the formulation of the null hypothesis ($H_0$) and the alternative hypothesis (denoted by $H_1$ or $H_a$). The null hypothesis is assumed to be true, and this assumption is tested for rejection. With the rejection of the null hypothesis, we accept the alternative hypothesis. The alternative hypothesis, on the other hand, is the opposite of the null hypothesis; it states either that the population parameter is smaller or greater than the null value (directional, one-tailed) or that it is different from the null value (non-directional, two-tailed). Note that only the null hypothesis is tested, not the alternative hypothesis.

$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$ or $\mu < \mu_0$ (one-tailed test)
$H_1: \mu \neq \mu_0$ (two-tailed test)

The null hypothesis always contains the equality sign (=, ≤ or ≥). The following examples will help get a clear picture of the two tests. In the one-tailed test, for example, an Indian website claims that the average salary of finance managers is greater than ₹10 lakh per annum. The hypotheses can be stated as:

$H_0: \mu \leq 10$ (no change, i.e., the average salary of finance managers is equal to or less than ₹10 lakh)
$H_1: \mu > 10$ (the average salary of finance managers is greater than ₹10 lakh)

On the other hand, in a two-tailed test, for example, if the objective is to examine the perception that the average typing speed of a student is 50 words per minute, the hypotheses for the study can be stated as:

$H_0: \mu = 50$ (no difference, i.e., the average typing speed of a student is equal to 50 words per minute)
$H_1: \mu \neq 50$ (the average typing speed of a student is not equal to 50 words per minute)
TABLE 12.1
Type 1 and Type 2 Errors

| Statistical Decision | True State: Null hypothesis is true | True State: Null hypothesis is false |
| Reject the null hypothesis | Type 1 error (α; incorrect decision) | Power (1 − β; correct decision) |
| Do not reject the null hypothesis | Confidence level (1 − α; correct decision) | Type 2 error (β; incorrect decision) |

Step 2. Setting the Significance Level


Secondly, we set the significance level, which is the probability at which we reject the null hypothesis when it is true. This is the probability of a Type 1 error, denoted by alpha (α). For instance, α = 0.05 indicates that we are willing to take a 5 per cent chance that we may reject the null hypothesis when it is true. Type 2 error, denoted by beta (β), is the probability of not rejecting the null hypothesis when it is false. For a given sample size, the Type 1 error and Type 2 error probabilities are inversely related to each other, which means that as one of the errors increases, the other tends to decrease.

Power or 1 − β refers to the probability of rejecting a null hypothesis if it is false, or the probability of not making a Type 2 error. Similarly, the confidence level or 1 − α is the probability of not rejecting the null hypothesis when it is true, or the probability of not making a Type 1 error. Table 12.1 summarizes the types of errors and their probabilities.

Figure 12.1 illustrates the significance level and the rejection region in a two-tailed test. The rejection area is on both sides of the distribution since the alternative hypothesis in this test is non-directional. Here, if α = 5 per cent, then the distribution will have two rejection areas of 2.5 per cent (α/2) on each side of the distribution (Figure 12.1). On the other hand, a one-tailed test (Figure 12.2) implies that there will be a single rejection region on one side of the distribution, that is, if α = 5 per cent, then the distribution has one rejection region of 5 per cent on either the left or the right side, depending on the direction of the inequality in the alternative hypothesis. A significance level of 5 per cent is the most commonly used for making decisions in the fields of finance, economics or medicine. In Excel, the confidence level is, by default, set at 95 per cent, which means a significance level of 5 per cent.
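To see what α = 0.05 means in practice, one can simulate many samples for which the null hypothesis is actually true and count how often a two-tailed t-test rejects it; the rejection rate should settle near 5 per cent. The Python sketch below is only an illustration under stated assumptions: normally distributed data with a true mean of 50 (echoing the typing-speed example), an arbitrary standard deviation of 8, samples of size n = 20, and the two-tailed critical value 2.093 for df = 19 from the t-distribution table. The function name `type1_error_rate` is hypothetical.

```python
import math
import random


def type1_error_rate(trials: int = 20_000, n: int = 20, mu0: float = 50.0,
                     sigma: float = 8.0, t_crit: float = 2.093,
                     seed: int = 1) -> float:
    """Share of simulated samples in which a TRUE null hypothesis (mu = mu0) is rejected.

    Illustrative assumptions: normal population, two-tailed test and
    t_crit = t_{0.025, n-1} read from a t-table (2.093 for df = 19).
    """
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    rejections = 0
    for _ in range(trials):
        # The population mean really equals mu0, so any rejection is a Type 1 error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - mu0) / (s / math.sqrt(n))
        if abs(t) >= t_crit:           # t falls in either rejection area
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"Estimated Type 1 error rate: {type1_error_rate():.3f}")
```

Re-running with `t_crit=1.729` (the two-tailed 10 per cent cut-off for df = 19) should push the rate towards 0.10, which is the price paid for a smaller chance of a Type 2 error.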

Step 3. Selecting the Suitable Test, Calculating the Test Statistic and Rejection Region
After considering several factors, such as the population distribution or sample size, we choose the appropriate statistical test to carry out the hypothesis testing. Based on the statistical test selected, we can calculate the test statistic value. Some common hypothesis tests are the z-test, t-test, chi-squared test and ANOVA, and their test statistics are z-statistics, t-statistics, chi-square statistics and F-statistics, respectively.

FIGURE 12.1
Rejection Areas for a Two-tailed Test
[Figure: t-distribution with a central "Do not reject H0" region (confidence level 95%) and a rejection area of α/2 = 2.5% in each tail.]

FIGURE 12.2
Rejection Areas for a One-tailed Test
[Figure: t-distribution with a "Do not reject H0" region (confidence level 95%) and a single rejection area of α = 5% in one tail.]

In the case of regression analysis, we translate our problem into a t-distribution or z-distribution, based on the sample size. A t-distribution is similar to a normal distribution, with a bell-shaped curve, but has relatively heavier tails. The t-test can be used to compare a sample mean with the population mean, or the means of two groups, and determine whether they are significantly different from each other. The t-test is used when the sample size is small (n < 30) and the population standard deviation is unknown.
The formula for calculating the t-statistic is given by:

$$t = \frac{\text{Difference between the sample mean and the population mean}}{\text{Standard error of the sample mean}} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Here,
$\bar{x}$ = Sample mean
$\mu$ = Population mean
$s$ = Sample standard deviation
$n$ = Sample size
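The formula can be checked with a few lines of code. The sketch below computes $t = (\bar{x} - \mu)/(s/\sqrt{n})$ with Python's `statistics` module for a small, made-up sample of typing speeds tested against μ = 50 words per minute; the data values and the function name `one_sample_t` are hypothetical.

```python
import math
import statistics


def one_sample_t(sample: list[float], mu: float) -> float:
    """t-statistic for H0: population mean = mu, i.e. (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)           # sample mean
    s = statistics.stdev(sample)              # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)         # standard error of the sample mean
    return (x_bar - mu) / standard_error


# Hypothetical typing speeds (words per minute) for eight students
speeds = [52, 48, 55, 51, 49, 53, 54, 50]
print(round(one_sample_t(speeds, 50), 3))
```

Here the sample mean is 51.5 and t works out to √3 ≈ 1.732, which is below the two-tailed 5 per cent cut-off of 2.365 for df = 7 in the t-distribution table, so $H_0: \mu = 50$ would not be rejected.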

It needs to be noted that many samples could be taken from the population. Each sample would have its own mean, and it is possible to think of a distribution of these sample means. If the sample size is small, then the variation in the sample mean would be high, and larger samples would have smaller variation. A measure of how different the sample mean is expected to be from the population mean is provided by the standard error of the sample mean, which is the standard deviation of the sample mean.

The t-test can also be used to conduct hypothesis testing on the regression coefficients to determine if the slope coefficient is significantly different from zero. In the regression context, the t-statistic can be calculated as:

$$t = \frac{b_1 - \beta_1}{S_{b_1}}$$

Here,
$b_1$ = Estimated regression coefficient
$\beta_1$ = Slope coefficient against which we conduct the hypothesis test, or the null value
$S_{b_1}$ = Standard error of $b_1$

The formula for the standard error is:

$$S_{b_1} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}, \qquad S_{YX} = \sqrt{\frac{SSE}{n-2}}$$

where SSE is the sum of squared residuals and $S_{YX}$ is the standard error of the estimate.
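For a simple linear regression, $b_1$, its standard error and the t-statistic can be computed directly from the data. The sketch below uses the least-squares formulas $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $S_{b_1} = \sqrt{SSE/(n-2)} / \sqrt{\sum (X_i - \bar{X})^2}$; the five-point data set and the function name `slope_t_test` are hypothetical, and the null value defaults to 0.

```python
import math


def slope_t_test(x: list[float], y: list[float], beta_null: float = 0.0):
    """Return (b1, standard error of b1, t-statistic, residual df) for Y = b0 + b1*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                          # n - k - 1 with k = 1 independent variable
    se_b1 = math.sqrt(sse / df) / math.sqrt(sxx)
    t = (b1 - beta_null) / se_b1
    return b1, se_b1, t, df


# Hypothetical five-point data set
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t, df = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, S_b1 = {se_b1:.4f}, t = {t:.3f}, df = {df}")
```

This gives $b_1$ = 0.6 and t ≈ 2.121 with df = 3; since the two-tailed 5 per cent cut-off for df = 3 is 3.182, $H_0: \beta_1 = 0$ would not be rejected for this tiny sample.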
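The following table summarizes the t-statistic and the rejection region.

Null Hypothesis: $H_0: \mu = \mu_0$
Test statistic value: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

| Alternative Hypothesis | Rejection Region at Level α |
| $H_1: \mu > \mu_0$ | $t \geq t_{\alpha, n-1}$ (upper-tailed) |
| $H_1: \mu < \mu_0$ | $t \leq -t_{\alpha, n-1}$ (lower-tailed) |
| $H_1: \mu \neq \mu_0$ | either $t \geq t_{\alpha/2, n-1}$ or $t \leq -t_{\alpha/2, n-1}$ (two-tailed) |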
Here, $t_{\alpha, n-1}$ (or $t_{\alpha/2, n-1}$ for a two-tailed test) is the critical value for the rejection region, which is the cut-off value calculated for the given significance level using the t-distribution table. We have already understood the concept of significance level in Step 2; let us now go through the idea of the degree of freedom. The degree of freedom (df) represents how many values in a data set are free to vary. It depends upon the number of parameters to be estimated, as we lose one degree of freedom for every parameter estimated. The degree of freedom is calculated by subtracting the number of parameters to be estimated from the sample size (n). Generally, for an ordinary t-test, only the mean is estimated, so the degree of freedom is taken as n − 1. In the case of regression analysis, the degree of freedom is taken as n − k − 1, where k is the number of independent variables (the additional 1 accounts for the intercept). For a simple linear regression (with one independent variable), the degree of freedom is therefore n − 2. The degree of freedom is given as the residual degree of freedom (df) in the regression summary output obtained using the data analysis option in Excel. As shown in Figure 12.3, the residual degree of freedom for n = 28 and one X variable is equal to 28 − 1 − 1 = 26.
Step 4. Check Whether the t-statistic Falls in the Rejection Region
If the t-statistic calculated in Step 3 above falls in either of the rejection areas, we reject the null hypothesis, and if the t-statistic falls outside the rejection areas, we do not reject the null hypothesis.
For example, consider the hypotheses stated as:

$H_0: \beta_1 = 0$ (no significant relationship between the dependent and independent variable)
$H_1: \beta_1 \neq 0$ (a significant relationship between the dependent and independent variable)

The cut-off value for, say, α = 0.05 (two-tailed) and df = 26 comes out to be t = 2.056, as obtained from Table 12.2.¹ If the t-statistic calculated using the formula $t = (b_1 - \beta_1)/S_{b_1}$ is more than +2.056 or less than −2.056, that is, the value of the t-statistic falls to the left of the negative cut-off value or to the right of the positive cut-off value, we reject the null hypothesis and conclude that there is a significant relationship between the dependent and independent variable. On the other hand, if the t-statistic does not fall in either of the rejection areas, we do not reject the null hypothesis and conclude that there is no significant relationship between the dependent and independent variables. It is important to note here that if a coefficient is significant at the 1 per cent level, then it must be significant at the 5 per cent level also. However, the converse is not true.

FIGURE 12.3
Regression Analysis Output: Residual Degree of Freedom

| ANOVA | df |
| Regression | 1 |
| Residual | 26 |
| Total | 27 |
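The decision rules in the rejection-region table and in Step 4 translate directly into code. The sketch below assumes the critical value has already been read from the t-distribution table ($t_{\alpha/2, df}$ for a two-tailed test, $t_{\alpha, df}$ for a one-tailed test); the function name `reject_null` and the `tail` labels are hypothetical.

```python
def reject_null(t_stat: float, t_crit: float, tail: str = "two") -> bool:
    """Apply the t-test rejection rules.

    tail="two"   -> reject if t >= t_crit or t <= -t_crit  (t_crit = t_{alpha/2, df})
    tail="upper" -> reject if t >= t_crit                  (t_crit = t_{alpha, df})
    tail="lower" -> reject if t <= -t_crit                 (t_crit = t_{alpha, df})
    """
    if tail == "two":
        return abs(t_stat) >= t_crit
    if tail == "upper":
        return t_stat >= t_crit
    if tail == "lower":
        return t_stat <= -t_crit
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


# Cut-off values for df = 26 read from the t-distribution table
T_TWO_TAILED_5PCT = 2.056   # t_{0.025, 26}
T_ONE_TAILED_5PCT = 1.706   # t_{0.05, 26}

print(reject_null(24.6545, T_TWO_TAILED_5PCT))                 # True: far beyond +2.056
print(reject_null(0.85, T_TWO_TAILED_5PCT))                    # False: inside the do-not-reject region
print(reject_null(1.8, T_ONE_TAILED_5PCT, tail="upper"))       # True: beyond +1.706
```

The last call shows that a t-statistic of 1.8 is significant in an upper-tailed test at 5 per cent but would not be in a two-tailed test, because the two-tailed cut-off (2.056) is larger.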

¹ The t-distribution table has degrees of freedom in the rows and the significance level in the columns for the one-tail and two-tail tests. The critical value is read from the intersection cell. For example, in a two-tail test, if the degree of freedom is 26 and the significance level is 5 per cent, the critical value read from Table 12.2 is 2.056.

12.2. HYPOTHESIS TESTING USING THE p-VALUE

The concept of p-value plays a vital role in interpreting the regression results and assessing the significance of the estimated regression coefficients. The p-value is the probability, computed on the assumption that the null hypothesis is true, of obtaining a test statistic at least as extreme as the one calculated from the sample; being a probability, it lies between 0 and 1. The p-values for every coefficient are automatically produced by Excel in the regression summary output obtained using the data analysis option. This means that Excel conducts a hypothesis test for every coefficient and produces the results in the form of p-values. To interpret the p-value, it is important to find out which test statistic is used. The first three steps for conducting the hypothesis testing, namely the formulation of the hypothesis, setting the significance level and selecting the suitable test, remain the same.

TABLE 12.2
The t-distribution Table

| one-tail | 0.50 | 0.25 | 0.20 | 0.15 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.001 | 0.0005 |
| two-tails | 1.00 | 0.50 | 0.40 | 0.30 | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 | 0.002 | 0.001 |
| 1 | 0.000 | 1.000 | 1.376 | 1.963 | 3.078 | 6.314 | 12.71 | 31.82 | 63.66 | 318.31 | 636.62 |
| 2 | 0.000 | 0.816 | 1.061 | 1.386 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 22.327 | 31.599 |
| 3 | 0.000 | 0.765 | 0.978 | 1.250 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 10.215 | 12.924 |
| 4 | 0.000 | 0.741 | 0.941 | 1.190 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 | 7.173 | 8.610 |
| 5 | 0.000 | 0.727 | 0.920 | 1.156 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 5.893 | 6.869 |
| 6 | 0.000 | 0.718 | 0.906 | 1.134 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.208 | 5.959 |
| 7 | 0.000 | 0.711 | 0.896 | 1.119 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 4.785 | 5.408 |
| 8 | 0.000 | 0.706 | 0.889 | 1.108 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 | 4.501 | 5.041 |
| 9 | 0.000 | 0.703 | 0.883 | 1.100 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 | 4.297 | 4.781 |
| 10 | 0.000 | 0.700 | 0.879 | 1.093 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 | 4.144 | 4.587 |
| 11 | 0.000 | 0.697 | 0.876 | 1.088 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 | 4.025 | 4.437 |
| 12 | 0.000 | 0.695 | 0.873 | 1.083 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 | 3.930 | 4.318 |
| 13 | 0.000 | 0.694 | 0.870 | 1.079 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 | 3.852 | 4.221 |
| 14 | 0.000 | 0.692 | 0.868 | 1.076 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 | 3.787 | 4.140 |
| 15 | 0.000 | 0.691 | 0.866 | 1.074 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 | 3.733 | 4.073 |
| 16 | 0.000 | 0.690 | 0.865 | 1.071 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 | 3.686 | 4.015 |
| 17 | 0.000 | 0.689 | 0.863 | 1.069 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 | 3.646 | 3.965 |
| 18 | 0.000 | 0.688 | 0.862 | 1.067 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 | 3.610 | 3.922 |
| 19 | 0.000 | 0.688 | 0.861 | 1.066 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 | 3.579 | 3.883 |
| 20 | 0.000 | 0.687 | 0.860 | 1.064 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 | 3.552 | 3.850 |
| 21 | 0.000 | 0.686 | 0.859 | 1.063 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831 | 3.527 | 3.819 |
| 22 | 0.000 | 0.686 | 0.858 | 1.061 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819 | 3.505 | 3.792 |
| 23 | 0.000 | 0.685 | 0.858 | 1.060 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807 | 3.485 | 3.768 |
| 24 | 0.000 | 0.685 | 0.857 | 1.059 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797 | 3.467 | 3.745 |
| 25 | 0.000 | 0.684 | 0.856 | 1.058 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 | 3.450 | 3.725 |
| 26 | 0.000 | 0.684 | 0.856 | 1.058 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779 | 3.435 | 3.707 |
Suppose the given test statistic is the t-statistic. The p-values can be interpreted as follows. In a two-tailed hypothesis test, the rejection region on each side is based on α/2. In Figure 12.4, the p-value is greater than the significance level, so the t-statistic does not fall in the rejection region. In that case, we do not reject the null hypothesis; that is, we do not reject the null hypothesis when p > α. Similarly, in Figure 12.5, the t-statistic falls in the rejection region, as we observe that the p-value is less than the significance level. Thus, we reject the null hypothesis when p < α.

For example, for a two-tail test, say α = 0.05, residual df = 26 and t-statistic = 24.6545, the p-value is practically zero (1.47084E-19 in Figure 12.6, well below 0.0001). Since the p-value is less than α, that is, p < 0.05, we reject the null hypothesis (Figure 12.6). On the other hand, say for another two-tail test, α = 0.05, residual df = 26 and the t-statistic is 0.85; then the p-value is equal to 0.40. Since the p-value is greater than α, that is, 0.40 > 0.05, we do not reject the null hypothesis.
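Outside Excel, the two-tailed p-value of a t-statistic with ν degrees of freedom can be computed from the regularized incomplete beta function, $p = I_{\nu/(\nu+t^2)}(\nu/2,\ 1/2)$. The standard-library Python sketch below evaluates that function with a continued-fraction expansion (the approach popularized by Numerical Recipes). The function names are hypothetical, and in practice a tested library routine such as SciPy's `scipy.stats.t.sf` (two-tailed p = 2 × `sf(|t|, df)`) would normally be preferred.

```python
import math


def _betacf(a: float, b: float, x: float,
            max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300                      # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:     # converged
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges fastest for x < (a + 1) / (a + b + 2);
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_tailed_p_value(t_stat: float, df: int) -> float:
    """P(|T| >= |t_stat|) for a t-distribution with df degrees of freedom."""
    x = df / (df + t_stat * t_stat)
    return regularized_incomplete_beta(df / 2.0, 0.5, x)


# The two examples discussed above (residual df = 26)
for t_value in (24.6545, 0.85):
    print(f"t = {t_value}: p = {two_tailed_p_value(t_value, 26):.4g}")
```

The second case gives p ≈ 0.40, matching the example above, while the first is of the order of 1e-19, which Excel displays in scientific notation.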

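FIGURE 12.4
The t-distribution: p-value > Alpha (α)
[Figure: two-tailed t-distribution with rejection areas (α/2) beyond the −cut-off and +cut-off values; the ±t-statistics lie inside the cut-off values, so each tail area of p-value/2 is larger than α/2.]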

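FIGURE 12.5
The t-distribution: p-value < Alpha (α)
[Figure: two-tailed t-distribution with rejection areas (α/2) beyond the −cut-off and +cut-off values; the ±t-statistics lie beyond the cut-off values, so each tail area of p-value/2 is smaller than α/2.]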

FIGURE 12.6
The t-distribution: Computed p-value < Alpha (α)

| | Coefficients | Standard Error | t Stat | P-value |
| Intercept | -1648.103786 | 194.1050806 | -8.490781286 | 5.68922E-09 |
| X Variable 1 | 306.9271804 | 12.44910252 | 24.65456284 | 1.47084E-19 |
12.3. HYPOTHESIS TESTING USING THE CONFIDENCE INTERVAL

The third approach used for hypothesis testing is the confidence interval approach. It is the simplest and easiest approach for conducting hypothesis testing. It involves interpreting the confidence intervals of the estimated regression coefficients obtained in the regression output in Excel. A confidence level of 0.95 means that if we keep on repeating the sampling again and again and construct an interval each time, then about 95 per cent of these intervals will contain the true population parameter. The following steps need to be followed for conducting hypothesis testing using the confidence interval approach. Step 1 remains the same as above.

Step 2. Consider the 95 Per Cent Confidence Interval for the Slope Coefficient of the Variable as Produced by the Regression Output in Excel

In this step, we check whether or not the null value falls in the 95 per cent confidence interval. The lower limit and the upper limit for the 95 per cent confidence interval are specified in the regression output. If the null value indeed falls within the confidence interval, we do not reject the null hypothesis; in other words, we do not reject the null hypothesis for any null value between the two limits of the confidence interval. On the other hand, if the value in the null hypothesis does not fall in the confidence interval, we reject the null hypothesis. We can conclude that:
• If the value specified in $H_0$ lies outside the interval, we can reject $H_0$.
• If the value specified in $H_0$ lies within the interval, we do not reject $H_0$.
Continuing with the previous example, we need to test the significance of the regression model and interpret the results obtained based on confidence intervals. The highlighted part in the regression output displays the upper and the lower limits for the 95 per cent confidence interval (Figure 12.7). We notice that the reference value in the null hypothesis, that is, 0, does not fall in the specified range; it falls outside the 95 per cent confidence interval (281.3376836 to 332.5166771). Hence, we reject the null hypothesis, which further implies that there is a significant relationship between the dependent and independent variables.
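The confidence-interval check can be reproduced from the coefficient and its standard error, since the interval is $b_1 \pm t_{\alpha/2, df} \times S_{b_1}$. The sketch below uses the X Variable 1 figures shown in Figures 12.6 and 12.7 and the rounded table value 2.056 for df = 26, so its limits differ slightly (by less than 0.01) from Excel's, which uses an unrounded critical value. The function names are hypothetical.

```python
def confidence_interval(b: float, se: float, t_crit: float) -> tuple[float, float]:
    """Two-sided interval from b - t_crit*se to b + t_crit*se for a regression coefficient."""
    margin = t_crit * se
    return b - margin, b + margin


def ci_decision(null_value: float, lower: float, upper: float) -> str:
    """Reject H0 only when the null value lies outside the interval."""
    if lower <= null_value <= upper:
        return "do not reject H0"
    return "reject H0"


# X Variable 1 from the Excel output; 2.056 = t_{0.025, 26} from the t-table
lower, upper = confidence_interval(306.9271804, 12.44910252, 2.056)
print(f"95% CI: ({lower:.4f}, {upper:.4f}) -> {ci_decision(0.0, lower, upper)}")
```

Any null value inside the printed limits (for example, 300) would not be rejected, which is the same rule stated in the two bullet points of this section.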
dependent
12.3.1. Illustration
The regression output realized in the example of Chapter 11 (a company incurring promotion expenses to boost its revenue) is reproduced below in Figure 12.8.

FIGURE 12.7
Regression Analysis Output (95% Confidence Intervals)

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | -1648.103786 | 194.1050806 | -8.49078129 | 5.69E-09 | -2047.092494 | -1249.115079 | -2047.092494 | -1249.115079 |
| X Variable 1 | 306.9271804 | 12.44910252 | 24.65456284 | 1.47084E-19 | 281.3376836 | 332.5166771 | 281.3376836 | 332.5166771 |
FIGURE 12.8
Test Results of the Regression Equation Displayed in the Excel Sheet

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.809705674 |
| R Square | 0.655623278 |
| Adjusted R Square | 0.63996979 |
| Standard Error | 317.483346 |
| Observations | 24 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 4221678.483 | 4221678.483 | 41.88352807 | 1.6435E-06 |
| Residual | 22 | 2217504.85 | 100795.675 | | |
| Total | 23 | 6439183.333 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | 3183.35799 | 450.0836347 | 7.07281436 | 4.282E-07 | 2249.941666 | 4116.774322 | 2249.941666 | 4116.774322 |
| Promotion Expenses (X) | 5.101726264 | 0.788307256 | 6.471748456 | 1.6435E-06 | 3.466877077 | 6.73657545 | 3.466877077 | 6.73657545 |

We have already developed and estimated the regression model with the promotion expenses as the independent variable and the revenue as the dependent variable. The estimated regression model is:

Revenue = 3,183.3579 + 5.101726 × Promotion expenses

Now, the sales manager wishes to test the significance of the regression model developed and assess the importance of the promotion expenses in explaining the variation in the revenues.

Solution and Interpretation

First Method (the t-value Approach)
The null hypothesis assumes that the beta coefficient is equal to zero (null value), which means that there is no significant relationship between the promotion expenses and revenues. The hypotheses can be stated as:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
The significance level (α) is set at 5 per cent.
Excel provides the t-statistic (t-value) as 6.4717.
The critical t-value is read from the t-distribution table. With 24 observations, the residual degree of freedom is n − 2 = 22, so the rejection region for this two-tailed test is either $t \geq t_{\alpha/2,\, n-2}$ or $t \leq -t_{\alpha/2,\, n-2}$. The table value of $t_{0.025,\, 22}$ is 2.074. Since 6.4717 > 2.074, we reject the null hypothesis. This implies that there is a significant relationship between the revenue earned by the company (dependent variable) and promotion expense incurred (independent variable).
Second Method (the p-value Approach)
The p-value that Excel provides us is 1.6435E-06, which denotes a very small number equal to 0.0000016435. We observe that this p-value is less than the significance level of 0.05, that is, p < α; therefore, we reject the null hypothesis and conclude that there is a significant relationship between the revenue earned by the company (dependent variable) and promotion expense incurred (independent variable).

Third Method (the Confidence Interval Approach)
The lower limit and the upper limit for the 95 per cent confidence interval in the regression output are 3.466877 and 6.73657545, respectively (Figure 12.8). Since our null value specified in the null hypothesis ($\beta_1 = 0$) does not fall within the confidence interval, we reject the null hypothesis, which further implies that there is a significant relationship between the revenues earned by the company (dependent variable) and promotion expense incurred (independent variable).

It is important to note that the outcomes in all these approaches remain the same, that is, we reject the null hypothesis.
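The agreement among the three methods can be confirmed numerically. The sketch below takes the slope, its standard error and the p-value straight from Figure 12.8 and applies the three decision rules; the variable names are hypothetical, and the rounded table value 2.074 makes the interval limits differ from Excel's only in the fourth decimal place.

```python
# Figures taken from the Excel output for Promotion Expenses (X), Figure 12.8
B1 = 5.101726264        # estimated slope
SE_B1 = 0.788307256     # standard error of the slope
P_VALUE = 1.6435e-06    # p-value reported by Excel
ALPHA = 0.05
T_CRIT = 2.074          # t_{0.025, 22} from the t-table (n = 24, df = n - 2)
NULL_VALUE = 0.0        # H0: beta_1 = 0

t_stat = (B1 - NULL_VALUE) / SE_B1
lower = B1 - T_CRIT * SE_B1
upper = B1 + T_CRIT * SE_B1

# True means "reject H0" under each approach
decisions = {
    "t-value approach": abs(t_stat) >= T_CRIT,
    "p-value approach": P_VALUE < ALPHA,
    "confidence interval approach": not (lower <= NULL_VALUE <= upper),
}

print(f"t = {t_stat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
for approach, reject in decisions.items():
    print(f"{approach}: {'reject H0' if reject else 'do not reject H0'}")
```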
To sum up, this chapter introduced a very important concept of hypothesis testing, which plays a crucial role in making inferences about the true population parameters from sample estimates. Using different approaches, the estimated coefficients are checked to determine whether there is a significant linear relationship between the dependent and independent variables.
dependent and

PRACTICE QUESTIONS

1. The area of the house (in sq. ft) and the electricity bill (in ₹) are given below. Test the significance of the regression coefficients estimated using the t-statistics approach. (Significance level = 5%)
| Area of the House (in sq. ft) | Electricity Bill (₹) |
| 1,716 | 10,800 |
| 1,886 | 18,000 |
| 2,056 | 16,000 |
| 2,057 | 20,000 |
| 2,095 | 21,000 |
| 2,403 | 23,000 |
| 3,291 | 24,500 |
| 3,333 | 27,000 |
| 3,747 | 26,000 |
| 3,838 | 28,500 |
| 4,726 | 29,500 |
| 4,956 | 31,000 |
| 5,561 | 31,600 |
| 7,550 | 32,000 |
| 7,815 | 31,900 |
| 7,863 | 32,500 |
| 7,899 | 36,000 |
| 8,128 | 34,000 |
| 8,215 | 39,000 |
| 8,500 | 38,311 |
Ans. $H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
t-statistics = 9.403576788, residual df = 18 and critical value = 2.101
Since the value of the t-statistics falls to the right of the positive cut-off value, we reject the null hypothesis and conclude that there is a significant relationship between the area of the house and the electricity bill.
2. The average annual income of 25 people and the taxes paid by them are given in the table below. Develop the regression model and test the significance of the regression model developed using the p-values approach. (Significance level = 5%)
| Individual Annual Income (₹) | Taxes Paid (₹) |
| 700,000 | 70,000 |
| 750,000 | 90,000 |
| 770,000 | 123,200 |
| 850,000 | 850,000 |
| 854,000 | 128,100 |
| 870,000 | 69,600 |
| 1,030,000 | 206,000 |
| 1,030,000 | 206,000 |
| 1,080,000 | 216,000 |
| 1,100,000 | 220,000 |
| 1,120,000 | 240,000 |
| 1,130,000 | 226,000 |
| 1,160,000 | 230,000 |
| 1,240,000 | 248,000 |
| 1,300,000 | 260,000 |
| 1,360,000 | 272,000 |
| 1,460,000 | 280,000 |
| 1,500,000 | 300,000 |
| 1,650,000 | 495,000 |
| 1,780,000 | 534,000 |
| 1,800,000 | 548,000 |
| 1,825,000 | 550,000 |
| 1,970,000 | 591,000 |
| 1,980,000 | 594,000 |
| 2,042,000 | 612,600 |
Ans. $H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
t-statistics = 22.95098679, p-value = 0.0000
The p-value of 0.0000 is less than the significance level of 0.05, that is, p < α. Therefore, we reject the null hypothesis and conclude that there is a significant relationship between the taxes paid (dependent variable) and the average annual income earned by the individual (independent variable).
3. Abhinav uploads videos on YouTube on statistics and economics concepts. The views and likes on 20 of his videos are given below. Abhinav believes that the number of views has a significant impact on the number of likes on the videos. Conduct the significance test to determine if the views (the X variable) are useful to explain the variation in the number of likes (the Y variable) using the p-values approach. Is the belief of Abhinav correct? (Significance level = 5%)

| Views (X) | Likes (Y) |
| 212,000 | 106,000 |
| 139,000 | 55,600 |
| 298,000 | 89,400 |
| 216,000 | 9,000 |
| 301,000 | 15,000 |
| 315,000 | 157,500 |
| 350,000 | 140,000 |
| 370,000 | 111,000 |
| 460,000 | 25,000 |
| 480,000 | 10,000 |
| 475,000 | 237,500 |
| 500,000 | 200,000 |
| 520,000 | 156,000 |
| 562,000 | 112,400 |
| 630,000 | 20,000 |
| 600,000 | 300,000 |
| 860,000 | 344,000 |
| 880,000 | 264,000 |
| 950,000 | 50,000 |
| 930,000 | 70,000 |
Ans. $H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
t-statistics = 1.617637
p-value = 0.1231
α = 5%
The p-value of 0.1231 is more than the significance level of 0.05, that is, p > α. Therefore, we do not reject the null hypothesis and conclude that there is no significant relationship between the likes (dependent variable) and the views (independent variable) on a video. The belief of Abhinav that the number of views has a significant impact on the number of likes on the videos is therefore not supported at the 5 per cent significance level.
4. The literacy level (in %) and the number of people below the poverty line (as % of the total population) for 20 states are given below. Develop the regression model and test the significance of the regression model developed using a 95 per cent confidence interval approach. (Significance level = 5%). Is your answer the same if the test of significance is done using a 99 per cent confidence interval approach? (Significance level = 1%)
Literacy Level (%; X)    People below the Poverty Line (%; Y)
94
92
89 8

86
82 9

81 12

78 15
17
76 19
73 21
72 22

68 26

67 28
65 25
62 29
61 32
59 35
52 27
51 36
50 39

Ans. $H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Regression output (95 per cent confidence interval)
| | Coefficients | Standard Error | t-stat | p-value | Lower 95% | Upper 95% |
| Intercept | 74.99175186 | 3.703232912 | 20.2503471 | 7.76988E-14 | 67.21154821 | 82.7719555 |
| Literacy level | -0.751277881 | 0.050736835 | -14.80734613 | 1.59998E-11 | -0.857872016 | -0.644685 |
The lower limit and the upper limit for the 95 per cent confidence interval in the regression output are (−0.8578) and (−0.6446), respectively. Since our hypothesized value specified in $H_0$, $\beta_1 = 0$, does not fall within the confidence interval, we reject the null hypothesis, which further implies that there is a significant relationship between the people below the poverty line (dependent variable) and literacy rate (independent variable).
Regression output (99 per cent confidence interval)
| | Coefficients | Standard Error | t-stat | p-value | Lower 99.0% | Upper 99.0% |
| Intercept | 74.99175186 | 3.703232912 | 20.2503471 | 7.76988E-14 | 64.33221637 | 85.65128735 |
| Literacy level | -0.751277881 | 0.050736835 | -14.80734613 | 1.59998E-11 | -0.897320841 | -0.605234921 |

The lower limit and the upper limit for the 99 per cent confidence interval in the regression output are (−0.8973) and (−0.6052), respectively. Since our hypothesized value specified in $H_0$, $\beta_1 = 0$, does not fall within the confidence interval, we reject the null hypothesis, which implies that there is a significant relationship between the people below the poverty line (dependent variable) and literacy rate (independent variable). Hence, the answer is the same under both the 95 per cent and the 99 per cent confidence interval approaches.
5. The rainfall (in inches) and production of rice (in thousand million tonnes) are given below. Interpret the regression output as provided by Excel to test the significance of the regression coefficients. The agriculture officer claims that when rainfall increases by 1 inch, the rice production increases by 1,400 thousand million tonnes. Check the validity of this claim. (Significance level = 5%)

| Rainfall (in inches; X) | Rice Production (in thousand million tonnes; Y) |
| 30 | 117,939 |
| 28 | 116,480 |
| 29 | 112,760 |
| 25 | 109,698 |
| 23 | 106,646 |
| 16 | 105,482 |
| 22 | 105,301 |
| 20 | 105,241 |
| 17 | 104,408 |
| 15 | 99,172 |
| 13 | 96,682 |
| 12 | 95,970 |
| 11 | 93,345 |
| 12 | 91,785 |
| 14 | 89,083 |

Ans. Summary output:

Regression Statistics
| R Square | 0.868283369 |
| Adjusted R Square | 0.858151321 |
| Standard Error | 3,325.932545 |
| Observations | 15 |

| | Coefficients | Standard Error | t-stat | p-value | Lower 95% | Upper 95% |
| Intercept | 79,592.30045 | 2,704.490341 | 29.4296856 | 2.77836E-13 | 73,749.60429 | 85,434.99662 |
| Rainfall (X variable) | 1,240.792659 | 134.0346561 | 9.257252531 | 4.37177E-07 | 951.2283893 | 1,530.356929 |
Rainfall (X variable) coefficient: $b_1$ = 1,240.7926 indicates that if the rainfall increases by one inch (a unit of the independent variable), the rice production increases by 1,240.79 thousand million tonnes (dependent variable).

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
t-statistics = 9.2572 and cut-off value = 2.160
p-value = 4.37177E-07 (≈ 0.0000)
α = 5%

a. Since the value of the t-statistics falls to the right of the positive cut-off value, we reject the null hypothesis and conclude that there is a significant relationship between the rice production (dependent variable) and the rainfall (independent variable).
b. The p-value of 0.0000 (4.37177E-07) is less than the significance level of 0.05, that is, p < α. Therefore, we reject the null hypothesis and conclude that there is a significant relationship between rice production (dependent variable) and the rainfall (independent variable).
c. Since our null value $\beta_1 = 0$ does not fall within the confidence interval (951.2283893 to 1,530.356929), we reject the null hypothesis, which further implies that there is a significant relationship between the rice production (dependent variable) and the rainfall (independent variable).
The claim made by the agriculture officer is that if rainfall increases by 1 inch, the rice production increases by 1,400 thousand million tonnes. The hypotheses can be stated as:

$H_0: \beta_1 = 1400$
$H_1: \beta_1 \neq 1400$

Using the 95 per cent confidence interval approach, we observe that the hypothesized value of 1,400 falls within the lower and upper limits of the interval, that is, 951.2283893 and 1,530.356929. Hence, we do not reject the null hypothesis. It is possible that the true value of the increase in production is 1,400 thousand million tonnes; the claim of the officer may be true.
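The same conclusion can be cross-checked with the t-statistic approach, using $t = (b_1 - \beta_1)/S_{b_1}$ with the null value 1,400 instead of 0. The sketch below plugs in the coefficient and standard error from the summary output above and the table value $t_{0.025, 13}$ = 2.160; the variable names are hypothetical.

```python
B1 = 1240.792659       # estimated rainfall coefficient
SE_B1 = 134.0346561    # its standard error
BETA_NULL = 1400.0     # officer's claimed effect, H0: beta_1 = 1400
T_CRIT = 2.160         # t_{0.025, 13} from the t-table (n = 15, df = n - 2)

t_stat = (B1 - BETA_NULL) / SE_B1   # distance from the claimed value in standard errors
reject = abs(t_stat) >= T_CRIT      # two-tailed rejection rule
print(f"t = {t_stat:.4f}; {'reject H0' if reject else 'do not reject H0'}")
```

Here t ≈ −1.188, which lies between −2.160 and +2.160, so the t-statistic approach agrees with the confidence interval approach in not rejecting the officer's claim.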
Future Value Interest Factors for a One Rupee Annuity Compounded at r Per Cent for n Periods: $FVIFA = \dfrac{(1+r)^n - 1}{r}$

Period  1%  2%  3%  4%  5%  6%  7%  8%  9%  10%  11%  12%  13%  14%  15%  16%  20%  24%  25%  30%
1.0000 1.0200 1.03001.04001.0500 1.0600 1.070010800 1.0900 1.1000 1.1100 1.12001.1300 1.14001.15001.1600 1.2000 1.2400 1.2500 13000
2.0100 2.0200 2.0300 2.04002.05002.06002.0700 2.0800 2.09002.10002.1100 2.12002.13002.1400 2.15002.1600 2.20002.2400 2.2500 3000
3.0301 3.06043.0909 3.12163.1525 3.18363.2149 3,2464 3.2781 3.3100 3.3421 3.3744 3.40693.4396 3.4725| 35056 3.6400 3.7776 3.8125 39900
4.06044.1216 4.18364.24654.3101 4.37464.439945061 4.5731 4.6410 4.7097477934.84984.9211 4.99345.0665 5.36805.6842576566.1870
5.10105.20405.3091541635.52565.63715.75075B6665.98476.1051 6.2278 l635286.4803 6.6101 6.7424 6.87717.44168.0484 3.2070 90431
6.15206.30816.4684 6.6330 6.8019 6.9763 71533 7.3359 7.52337.71567.9129 8.1152.3227 8.5355 8.7537 8.9775 9.929910.98011.259 12.756
7.21357.43437.662578983 8.14208.39388.65408.92289.2004 9.48729.7833 10.089 10.40510.73011.06711414 12.916 14.61515.073 17.583
8.2857 8.5830 8.8923921429.54919.8975 10.260106371i.028 11.436 11.859 12.30012.757 13.233 13.727 1424016.49919.12319.84223.858
9.36859.754610.15910.583 11.027 11.49111.978 1248813.021 13.57914.164 14.776 15.416 16.085 16.78617519 20.79924.71225.802 32.015
0.462 10.95011464 1200612.57813.181 13.816|14487 15.193 15.93716.722 7.549 18.420 19.337 20.304 21324 25.969 31.64333.253 42619
11.56712.16912808 1340614.207 14.97215.784 16.645 17.560 18.53119.561 20655 21.814 23.04524.34925.733 32.150 40.23842.56656A05
12.683 13.412 14.19215.026 15.917 16.870 17.838 18.97720.141 21.384 22.71324.133 25.650 27.271 29.002 30850 39.581 50.89554.208 74327
13.80914.680 15.6181662717.71318.832 20.141 21495 22.953 24.52326.212 28.029 29.935 32.089 34.352 36.786 48.497 64.11068.760 97625
14.947 15.974 17.086 18.292 19.599 015 22.55024215 26.019 27.975 30.09532.39334.883 37.58140.50543572 196 80.49686.949127.913
16.09717293 18.59920024 21.579 23.276 25.12927.152 29.36131.772 34.4053728040.417 43.842 47.580 51660 72.035 100.815 109.687 167 288
16 17.258 18.639 20.157 21825 23.657
25.673 27.688 30.324 33.003 35.950 39.190 42.753 46.672
18.430 20.01221.76223.69825840 28.213
30.840 33.750 36.974 40.545 44.501
50.980 S5.717 60.325|87442126.011 138.109 218AT2
48884 53.739 59.11855.07571S73|105.931 157.253 173.636 285.014
19.615 21.41223.41425.645 28.132 30.906 33.999 3745041.301 45.599 50.396 55.75061.725 68.39475.83684.141 128.117 195.994 218.045 371518
20.81122.84125.11727S7130.53933.760 37.379 41446 46.018 51.159 56.939 63440 70.749 78.969 88.212 98.503 154.740 244.033 273.556483.973
22.01924.29726.870 29.778 33.066 36.78640.99545.76251.160 57.27564.203 72052 80.947 91.025 102.444 115.380 186.688 303.601 342.945|630.165
23.239 25.783 28.67631.96935.719 39.993 44.865
50423 56.76564.002 72.265 8159 92.470 104.768118.810 134841 225.026 3/7465 | 429.581 820215
24.172| 27.299 30.537 34248 38.50543.392 49.006 55457 62.873 71.403 81.214 92.503 105491 120.436 137.632 157415 271.031 469.056 538.101
25.716 26.845 32.453 36.61841.430 46.99653.436 60.893 69.532 79.543 91.148 104603 120.205 138.297 159.276 183.601 326.237 562.630673.626
26.9733042234.42639.06344.502 50.81658.177 66.765 76.790 88.497 102.174 118.155 136.831 158.659 184.168|213.978 392.484|723.461843.033
2628.243 32.03036.459 4164647.727 54.365 63.249 73.106 84.70198.347 114.413 133.334 155.620 181.871 212.793 249214 471.981 898.092
34.78540.56847.57556.085 66.439 79.058 94.461 113.283 136.308 164.494 199.021|241333 293.199 356.787 434.745 530312
41.66049.99460462 73.65290.320 111.435 138.237172317 215.711 271.024 341.590 431 663 546.681 693.573 881.17o
43.077 51.99463.276 7759895.836119.121|148.913 187.102 236.125 299.127 380.164 484463 618.749 791.673
48.88660.40275.401 95.026 120.8o 154.762 199.635 259.057 337.882 442.593 581.826 767091
64.46334.579 112.797 152667 209.348 290.336 406.529 573.770 815.084
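Every entry in the table can be regenerated from the formula in its heading. The Python sketch below, with hypothetical function names, computes the factor directly and also by the period-by-period recurrence FVIFA(n + 1) = FVIFA(n)(1 + r) + 1, which is a convenient way to check a row against the one above it. Rates are entered as decimals (0.10 for 10 per cent).

```python
# Sketch (hypothetical names): regenerate FVIFA values from the formula
# FVIFA = [(1 + r)^n - 1] / r, with r given as a decimal.


def fvifa(r: float, n: int) -> float:
    """Future value interest factor of a one-rupee annuity for n periods at rate r."""
    if r == 0:
        return float(n)  # limiting case: n payments of 1 earning no interest
    return ((1 + r) ** n - 1) / r


def fvifa_by_recurrence(r: float, n: int) -> float:
    """Same factor built period by period: FVIFA(k + 1) = FVIFA(k) * (1 + r) + 1."""
    factor = 0.0
    for _ in range(n):
        factor = factor * (1 + r) + 1
    return factor


if __name__ == "__main__":
    rates = [0.01, 0.05, 0.10, 0.16]
    # Header row of rates formatted as percentages, then a few table rows.
    print("Period " + "".join(f"{r:>9.0%}" for r in rates))
    for n in (1, 2, 5, 10, 25):
        print(f"{n:>6} " + "".join(f"{fvifa(r, n):>9.3f}" for r in rates))
```

Rounded to three decimals, the results match the tabulated entries, for example 15.937 at 10 per cent over 10 periods.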
TABLE OF THE STUDENT'S t-DISTRIBUTION

t Table

cum. prob       t.50    t.75    t.80    t.85    t.90    t.95   t.975    t.99   t.995   t.999  t.9995
one-tail        0.50    0.25    0.20    0.15    0.10    0.05   0.025    0.01   0.005   0.001  0.0005
two-tails       1.00    0.50    0.40    0.30    0.20    0.10    0.05    0.02    0.01   0.002   0.001
df
1              0.000   1.000   1.376   1.963   3.078   6.314   12.71   31.82   63.66  318.31  636.62
2              0.000   0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925  22.327  31.599
3              0.000   0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841  10.215  12.924
4              0.000   0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   7.173   8.610
5              0.000   0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   5.893   6.869
6              0.000   0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   5.208   5.959
7              0.000   0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.785   5.408
8              0.000   0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   4.501   5.041
9              0.000   0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   4.297   4.781
10             0.000   0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   4.144   4.587
11             0.000   0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   4.025   4.437
12             0.000   0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.930   4.318
13             0.000   0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.852   4.221
14             0.000   0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.787   4.140
15             0.000   0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.733   4.073
16             0.000   0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.686   4.015
17             0.000   0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.646   3.965
18             0.000   0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.610   3.922
19             0.000   0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.579   3.883
20             0.000   0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.552   3.850
21             0.000   0.686   0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.527   3.819
22             0.000   0.686   0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.505   3.792
23             0.000   0.685   0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.485   3.768
24             0.000   0.685   0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.467   3.745
25             0.000   0.684   0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.450   3.725
26             0.000   0.684   0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.435   3.707
27             0.000   0.684   0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.421   3.690
28             0.000   0.683   0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.408   3.674
29             0.000   0.683   0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.396   3.659
30             0.000   0.683   0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.385   3.646
40             0.000   0.681   0.851   1.050   1.303   1.684   2.021   2.423   2.704   3.307   3.551
60             0.000   0.679   0.848   1.045   1.296   1.671   2.000   2.390   2.660   3.232   3.460
80             0.000   0.678   0.846   1.043   1.292   1.664   1.990   2.374   2.639   3.195   3.416
100            0.000   0.677   0.845   1.042   1.290   1.660   1.984   2.364   2.626   3.174   3.390
1000           0.000   0.675   0.842   1.037   1.282   1.646   1.962   2.330   2.581   3.098   3.300
z              0.000   0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   3.090   3.291
conf. level       0%     50%     60%     70%     80%     90%     95%     98%     99%   99.8%   99.9%
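To use this table in the t-statistics approach, read the critical value from the row for the relevant degrees of freedom (n − k − 1 for a regression with k explanatory variables and an intercept) and the column for the chosen significance level. The Python sketch below, with hypothetical names, copies only the two-tailed 0.05 column (t.975). When the degrees of freedom fall between tabulated rows it uses the nearest smaller tabulated df. This is a conservative choice, not the only one; interpolating between rows is another common option.

```python
# Sketch (hypothetical names): two-tailed 5% critical values copied from the
# t.975 column of the table, plus the t-statistics decision rule.

T_975 = {
    1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365,
    8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160,
    14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093,
    20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060,
    26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
    40: 2.021, 60: 2.000, 80: 1.990, 100: 1.984, 1000: 1.962,
}


def critical_t(df: int) -> float:
    """Critical value for the largest tabulated df not exceeding df.

    A smaller df gives a larger critical value, so this rule is
    conservative when df falls between tabulated rows.
    """
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    usable = max(k for k in T_975 if k <= df)
    return T_975[usable]


def t_test_decision(t_stat: float, df: int) -> str:
    """Two-tailed test at the 5% level: reject H0 when |t| exceeds the critical value."""
    return "reject" if abs(t_stat) > critical_t(df) else "do not reject"


if __name__ == "__main__":
    print(critical_t(10), t_test_decision(2.5, 10))
```

For example, with 10 degrees of freedom the critical value is 2.228, so a t-statistic of 2.5 leads to rejecting the null hypothesis.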
