
How to interpret Minitab output for a regression analysis:

Step I:
Model: According to the problem description, this is time series data in which the weight of the soap
depends on the number of days it has been used. Thus the dependent variable (y) is the weight of the soap
and the independent variable (x) is the number of days.
We wish to fit a linear model Y = alpha + beta x

Step II:
The following scatter diagram shows that
1. there is an inverse relationship between x and y; that is, as the number of days increases, the weight
of the soap decreases.
2. there is a distinct linear trend among the data points, supporting our model in Step I.
[Figure: Scatterplot of Weight vs Day; x-axis: Day (0 to 25), y-axis: Weight (0 to 140)]

3. Pearson correlation of Day and Weight = -0.998
P-Value = 0.000. The sample estimate of the Pearson correlation between Day and Weight is -0.998,
based on 15 observations. In the test of significance, the low p-value rejects the null hypothesis
that rho = 0, so we conclude that the sample estimate did not arise from noise alone.
There is a meaningful linear relationship between the two variables.
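
The test behind this p-value can be sketched as follows. This is only an illustration, not Minitab's own routine: it uses the standard t statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), and it assumes n = 15 (the total degrees of freedom in the ANOVA table, 14, plus one). The function name is hypothetical. Because r is reported to only three decimals, the result is a rough match for the slope's t-value rather than an exact one.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

r = -0.998   # Pearson correlation reported by Minitab (rounded to 3 decimals)
n = 15       # assumed sample size: total DF (14) + 1

t_stat = correlation_t_statistic(r, n)
df = n - 2
print(f"t = {t_stat:.1f} on {df} degrees of freedom")
```

A t-value this far from zero on 13 degrees of freedom lies deep in the tail of the t-curve, which is why the p-value prints as 0.000.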

Step III & IV:


Estimates and evaluation:
We estimate the model using the least squares method. The Minitab output is as follows:
The regression equation is
Weight = 123 - 5.57 Day

Interpretation: the line intersects the y-axis at 123 and has a slope of -5.57. That is, on day 0 the
weight is 123 grams, and for each additional day the weight of the soap decreases by 5.57 grams
on average.
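
As a small sketch of this interpretation, the fitted line can be written as a function. The coefficients are the unrounded values from the Minitab coefficient table (123.141 and -5.5748); the function and variable names are hypothetical.

```python
INTERCEPT = 123.141   # estimated alpha (Constant) from the Minitab output
SLOPE = -5.5748       # estimated beta (Day) from the Minitab output

def predicted_weight(day):
    """Fitted weight in grams after `day` days of use."""
    return INTERCEPT + SLOPE * day

# The intercept is the fitted weight on day 0.
weight_day0 = predicted_weight(0)

# The slope is the change in fitted weight for one additional day.
change_per_day = predicted_weight(8) - predicted_weight(7)

print(f"day 0: {weight_day0:.3f} g, change per day: {change_per_day:.4f} g")
```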

Predictor   Coef      SE Coef   T        P
Constant    123.141   1.382     89.09    0.000
Day         -5.5748   0.1068    -52.19   0.000

Interpretation: the sample estimates of alpha and beta are 123.141 and -5.5748,
respectively. The corresponding test statistics are 89.09 and -52.19; these t-statistics
are very large in magnitude and lie in the extreme tails of the t-curve.
Thus we reject the null hypotheses alpha = 0 and beta = 0, and conclude that both
alpha and beta play a significant role in the regression model.
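
Each T value in the coefficient table is the coefficient divided by its standard error. A minimal check, using the printed (rounded) values from the table and a hypothetical helper name:

```python
def t_statistic(coef, se_coef):
    """Test statistic for H0: coefficient = 0."""
    return coef / se_coef

t_constant = t_statistic(123.141, 1.382)   # Constant row
t_day = t_statistic(-5.5748, 0.1068)       # Day row

print(f"Constant: t = {t_constant:.2f}")
print(f"Day:      t = {t_day:.2f}")
```

The results agree with the table to within rounding, since Minitab computes T from unrounded coefficients and standard errors.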
S = 2.94921   R-Sq = 99.5%   R-Sq(adj) = 99.5%

Interpretation: the estimated standard deviation of the error terms is 2.95. An R-Sq(adj) of
99.5% indicates that about 99.5% of the observed variation in y is explained by
the model (that is, by changes in x) and only about 0.5% is due to error or some unexplained
factor. That is, the data fit the linear model well.
Analysis of Variance
Source           DF   SS      MS      F         P
Regression        1   23694   23694   2724.11   0.000
Residual Error   13     113       9
Total            14   23807

Interpretation: in this case ANOVA tests the hypothesis that beta = 0. In fact, F is simply
the square of the t-statistic for the slope: (-52.19)^2 is approximately 2724. The low p-value
suggests that beta plays a significant role in the model; this just confirms the t-test.
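
The summary statistics can be recovered from the ANOVA table itself. The sketch below uses the rounded sums of squares as printed, so its results match the Minitab output only approximately; the variable names are hypothetical.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table (rounded)
ss_regression, df_regression = 23694.0, 1
ss_error, df_error = 113.0, 13
ss_total, df_total = 23807.0, 14

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_statistic = ms_regression / ms_error               # F = MSR / MSE
s = math.sqrt(ms_error)                              # S = sqrt(MSE)
r_sq = ss_regression / ss_total                      # R-Sq = SSR / SST
r_sq_adj = 1.0 - ms_error / (ss_total / df_total)    # R-Sq(adj)

t_day = -52.19                                       # slope t-value from the table
t_squared = t_day ** 2                               # should be close to F

print(f"F = {f_statistic:.1f}, t^2 = {t_squared:.1f}, S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

With a single predictor, F and the squared slope t-value test the same hypothesis, so they differ here only through rounding of the printed inputs.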
Unusual Observations
Obs   Day    Weight   Fit      SE Fit   Residual   St Resid
10    12.0   50.000   56.244   0.772    -6.244     -2.19R
15    22.0    6.000    0.496   1.418     5.504      2.13R

R denotes an observation with a large standardized residual.

Interpretation: observations 10 and 15 are flagged as outliers. We need to go
back and review what happened on those days; perhaps the soap was used too much or too little.
To improve the model, we would like to delete those observations and recompute the
line.
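
The St Resid column can be reproduced from the Residual and SE Fit columns. This is a sketch under the usual definition of an internally standardized residual: the variance of a residual is s^2 * (1 - h), where the leverage h equals (SE Fit / s)^2. The cutoff of 2 in absolute value is assumed to be the rule behind the R flag; the function name is hypothetical.

```python
import math

S = 2.94921  # residual standard deviation from the Minitab output

def standardized_residual(residual, se_fit, s=S):
    """Residual divided by its estimated standard deviation.

    Var(residual) = s^2 * (1 - h) with h = (se_fit / s)^2,
    so the denominator simplifies to sqrt(s^2 - se_fit^2).
    """
    return residual / math.sqrt(s * s - se_fit * se_fit)

# (observation, residual, SE Fit) taken from the Unusual Observations table
unusual = [(10, -6.244, 0.772), (15, 5.504, 1.418)]

std_resids = {obs: standardized_residual(res, se) for obs, res, se in unusual}

# Assumed flagging rule: absolute standardized residual greater than 2
flagged = [obs for obs, value in std_resids.items() if abs(value) > 2]

for obs, value in std_resids.items():
    print(f"Obs {obs}: St Resid = {value:.2f}")
```

Both values only just exceed 2 in magnitude, which is worth keeping in mind before deleting observations from a sample of 15.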

Step V:
Checking the validity of the assumptions:
We assumed that the error terms are independent and identically
normally distributed with mean 0 and common variance sigma squared.

[Figure: Residual Plots for Weight, four panels:
 top left: Normal Probability Plot of the Residuals (Percent vs Residual);
 top right: Residuals Versus the Fitted Values (Residual vs Fitted Value);
 bottom left: Histogram of the Residuals (Frequency vs Residual);
 bottom right: Residuals Versus the Order of the Data (Residual vs Observation Order)]

Interpretation:
1. The graph on the top left checks the assumption of normality of the error terms. In this
case most of the points are clustered around the blue line, indicating that
the error terms are approximately normal. Thus our assumption of normality is
valid.
2. The graph on the top right plots the error terms against the fitted values.
Approximately half of them are above and half are below the zero line, indicating
that our assumption that the error terms have mean zero is valid.
3. On the same graph we see a clear cyclic pattern among the error terms,
indicating that they violate the assumption of independence. The error
terms are not independent. There may be another factor present in this
example which we need to identify.
4. The bottom left graph again supports the normality assumption, although
our sample size is just 15.
5. The bottom right graph is also important in this case because the data are a time
series and the order of the data matters. A clear cyclic pattern indicates that
the error terms depend on the time variable.
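
The visual check for dependence can be complemented by a number. The sketch below uses the Durbin-Watson statistic, which is not part of the Minitab output discussed here and is offered only as one common option. The residuals in the example are hypothetical, invented to mimic a slow cycle; they are not the residuals of the soap data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order.

    Values near 2 suggest no lag-1 autocorrelation; values well below 2
    suggest positive autocorrelation (smooth runs or cycles).
    """
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# HYPOTHETICAL residuals, invented for illustration only (not the soap data)
cyclic_residuals = [3, 2, 1, -1, -2, -3, -2, -1, 1, 2, 3, 2, 1, -1, -2]

dw = durbin_watson(cyclic_residuals)
print(f"Durbin-Watson = {dw:.2f}")
```

Applied to the actual residuals in time order, a value far below 2 would support the conclusion drawn from the plots.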

Step VI:
Although beta is significant and R-Sq(adj) is very high, indicating that the model fits the
data very well, the violation of the independence assumption indicates that
some other factor is at work behind the scenes, and we may have to study it
further.

Step VII:
Let us estimate the value of y and interpret it.
Say for x = 14 we find an interval for the average value of y:
y-hat = 123 - 5.57 * 14 = 45.02
That is, we expect the average weight on the 14th day to be
approximately 45 grams.
98% confidence interval (t = 2.650 with 13 degrees of freedom):
45.02 ± t * 0.8441 = 45.02 ± 2.650 * 0.8441 = (42.7831, 47.2569)

We are 98% confident that on the 14th day the average weight of the soap
lies between approximately 43 grams and 47 grams.
98% prediction interval:
45.02 ± 2.650 * 3.1163 = (36.7618, 53.2782)
We are 98% confident that on the 14th day an individual observed weight of the
soap lies between approximately 37 grams and 53 grams.
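
The two intervals share one formula, center ± critical value * standard error, and differ only in the standard error. The sketch below takes the fitted value and both standard errors from the text. It assumes the tabulated t critical value 2.650 (13 degrees of freedom, 1% in each tail); the multiplier is a parameter, so a different critical value can be substituted. All names are hypothetical.

```python
def interval(center, std_error, critical_value):
    """Two-sided interval: center +/- critical_value * std_error."""
    margin = critical_value * std_error
    return center - margin, center + margin

y_hat = 123 - 5.57 * 14   # fitted weight on day 14, as in the text
T_CRIT = 2.650            # assumed: tabulated t quantile, 13 df, 1% per tail
SE_FIT = 0.8441           # standard error of the mean response (from the text)
SE_PRED = 3.1163          # standard error for a new observation (from the text)

ci_low, ci_high = interval(y_hat, SE_FIT, T_CRIT)
pi_low, pi_high = interval(y_hat, SE_PRED, T_CRIT)

print(f"98% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"98% PI: ({pi_low:.2f}, {pi_high:.2f})")
```

The prediction interval is always the wider of the two, because it covers the scatter of a single new observation in addition to the uncertainty in the fitted line.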

[Figure: Fitted Line Plot, Weight = 123.1 - 5.575 Day, showing the regression line with 95% CI and
95% PI bands; S = 2.94921, R-Sq = 99.5%, R-Sq(adj) = 99.5%; x-axis: Day, y-axis: Weight]

(Optional) For those who want to improve upon the model:
Quadratic fitting: compare the S value and the R-Sq(adj) value with those of the last model.

[Figure: Fitted Line Plot (quadratic), Weight = 127.3 - 6.744 Day + 0.05063 Day**2, showing the
regression curve with 95% CI and 95% PI bands; S = 1.95599, R-Sq = 99.8%, R-Sq(adj) = 99.8%;
x-axis: Day, y-axis: Weight]

Validation of assumptions in quadratic fitting:

[Figure: Residual Plots for Weight (quadratic model), four panels:
 Normal Probability Plot of the Residuals (Percent vs Residual);
 Residuals Versus the Fitted Values (Residual vs Fitted Value);
 Histogram of the Residuals (Frequency vs Residual);
 Residuals Versus the Order of the Data (Residual vs Observation Order)]

(R denotes an observation with a large standardized residual)
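
The comparison between the two models can be sketched numerically. The equations and summary statistics below are read from the two fitted line plots (linear: S = 2.94921, R-Sq(adj) = 99.5%; quadratic: S = 1.95599, R-Sq(adj) = 99.8%); the function names and the choice of days are illustrative assumptions.

```python
def linear_fit(day):
    """Linear fitted line: Weight = 123.1 - 5.575 Day."""
    return 123.1 - 5.575 * day

def quadratic_fit(day):
    """Quadratic fitted line: Weight = 127.3 - 6.744 Day + 0.05063 Day**2."""
    return 127.3 - 6.744 * day + 0.05063 * day ** 2

# Reported summary statistics: (S, R-Sq(adj) in percent)
linear_summary = (2.94921, 99.5)
quadratic_summary = (1.95599, 99.8)

# Relative reduction in the residual standard deviation S
s_reduction = 1 - quadratic_summary[0] / linear_summary[0]
print(f"S is reduced by {100 * s_reduction:.0f}%")

# Fitted weights from both models on a few illustrative days
for day in (0, 14, 22):
    print(day, round(linear_fit(day), 2), round(quadratic_fit(day), 2))
```

A smaller S together with a higher R-Sq(adj) favors the quadratic model, but the residual plots for that model still need to be checked before it is adopted.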
