This paper presents a detailed review of regression analysis. We find the regression of
BUDGET on VOLUMES, VOLADDED, and SERIALS, and report the F statistic, the t
statistics for the coefficients, and the VIF (variance inflation factor), using MINITAB software.

Keywords: Regression, Multiple Regression, Budget, Coefficient.


Regression analysis is a statistical tool for the investigation of relationships between variables.
Usually, the investigator seeks to ascertain the causal effect of one variable upon another—the
effect of a price increase upon demand, for example, or the effect of changes in the money
supply upon the inflation rate. To explore such issues, the investigator assembles data on the
underlying variables of interest and employs regression to estimate the quantitative effect of the
causal variables upon the variable that they influence. The investigator also typically assesses the
“statistical significance” of the estimated relationships, that is, the degree of confidence that the
true relationship is close to the estimated relationship.

The rest of the paper is organized as follows. Section 1.1 reviews the history of regression,
Section 1.2 describes regression analysis, Section 2.0 describes the data, and Section 3.0
presents and discusses the solution.

1.1. History:

The earliest form of regression was the method of least squares, which was published
by Legendre in 1805,[1] and by Gauss in 1809.[2] Legendre and Gauss both applied the method to
the problem of determining, from astronomical observations, the orbits of bodies about the Sun
(mostly comets, but also later the then newly discovered minor planets). Gauss published a
further development of the theory of least squares in 1821, [3] including a version of the Gauss–
Markov theorem.

The term "regression" was coined by Francis Galton in the nineteenth century to describe a
biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors
tend to regress down towards a normal average (a phenomenon also known as regression toward
the mean).[4][5] For Galton, regression had only this biological meaning,[6][7] but his work was later
extended by Udny Yule and Karl Pearson to a more general statistical context.[8][9] In the work of
Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to
be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and 1925.[10][11]
Fisher assumed that the conditional distribution of the response variable is Gaussian, but the
joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's formulation
of 1821.

In the 1950s and 1960s, economists used electromechanical desk calculators to calculate
regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one
regression.

Regression methods continue to be an area of active research. In recent decades, new methods
have been developed for robust regression, regression involving correlated responses such
as time series and growth curves, regression in which the predictor or response variables are
curves, images, graphs, or other complex data objects, regression methods accommodating
various types of missing data, nonparametric regression, Bayesian methods for regression,
regression in which the predictor variables are measured with error, regression with more
predictor variables than observations, and causal inference with regression.

1.2. Regression Analysis:

Regression analysis is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the
relationship between a dependent variable and one or more independent variables. More
specifically, regression analysis helps one understand how the typical value of the dependent
variable (or 'criterion variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables –
that is, the average value of the dependent variable when the independent variables are fixed.
Less commonly, the focus is on a quantile, or other location parameter of the conditional
distribution of the dependent variable given the independent variables. In all cases, the
estimation target is a function of the independent variables called the regression function. In
regression analysis, it is also of interest to characterize the variation of the dependent variable
around the regression function which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However, this can lead to
illusions or false relationships, so caution is advisable;[14] for example, correlation does not imply
causation.

Many techniques for carrying out regression analysis have been developed. Familiar methods
such as linear regression and ordinary least squares regression are parametric, in that the
regression function is defined in terms of a finite number of unknown parameters that are
estimated from the data. Nonparametric regression refers to techniques that allow the regression
function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the
data-generating process, and how it relates to the regression approach being used. Since the true form
of the data-generating process is generally not known, regression analysis often depends to some
extent on making assumptions about this process. These assumptions are sometimes testable if a
sufficient quantity of data is available. Regression models for prediction are often useful even
when the assumptions are moderately violated, although they may not perform optimally.
However, in many applications, especially with small effects or questions of causality based
on observational data, regression methods can give misleading results.[15][16]

1.2.1. Simple Regression:

In statistics, simple linear regression is the least squares estimator of a linear regression model
with a single explanatory variable. In other words, simple linear regression fits a straight line
through the set of n points in such a way that makes the sum of squared residuals of the model
(that is, vertical distances between the points of the data set and the fitted line) as small as
possible.

The adjective simple refers to the fact that this regression is one of the simplest in statistics. The
slope of the fitted line is equal to the correlation between y and x corrected by the ratio
of standard deviations of these variables. The intercept of the fitted line is such that it passes
through the center of mass (x̄, ȳ) of the data points.
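
The two properties just described can be written compactly. This is a sketch of the standard closed form; the symbols a and b (intercept and slope), r_xy (sample correlation), and s_x, s_y (sample standard deviations) are notation assumed here and do not appear in the text above.

```latex
\hat{y} = a + b\,x, \qquad
b = r_{xy}\,\frac{s_y}{s_x}, \qquad
a = \bar{y} - b\,\bar{x}
```

The last relation is the statement that the fitted line passes through the point of means.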

1.2.2. Multiple Regression:

Multiple regression analysis is a powerful technique used for predicting the unknown value of a
variable from the known values of two or more variables, also called the predictors. More
precisely, multiple regression analysis helps us to predict the value of Y for given values of X1,
X2, …, Xk.

1.2.3. The Multiple Regression Model:

In general, the multiple regression equation of Y on X1, X2, …,Xk is given by:

Y = b0 + b1 X1 + b2 X2 + … + bk Xk
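
The equation leaves open how the coefficients are chosen. Assuming ordinary least squares, the method named in Section 1.2, they are the values that minimize the sum of squared residuals. The observation index i, the sample size n, and the placeholder symbols β0, …, βk are notation assumed for this sketch.

```latex
(b_0, b_1, \dots, b_k) \;=\;
\arg\min_{\beta_0, \dots, \beta_k}
\sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_k X_{ik} \right)^2
```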

1.2.4. Dependent and Independent Variables:

By multiple regression, we mean models with just one dependent and two or more independent
(explanatory) variables. The variable whose value is to be predicted is known as the dependent
variable, and the ones whose known values are used for prediction are
known as independent (explanatory) variables.

2.0. Data:

The data set used in this illustration is taken from an internet source. The values describe quantities
related to the library budgets at a number of large universities. We use these data for the regression of
BUDGET. The dependent variable Y is BUDGET, and the independent variables X are VOLUMES,
VOLADDED and SERIALS. SCHOOL only identifies each row and does not enter the regression.

The columns are
SCHOOL = school name
VOLUMES = library volume (in 1000s)
VOLADDED = volumes added in last year (in 1000s)
SERIALS = current serials (in 1000s)
BUDGET = expenditures for materials and salaries (in $1000s)

No. SCHOOL VOLUMES VOLADDED SERIALS BUDGET
01 Yale 8236.7 174.7 57.4 19850.4
02 Columbia 5551.7 121.7 63.4 18031.2
03 Minnesota 4286.4 116.5 44.6 14956.7
04 Indiana 3787.0 118.3 32.6 11906.5
05 Penn 3376.9 106.2 30.5 12468.6
06 NYU 2932.1 74.7 29.8 12801.8
07 Duke 3510.6 92.1 35.7 11074.0
08 Florida 2539.4 78.9 29.5 9875.5
09 LSU 2210.8 65.3 22.8 8008.8
10 MIT 2029.5 81.9 21.1 8719.2
11 West_Ont 1868.9 62.0 19.0 7130.9
12 Wash_StL 2069.7 43.3 16.5 8103.6
13 Emory 1951.1 66.9 18.0 8340.1
14 S_Carolina 2175.8 65.9 18.9 5788.8
15 Irvine 1239.1 61.0 15.9 9089.0
16 Nebraska 1833.6 62.8 23.8 5941.3
17 Ga_Tech 1468.6 49.9 28.6 4308.4
18 McMaster 1218.1 47.5 18.2 6069.8
19 Riverside 1250.4 47.2 13.7 6303.2
20 Saskatchwan 1254.0 47.9 10.1 5241.2
21 Oklahoma_St 1420.6 30.3 10.4 4699.8

3.0. Solution:
It is helpful, before beginning the hard work, to examine some plots. Here are plots of the
dependent variable against each of the three independent variables. Graph 0.1 shows the
relationship of BUDGET to VOLUMES, Graph 0.2 shows BUDGET against VOLADDED, and Graph 0.3
shows BUDGET against SERIALS. All three graphs have the same
general appearance. Table No 1.0 shows the correlations between the dependent variable
and the independent variables.

[Graph 0.1: Scatterplot of BUDGET vs VOLUMES]
[Graph 0.2: Scatterplot of BUDGET vs VOLADDED]
[Graph 0.3: Scatterplot of BUDGET vs SERIALS]


           VOLUMES   VOLADDED   SERIALS
SERIALS      0.908      0.876
             0.000      0.000

BUDGET       0.927      0.919     0.895
             0.000      0.000     0.000

Cell contents: Pearson correlation (first line), P-value (second line)

Table No 1.0
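
The correlations in Table No 1.0 can be checked directly from the data listing in Section 2.0. The following is a minimal sketch using only the Python standard library, not the Minitab procedure itself; the names `DATA`, `columns`, `pearson` and `corr` are illustrative choices made here.

```python
from math import sqrt

# Rows copied from the data listing: SCHOOL VOLUMES VOLADDED SERIALS BUDGET
DATA = """
Yale 8236.7 174.7 57.4 19850.4
Columbia 5551.7 121.7 63.4 18031.2
Minnesota 4286.4 116.5 44.6 14956.7
Indiana 3787.0 118.3 32.6 11906.5
Penn 3376.9 106.2 30.5 12468.6
NYU 2932.1 74.7 29.8 12801.8
Duke 3510.6 92.1 35.7 11074.0
Florida 2539.4 78.9 29.5 9875.5
LSU 2210.8 65.3 22.8 8008.8
MIT 2029.5 81.9 21.1 8719.2
West_Ont 1868.9 62.0 19.0 7130.9
Wash_StL 2069.7 43.3 16.5 8103.6
Emory 1951.1 66.9 18.0 8340.1
S_Carolina 2175.8 65.9 18.9 5788.8
Irvine 1239.1 61.0 15.9 9089.0
Nebraska 1833.6 62.8 23.8 5941.3
Ga_Tech 1468.6 49.9 28.6 4308.4
McMaster 1218.1 47.5 18.2 6069.8
Riverside 1250.4 47.2 13.7 6303.2
Saskatchwan 1254.0 47.9 10.1 5241.2
Oklahoma_St 1420.6 30.3 10.4 4699.8
"""

NAMES = ["VOLUMES", "VOLADDED", "SERIALS", "BUDGET"]
records = [line.split() for line in DATA.strip().splitlines()]
# Skip the school name (field 0) and collect each numeric column by name.
columns = {name: [float(rec[i + 1]) for rec in records] for i, name in enumerate(NAMES)}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


corr = {(a, b): pearson(columns[a], columns[b]) for a in NAMES for b in NAMES}

# Print the lower triangle in the same row and column order as Table No 1.0.
print(" " * 9 + "".join(f"{name:>10}" for name in NAMES[:-1]))
for i in range(1, len(NAMES)):
    cells = "".join(f"{corr[(NAMES[i], NAMES[j])]:10.3f}" for j in range(i))
    print(f"{NAMES[i]:<9}{cells}")
```

The sketch reports correlations only; the P-values shown in the table would need a t distribution, which is left out to keep the example dependency-free.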

You might consider the possibility of replacing each variable by its logarithm. In this case, the
decision is marginal. Moreover, the managers might have wanted a cost analysis and resisted the
taking of logarithms.
Let us now obtain the regression of BUDGET on all three predictors. We will have some interest in
the VIF (variance inflation factor) numbers.

3.1. The regression equation is

BUDGET = 1567 + 0.854 VOLUMES + 44.4 VOLADDED + 82.3 SERIALS

Predictor Coef SE Coef T P VIF

Constant 1567 1074 1.46 0.163

VOLUMES 0.8544 0.7265 1.18 0.256 12.884

VOLADDED 44.37 31.45 1.41 0.176 9.713

SERIALS 82.28 58.92 1.40 0.181 5.795

Table No. 1.1

S = 1550.85 R-Sq = 88.8% R-Sq(adj) = 86.9%

PRESS = 87542244 R-Sq(pred) = 76.10%

Table No. 1.2
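
The coefficient table and the summary statistics can be reproduced, to the precision printed, from the data in Section 2.0. The sketch below is not Minitab's algorithm: it assumes ordinary least squares, centers the variables so that only a 3 x 3 system has to be solved, and uses a small hand-written Gauss-Jordan inverse. The helper `invert` and all variable names are hypothetical names chosen for this illustration.

```python
from math import sqrt

# Rows copied from the data listing: SCHOOL VOLUMES VOLADDED SERIALS BUDGET
DATA = """
Yale 8236.7 174.7 57.4 19850.4
Columbia 5551.7 121.7 63.4 18031.2
Minnesota 4286.4 116.5 44.6 14956.7
Indiana 3787.0 118.3 32.6 11906.5
Penn 3376.9 106.2 30.5 12468.6
NYU 2932.1 74.7 29.8 12801.8
Duke 3510.6 92.1 35.7 11074.0
Florida 2539.4 78.9 29.5 9875.5
LSU 2210.8 65.3 22.8 8008.8
MIT 2029.5 81.9 21.1 8719.2
West_Ont 1868.9 62.0 19.0 7130.9
Wash_StL 2069.7 43.3 16.5 8103.6
Emory 1951.1 66.9 18.0 8340.1
S_Carolina 2175.8 65.9 18.9 5788.8
Irvine 1239.1 61.0 15.9 9089.0
Nebraska 1833.6 62.8 23.8 5941.3
Ga_Tech 1468.6 49.9 28.6 4308.4
McMaster 1218.1 47.5 18.2 6069.8
Riverside 1250.4 47.2 13.7 6303.2
Saskatchwan 1254.0 47.9 10.1 5241.2
Oklahoma_St 1420.6 30.3 10.4 4699.8
"""

records = [line.split() for line in DATA.strip().splitlines()]
X = [[float(v) for v in rec[1:4]] for rec in records]  # VOLUMES, VOLADDED, SERIALS
y = [float(rec[4]) for rec in records]                 # BUDGET
n, k = len(X), 3


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


# Center the variables so the intercept drops out of the normal equations.
x_bar = [sum(row[j] for row in X) / n for j in range(k)]
y_bar = sum(y) / n
Xc = [[row[j] - x_bar[j] for j in range(k)] for row in X]
yc = [v - y_bar for v in y]

# Solve (Xc'Xc) b = Xc'yc for the three slopes, then recover the intercept.
xtx = [[sum(r[i] * r[j] for r in Xc) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(k)]
xtx_inv = invert(xtx)
slopes = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
intercept = y_bar - sum(b * m for b, m in zip(slopes, x_bar))

# Sums of squares, S, R-Sq, F, and the t statistics for the slopes.
fitted = [intercept + sum(b * v for b, v in zip(slopes, row)) for row in X]
sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
sst = sum(t ** 2 for t in yc)
mse = sse / (n - k - 1)
s = sqrt(mse)
r_sq = 1 - sse / sst
f_stat = ((sst - sse) / k) / mse
std_errs = [sqrt(mse * xtx_inv[j][j]) for j in range(k)]
t_stats = [b / se for b, se in zip(slopes, std_errs)]

print(f"Constant  {intercept:10.1f}")
for name, b, se, t in zip(["VOLUMES", "VOLADDED", "SERIALS"], slopes, std_errs, t_stats):
    print(f"{name:<9} {b:10.4f}  SE {se:8.4f}  T {t:5.2f}")
print(f"S = {s:.2f}  R-Sq = {100 * r_sq:.1f}%  F = {f_stat:.2f} on ({k}, {n - k - 1}) df")
```

Centering is a design choice made for numerical convenience: VOLUMES is measured in the thousands while SERIALS is in the tens, and removing the means keeps the system small and well scaled. The standard error of the constant is omitted from the sketch.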

Analysis of Variance

Source          DF         SS         MS      F      P
Regression       3  325360308  108453436  45.09  0.000
Residual Error  17   40887143    2405126
Total           20  366247451

No evidence of lack of fit (P >= 0.1).

Table No 1.3
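
The entries of the analysis of variance table are tied together by simple arithmetic, which also links them to the S and R-Sq values of Table No. 1.2. The following worked check uses only the numbers printed in the tables; the subscript labels are notation assumed here, and the error row is the one with 17 degrees of freedom.

```latex
\begin{aligned}
SS_{\text{Total}} &= 325360308 + 40887143 = 366247451 \\
MS_{\text{Regression}} &= \frac{325360308}{3} = 108453436 \\
MS_{\text{Error}} &= \frac{40887143}{17} \approx 2405126 \\
F &= \frac{108453436}{2405126} \approx 45.09 \\
R^2 &= \frac{325360308}{366247451} \approx 0.888 \\
S &= \sqrt{2405126} \approx 1550.85 \\
R^2_{\text{pred}} &= 1 - \frac{87542244}{366247451} \approx 0.761
\end{aligned}
```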

[Graph No 0.4: Residual plots for BUDGET (normal probability plot, residuals versus fitted
values, histogram of residuals, residuals versus observation order)]

The overall F statistic is 45.09, on (3, 17) degrees of freedom. This is highly significant (P=0.000
to the precision given). The individual t statistics are 1.18 for VOLUMES, 1.41 for VOLADDED
and 1.40 for SERIALS; none of them is significant (the P-values are 0.256, 0.176 and 0.181).
The residual plots are shown in Graph No 0.4. The
relationship of BUDGET to each of the individual independent variables (predictors) is
approximately the same. Moreover, the independent variables are strongly related to each other,
as seen in the somewhat high VIF values. Thus, BUDGET is definitely strongly related to the
predictors (as decided by the F statistic), but none of the predictors contributes anything to the
relationship which cannot be attributed to one of the other predictors.
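
The VIF values quoted above measure how well each predictor is explained by the other two: VIF = 1 / (1 - R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below is an illustration under that definition, not Minitab's routine. Because there are exactly two other predictors, it uses the closed form for R² with two regressors; the names `pearson`, `vif` and `vifs` are hypothetical.

```python
from math import sqrt

# Rows copied from the data listing: SCHOOL VOLUMES VOLADDED SERIALS BUDGET
DATA = """
Yale 8236.7 174.7 57.4 19850.4
Columbia 5551.7 121.7 63.4 18031.2
Minnesota 4286.4 116.5 44.6 14956.7
Indiana 3787.0 118.3 32.6 11906.5
Penn 3376.9 106.2 30.5 12468.6
NYU 2932.1 74.7 29.8 12801.8
Duke 3510.6 92.1 35.7 11074.0
Florida 2539.4 78.9 29.5 9875.5
LSU 2210.8 65.3 22.8 8008.8
MIT 2029.5 81.9 21.1 8719.2
West_Ont 1868.9 62.0 19.0 7130.9
Wash_StL 2069.7 43.3 16.5 8103.6
Emory 1951.1 66.9 18.0 8340.1
S_Carolina 2175.8 65.9 18.9 5788.8
Irvine 1239.1 61.0 15.9 9089.0
Nebraska 1833.6 62.8 23.8 5941.3
Ga_Tech 1468.6 49.9 28.6 4308.4
McMaster 1218.1 47.5 18.2 6069.8
Riverside 1250.4 47.2 13.7 6303.2
Saskatchwan 1254.0 47.9 10.1 5241.2
Oklahoma_St 1420.6 30.3 10.4 4699.8
"""

PREDICTORS = ["VOLUMES", "VOLADDED", "SERIALS"]
records = [line.split() for line in DATA.strip().splitlines()]
# Field 0 is the school name; fields 1 to 3 are the predictors (BUDGET is not needed).
columns = {name: [float(rec[i + 1]) for rec in records] for i, name in enumerate(PREDICTORS)}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif(target, others):
    """VIF = 1 / (1 - R^2), with R^2 from regressing `target` on the two `others`."""
    first, second = others
    r_t1 = pearson(columns[target], columns[first])
    r_t2 = pearson(columns[target], columns[second])
    r_12 = pearson(columns[first], columns[second])
    # Closed form for the squared multiple correlation with two regressors.
    r_squared = (r_t1 ** 2 + r_t2 ** 2 - 2 * r_t1 * r_t2 * r_12) / (1 - r_12 ** 2)
    return 1.0 / (1.0 - r_squared)


vifs = {name: vif(name, [p for p in PREDICTORS if p != name]) for name in PREDICTORS}
for name, value in vifs.items():
    print(f"{name:<9} VIF = {value:.3f}")
```

With more than three predictors the closed form no longer applies, and each R² would have to come from a full auxiliary regression.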


In this study of the regression of BUDGET, we used Minitab software to fit a multiple regression
and examined its coefficients. The relationship of BUDGET to
each of the individual independent variables (predictors) is approximately the same. Moreover,
the independent variables are strongly related to each other, as seen in the somewhat high VIF
values. The overall F statistic is 45.09, on (3, 17) degrees of freedom. This is highly significant
(P=0.000 to the precision given).

References:
Information was also taken from internet sources.

1. A.M. Legendre. Nouvelles méthodes pour la détermination des orbites des comètes, Firmin Didot,
Paris, 1805. “Sur la Méthode des moindres quarrés” appears as an appendix.

2. C.F. Gauss. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum.

3. C.F. Gauss. Theoria combinationis observationum erroribus minimis obnoxiae. (1821/1823)

4. Mogull, Robert G. (2004). Second-Semester Applied Statistics. Kendall/Hunt Publishing
Company. p. 59. ISBN 0-7575-1181-3.

5. Galton, Francis (1989). "Kinship and Correlation (reprinted 1989)". Statistical
Science (Institute of Mathematical Statistics) 4 (2): 80–86. doi:10.1214/ss/1177012581. JSTOR 2245330.

6. Francis Galton. "Typical laws of heredity", Nature 15 (1877), 492–495, 512–514, 532–
533. (Galton uses the term "reversion" in this paper, which discusses the size of peas.)

7. Francis Galton. Presidential address, Section H, Anthropology. (1885) (Galton uses the term
"regression" in this paper, which discusses the height of humans.)

8. Yule, G. Udny (1897). "On the Theory of Correlation". Journal of the Royal Statistical
Society (Blackwell Publishing) 60 (4): 812–54. doi:10.2307/2979746. JSTOR 2979746.

9. Pearson, Karl; Yule, G.U.; Blanchard, Norman; Lee, Alice (1903). "The Law of Ancestral
Heredity". Biometrika (Biometrika Trust) 2 (2): 211–236. doi:10.1093/biomet/2.2.211. JSTOR 2331683.

10. Fisher, R.A. (1922). "The goodness of fit of regression formulae, and the distribution of
regression coefficients". Journal of the Royal Statistical Society (Blackwell Publishing) 85 (4): 597–
612. doi:10.2307/2341124. JSTOR 2341124.

11. Ronald A. Fisher (1954). Statistical Methods for Research Workers (Twelfth ed.). Edinburgh: Oliver
and Boyd. ISBN 0-05-002170-2.

12. Aldrich, John (2005). "Fisher and Regression". Statistical Science 20 (4): 401–
417. doi:10.1214/088342305000000331. JSTOR 20061201.

13. Rodney Ramcharan. Regressions: Why Are Economists Obsessed with Them? March
2006. Accessed 2011-12-03.

14. Armstrong, J. Scott (2012). "Illusions in Regression Analysis". International Journal of Forecasting
(forthcoming) 28 (3): 689. doi:10.1016/j.ijforecast.2012.02.001.

15. David A. Freedman, Statistical Models: Theory and Practice, Cambridge University Press

16. R. Dennis Cook; Sanford Weisberg. Criticism and Influence Analysis in Regression, Sociological
Methodology, Vol. 13. (1982), pp. 313–361
