
Practical Regression and ANOVA using R

Julian J. Faraway
July 2002

http://my.mofile.com/tangyc886 — R Language and Statistical Analysis, East China Normal University (September 2008)


Chapter 1
Introduction
1.1 Look before you leap!
 Statistics starts with a problem, continues with the
collection of data, proceeds with the data analysis and
finishes with conclusions.
 It is a common mistake of inexperienced Statisticians to
plunge into a complex analysis without paying attention
to what the objectives are or even whether the data are
appropriate for the proposed analysis.



 Formulation of a problem
-- more essential than its solution; you must:
 1. Understand the physical background.
 2. Understand the objective.
 3. Make sure you know what the client wants.
 4. Put the problem into statistical terms.

That a statistical method can read in and process
the data is not enough. The results may be totally
meaningless.



 Data Collection
– important to understand
how the data was collected
 Initial Data Analysis
-- critical step to be performed
 Numerical summaries
 Graphical summaries
• – One variable: boxplots, histograms, etc.
• – Two variables: scatterplots.
• – Many variables: interactive graphics.
Are the data distributed as you expected?
 Example – pima (Page 9)
 Variables
pregnant -- number of times pregnant
glucose -- plasma glucose concentration
diastolic -- diastolic blood pressure
triceps -- triceps skin fold thickness
insulin -- 2-hour serum insulin
bmi -- body mass index
diabetes -- diabetes pedigree function
age -- age (years)
test -- diabetes test result (0 = negative, 1 = positive)
 Initial data analysis
Pages 10 - 13



 Two unusual things:
• Max of pregnant = 17? Is that possible?
• Diastolic (blood pressure) = 0? Are these missing values?
 Data rearranged – code the data
• -- NA for missing values
• -- designate test as a factor with labels
 Graphical summary
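A minimal sketch of this recoding in R, using a tiny made-up stand-in for the pima data frame (the real data accompany Faraway's text, so the values below are illustrative only):

```r
# Hypothetical mini version of the pima data, for illustration only
pima <- data.frame(diastolic = c(72, 0, 64, 0, 80),
                   test = c(0, 1, 0, 0, 1))

# Recode impossible zero blood pressures as missing values
pima$diastolic[pima$diastolic == 0] <- NA

# Designate test as a factor with labels
pima$test <- factor(pima$test, labels = c("negative", "positive"))

summary(pima$diastolic)   # now reports 2 NA's
table(pima$test)
```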



1.2 When to use Regression Analysis
– Model the relationship between Y and X1,…,Xp
Y --- response/output/dependent variable (continuous)
X1,…,Xp --- predictor/input/independent/explanatory
variables
 p=1 --- simple regression
 p>1 --- multiple regression --- diastolic on bmi and diabetes
 more than one Y --- multivariate regression
 Analysis of Covariance --- diastolic on bmi and test
 Analysis of Variance (ANOVA) --- diastolic on test
 Logistic Regression --- test on diastolic and bmi



Chapter 2
Estimation
 Linear Model
 Y = β0 + β1X1 + … + βpXp + ε
 yi = β0 + β1x1i + … + βpxpi + εi,  i = 1, 2, …, n
 Matrix/vector representation: y = Xβ + ε

Data = Systematic Structure + Random Variation

n dimensions = p dimensions + (n−p) dimensions

where y = (y1,…,yn)T, ε = (ε1,…,εn)T, β = (β1,…,βp)T
 Estimation of β such that Xβ is close to Y.



 Geometrical representation: Y is projected orthogonally onto the model space; the residual ε̂ is the part of Y left over.


 Least squares estimation (LSE)
β̂ = (XTX)−1XTy
Xβ̂ = Hy
E(β̂) = β,  Var(β̂) = (XTX)−1σ2

H = X(XTX)−1XT --- hat matrix
--- orthogonal projection of y onto M(X)
Predicted values: ŷ = Hy = Xβ̂
Residuals: ε̂ = (I − H)y
Residual sum of squares: ε̂Tε̂ = yT(I − H)y
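These formulas can be verified directly in R; the simulated data below are an arbitrary illustration, not a dataset from the text:

```r
set.seed(1)
x1 <- runif(30); x2 <- runif(30)
y  <- 1 + 2*x1 - 3*x2 + rnorm(30, sd = 0.1)

X <- cbind(1, x1, x2)                      # design matrix with intercept
betahat <- solve(t(X) %*% X, t(X) %*% y)   # (X^T X)^{-1} X^T y
H <- X %*% solve(t(X) %*% X) %*% t(X)      # hat matrix
yhat <- H %*% y                            # fitted values Hy

g <- lm(y ~ x1 + x2)
all.equal(as.vector(betahat), unname(coef(g)))   # TRUE
all.equal(as.vector(yhat), unname(fitted(g)))    # TRUE
```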



 Is β̂ a good estimate?
• Geometrically good – orthogonal projection
• MLE if the εi are iid normal
• BLUE if E(ε) = 0 and Var(ε) = σ2I
--- from the Gauss-Markov Theorem
 Situations calling for estimates other than LSE:
• GLS if errors are correlated or have unequal variances
• Robust estimates if errors are long-tailed
• Biased estimates (e.g. ridge regression) if predictors are
highly correlated (i.e. collinear).



 Goodness of fit

R2 = 1 − Σi(yi − ŷi)2 / Σi(yi − ȳ)2 = 1 − RSS / Total SS (corrected for the mean)

--- percentage of variance explained

 Estimate of σ2:  σ̂2 = ε̂Tε̂ / (n − p)
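Both quantities are easy to check against summary.lm() (simulated data, for illustration only):

```r
set.seed(2)
x <- runif(40)
y <- 1 + x + rnorm(40)
g <- lm(y ~ x)

rss <- sum(resid(g)^2)
tss <- sum((y - mean(y))^2)
r2  <- 1 - rss/tss                    # percentage of variance explained
s2  <- rss/(length(y) - 2)            # sigma-hat^2 with p = 2 parameters

all.equal(r2, summary(g)$r.squared)   # TRUE
all.equal(s2, summary(g)$sigma^2)     # TRUE
```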



 Example --- Gala (Page 23-25)
 Variables
Species -- # of species of tortoise
Endemics -- # of endemic species
Area -- area of the island (km2)
Elevation -- highest elevation of the island (m)
Nearest -- distance from the nearest island (km)
Scruz -- distance from Santa Cruz island (km)
Adjacent -- area of the adjacent island (km2)
 gfit <- lm(Species ~ Area + Elevation + Nearest + Scruz
+ Adjacent, data = gala)
summary(gfit)



Chapter 3
Inference
General assumptions for linear models:
y ~ N(Xβ, σ2I)
Thus
β̂ = (XTX)−1XTy ~ N(β, (XTX)−1σ2)
 Hypothesis tests to compare models
Two models:
 Ω – large model, with q parameters (dimension)
 ω – small model, with p parameters, consisting of a
subset of the predictors that are in Ω
Law of parsimony – use ω if the data support it!
See the geometric illustration on page 27.



• Geometrical illustration: Y is projected onto both the large and the
small model space; the difference between the two residuals measures
the difference between the two models.



 Test statistic
(RSSω − RSSΩ) / RSSΩ
or equivalently the likelihood-ratio statistic
maxβ,σ∈Ω L(β, σ | y) / maxβ,σ∈ω L(β, σ | y)

From Cochran's theorem, we have

[(RSSω − RSSΩ)/(q − p)] / [RSSΩ/(n − q)]  ~  Fq−p, n−q

Reject the null hypothesis (ω) if F > Fq−p,n−q(α)
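The F-statistic can be assembled by hand from the two RSS values and checked against anova(); the data are simulated for illustration, with x2 deliberately a junk predictor:

```r
set.seed(3)
x1 <- runif(50); x2 <- runif(50)
y  <- 1 + 2*x1 + rnorm(50)

big   <- lm(y ~ x1 + x2)    # Omega: q = 3 parameters
small <- lm(y ~ x1)         # omega: p = 2 parameters
q <- 3; p <- 2; n <- 50

# deviance() on an lm fit is the RSS
f <- ((deviance(small) - deviance(big))/(q - p)) / (deviance(big)/(n - q))
all.equal(f, anova(small, big)$F[2])    # TRUE
```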



 Comment:
The same test statistic applies not just when ω is a
subset of Ω but also to a subspace. This test is very
widely used in regression and analysis of variance.
When it is applied in different situations, the form of test
statistic may be re-expressed in various different ways.
The beauty of this approach is you only need to know
the general form. In any particular case, you just need
to figure out which models represent the null and
alternative hypotheses, fit them and compute the test
statistic. It is very versatile.



 Some examples
 1. Test of all predictors
Are any of the predictors useful?
Ω: y = Xβ + ε, X: n by p
ω: y = μ + ε – predict y by its mean
(H0: β1 = … = βp−1 = 0)
Now we find that
RSSΩ = (y − Xβ̂)T(y − Xβ̂) = RSS,  dfΩ = n − p
RSSω = (y − ȳ)T(y − ȳ) = SYY,  dfω = n − 1
-- the sum of squares corrected for the mean



 Comment:
• A failure to reject the null hypothesis is not the end of
the game — you must still investigate the possibility of
non-linear transformations of the variables and of
outliers which may obscure the relationship. Even then,
you may just have insufficient data to demonstrate a
real effect which is why we must be careful to say “fail
to reject” the null rather than “accept” the null.
It would be a mistake to conclude that no real
relationship exists.



• When the null is rejected, this does not imply that the
alternative model is the best model. We don’t know
whether all the predictors are required to predict the
response or just some of them. Other predictors might
also be added —for example quadratic terms in the
existing predictors. Either way, the overall F-test is just
the beginning of an analysis and not the end.
• Example – old economic dataset on page 29.
variables:
dpi – per-capita disposable income
ddpi – percent rate of change in dpi
sr – aggregate personal saving divided by disposable
income
pop15, pop75 – percentage population under 15 and over 75



 2. Testing just one predictor
Can one particular predictor be dropped?
• RSSΩ – RSS for the model with all the predictors
• RSSω – RSS for the model with all the predictors except
predictor i.
Test statistic:
• F-statistic (with dfs 1 and n−p), or
• t-statistic with df n−p: ti = β̂i / se(β̂i)
Example -- Savings (Page 30)
--- Three methods:
• 1) F-statistic
• 2) t-statistic
• 3) compare two nested models using anova(g2,g)
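The three methods agree because of the identity t² = F when a single predictor is dropped; a quick check on simulated data (a stand-in for savings):

```r
set.seed(4)
x1 <- runif(40); x2 <- runif(40)
y  <- 1 + x1 + x2 + rnorm(40)

g  <- lm(y ~ x1 + x2)
g2 <- lm(y ~ x1)                       # model without x2

tval <- coef(summary(g))["x2", "t value"]
Fval <- anova(g2, g)$F[2]
all.equal(tval^2, Fval)                # TRUE: t^2 = F
```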
 3. Testing a pair of predictors
Suppose we wish to test the significance of
variables Xj and Xk. We might construct a table as
shown just above and find that both variables
have p-values greater than 0.05 thus indicating
that individually neither is significant. Does this
mean that both Xj and Xk can be eliminated from
the model? Not necessarily.



Except in special circumstances, dropping one variable
from a regression model causes the estimates of the
other parameters to change, so we might find that,
after dropping Xj, a test of the significance of Xk
shows that it should now be included in the model.
If you really want to check the joint significance of Xj
and Xk, you should fit a model with and then without
them and use the general F-test discussed above.
Remember that even the result of this test may
depend on what other predictors are in the model.



 4. Testing a subspace
Can two predictors be replaced by their sum?
Null hypothesis:
H0: βj = βk (a linear subspace)
Example – Savings
• 1) H0: βpop15 = βpop75

Null model:
y = β0 + βpop15(pop15 + pop75) + βdpi dpi + βddpi ddpi + ε
> g <- lm(sr ~ . , savings)
> gr <- lm(sr ~ I(pop15 + pop75) + dpi + ddpi, savings)
> anova(gr, g)
# The period . stands for the other variables in the data frame
# Inside I() the argument is evaluated rather than interpreted as part of the model formula



• 2) H0: βddpi = 1
Null model:
y = β0 + βpop15 pop15 + βpop75 pop75 + βdpi dpi + ddpi + ε
# The fixed term is called an offset
> gr <- lm(sr ~ pop15 + pop75 + dpi + offset(ddpi), savings)
> anova(gr, g)
Two other ways as usual:
t-statistic: t = (β̂ − c)/se(β̂), where c is the point hypothesis
F-statistic: the square of the t-statistic
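A sketch of the t-statistic for a point hypothesis such as H0: β = 1, on simulated data standing in for savings; the offset model gives the matching F:

```r
set.seed(5)
x <- runif(40)
y <- 2 + 1*x + rnorm(40, sd = 0.5)     # true slope is 1
g <- lm(y ~ x)

est <- coef(summary(g))["x", "Estimate"]
se  <- coef(summary(g))["x", "Std. Error"]
tstat <- (est - 1)/se                  # c = 1 is the hypothesized value

g0 <- lm(y ~ offset(x))                # null model with the slope fixed at 1
Fval <- anova(g0, g)$F[2]
all.equal(tstat^2, Fval)               # TRUE: F is the square of t
```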



 Confidence Intervals for β
 Confidence intervals are closely related to hypothesis tests:
For a 100(1-α )% confidence region, any point that lies
within the region represents a null hypothesis that would
not be rejected at the 100α % level while every point
outside represents a null hypothesis that would be rejected.
 The confidence region provides a lot more information than
a single hypothesis test in that it tells us the outcome of a
whole range of hypotheses about the parameter values.



 Simultaneous confidence region for β:

(β̂ − β)T XTX (β̂ − β) ≤ p σ̂2 Fp,n−p(α)

 Confidence interval for one βi
• General form:
estimate ± critical value × s.e. of estimate
• For βi:
β̂i ± tn−p(α/2) σ̂ √((XTX)−1ii)

• It is better to consider the joint confidence intervals when
plausible, especially when the estimates of the βi are heavily
correlated.
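The interval formula agrees with R's confint(); a simulated check (the standard error reported by summary.lm is exactly σ̂√((XTX)−1ii)):

```r
set.seed(6)
x <- runif(30)
y <- 1 + 2*x + rnorm(30)
g <- lm(y ~ x)

est  <- coef(g)["x"]
se   <- coef(summary(g))["x", "Std. Error"]
crit <- qt(0.975, df.residual(g))

lo <- est - crit*se
hi <- est + crit*se
all.equal(unname(c(lo, hi)), unname(confint(g)["x", ]))   # TRUE
```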



 Example – Savings
• 95% confidence interval for pop75
• 95% confidence interval for growth
• Joint 95% confidence region for pop75 and growth
Note: the "ellipse" package is needed for drawing confidence ellipses.



 Confidence intervals for predictions
 Q: Given a new set of predictor values x0, what is the
predicted response?
A: ŷ0 = x0Tβ̂
Two kinds of prediction:
• of the future mean response
• of a future observation
 The point estimates are the same, while the confidence
intervals are respectively

ŷ0 ± tn−p(α/2) σ̂ √(x0T(XTX)−1x0)      (mean response)

ŷ0 ± tn−p(α/2) σ̂ √(1 + x0T(XTX)−1x0)  (future observation)
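Both intervals come from predict() in R; a sketch on simulated data:

```r
set.seed(7)
x <- runif(30)
y <- 1 + 2*x + rnorm(30)
g <- lm(y ~ x)
new <- data.frame(x = 0.5)

ci <- predict(g, new, interval = "confidence")  # future mean response
pi <- predict(g, new, interval = "prediction")  # future observation

ci[1] == pi[1]                       # same point estimate
(pi[3] - pi[2]) > (ci[3] - ci[2])    # prediction interval is wider
```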



 Orthogonality (page 41)
 Identifiability (page 44)
 What can go wrong? (page 46)
• Source and quality of the data
• Error component
• Structural component
 Confidence in the conclusions from a model declines as
we progress through these.
 Most statistical theory rests on the assumption that the
model is correct. In practice, the best one can hope for
is that the model is a fair representation of reality. A
model can be no more than a good portrait.
 All models are wrong but some are useful.
George Box
 So far as the theories of mathematics are about reality,
they are not certain; so far as they are certain, they are
not about reality.
Einstein



 Prediction and Extrapolation (Page 52)
• Is the new X0 within the range of validity of the model?
• Is it close to the range of the original data?
• If not,
* the prediction may be unrealistic and
* confidence intervals for prediction get wider as
we move away from the data.
Example
• Model:
>g4 <- lm(sr ~ pop75, data=savings)
>summary(g4)



• Prediction/fit and Confidence interval (CI)
> grid <- seq(0, 10, 0.1)
> p <- predict(g4, data.frame(pop75=grid), se=T)
> cv <- qt(0.975, 48)
> matplot(grid, cbind(p$fit, p$fit-cv*p$se, p$fit+cv*p$se),
+ lty=c(1, 2, 2), type="l", xlab="pop75", ylab="Savings")
> rug(savings$pop75) # show the location of the pop75 values
Conclusion (See Figures 3.6 and 3.7): the widening of the CI does not reflect the
possibility that the structure of the model itself may change as we move into new
territory.

 Prediction is a tricky business --- perhaps the only
thing worse than a prediction is no prediction at all.
(a quote from the 4th century)



Chapter 4
Errors in Predictors
 The regression model Y = Xβ + ε allows for Y being measured
with error through the ε term.
 What if X is measured with error (i.e. is not the X used to
generate Y)?
 Consider (xi, yi), i = 1, 2, …, n:
yi = ηi + εi,  xi = ξi + δi
where ε and δ are independent, Eεi = Eδi = 0,
var εi = σ2, var δi = σδ2, and we write
σξ2 = Σ(ξi − ξ̄)2/n,  σξδ = cov(ξ, δ).

True underlying relationship:
ηi = β0 + β1ξi
 Then
β̂1 = Σ(xi − x̄)yi / Σ(xi − x̄)2

E β̂1 = β1 (σξ2 + σξδ)/(σξ2 + σδ2 + 2σξδ),
which when cov(ξ, δ) = 0 reduces to β1 · 1/(1 + σδ2/σξ2)
 Thus the estimate of β 1 will be biased!
 If the variability in the errors of observation of X (σδ2)
is small relative to the spread of X (σξ2), we need not
be concerned.
 If not, it’s a serious problem. Other methods should be
considered!



 Note the difference between:
• Errors in predictors case

• Treating X as a r.v. (for observational data, assume that Y is


generated conditional on the fixed value of X)
 Example (page 56)
> x <- 10*runif(50)
> y <- x + rnorm(50)
> gx <- lm(y ~ x)
 The true regression coefficients are β0 = 0 and β1 = 1. See how
they change when the noise 5*rnorm(50) is added to x:  β̂1
decreases from 1 to 0.25 theoretically, and from 0.97 to 0.435
numerically.
 Since σδ2 = 25 and σξ2 = 100/12, the expected attenuation factor is
1/(1 + σδ2/σξ2) = (100/12)/(100/12 + 25) = 0.25
(see the slide above and the simulation on P57)
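A larger simulation makes the attenuation easy to see (the exact numbers vary with the seed):

```r
set.seed(8)
xi <- 10*runif(500)                 # true predictor, sigma_xi^2 = 100/12
y  <- xi + rnorm(500)               # true slope beta1 = 1
x  <- xi + 5*rnorm(500)             # observed with error, sigma_delta^2 = 25

coef(lm(y ~ xi))[2]                 # close to 1
coef(lm(y ~ x))[2]                  # close to 0.25, the attenuated value
```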



Chapter 5
Generalized Least Squares
 The general case
 Errors: var ε = σ2Σ, where
σ2 is unknown,
Σ is known.
 GLS minimizes (y − Xβ)TΣ−1(y − Xβ):
β̂ = (XTΣ−1X)−1XTΣ−1y
 GLS = OLS after regressing S−1y on S−1X, where S is the
triangular matrix from the Cholesky decomposition
Σ = SST. (Then var S−1ε = σ2I.)
 Example: Longley's regression data (page 59)
• Errors: εi+1 = ρεi + δi,  δi ~ N(0, τ2)
• Σ = (Σij): Σij = ρ|i−j|
• Two methods: 1) use the formula directly; 2) Cholesky decomposition
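The equivalence of the two methods can be checked numerically; the sketch below uses an AR(1)-type Σ with ρ = 0.5 and simulated data:

```r
set.seed(9)
n <- 20; rho <- 0.5
Sigma <- rho^abs(outer(1:n, 1:n, "-"))   # Sigma_ij = rho^|i-j|
x <- runif(n)
S <- t(chol(Sigma))                      # lower triangular, Sigma = S %*% t(S)
y <- 1 + 2*x + as.vector(S %*% rnorm(n)) # correlated errors

X <- cbind(1, x)
# Method 1: the GLS formula directly
b1 <- solve(t(X) %*% solve(Sigma, X), t(X) %*% solve(Sigma, y))
# Method 2: OLS on the transformed data S^{-1} y on S^{-1} X
Xs <- solve(S, X); ys <- solve(S, y)
b2 <- solve(t(Xs) %*% Xs, t(Xs) %*% ys)

all.equal(b1, b2)    # TRUE
```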
 Revisit the example using nlme library (Page 61)
 Weighted Least Squares
 Σ = diag(1/w1, …, 1/wn) – errors uncorrelated, with unequal variances
 weights: wi, i = 1, 2, …, n.
Low/high variability ⇒ high/low weight.
Regress √wi yi on √wi xi using OLS.
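In R, lm() takes the weights directly, and the transformed-OLS view gives the same coefficients (simulated data, known weights assumed):

```r
set.seed(10)
x <- runif(30)
w <- runif(30, 0.5, 2)                     # known weights
y <- 1 + 2*x + rnorm(30)/sqrt(w)           # var(eps_i) proportional to 1/w_i

g1 <- lm(y ~ x, weights = w)
g2 <- lm(I(sqrt(w)*y) ~ 0 + I(sqrt(w)) + I(sqrt(w)*x))   # scaled OLS

all.equal(unname(coef(g1)), unname(coef(g2)))   # TRUE
```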
 Example -- strongx:
• An experiment was designed to study the interaction of
certain kinds of elementary particles on collision with proton
targets.



• The cross-section (crossx) variable is believed to be linearly
related to the inverse of the energy (energy – already
inverted in the data). At each level of the momentum, a very
large number of observations were taken, so the standard
deviation of the response (sd) could be accurately estimated.
• (Figure 5.1) Does the unweighted fit look better? For lower values
of energy, the variance in the response is smaller, so the weight
is higher. The weighted fit therefore tries to catch these points
better than the others.



 Iteratively Reweighted Least Squares (IRWLS), p64
 Var(ε) is not completely known
 Model Σ (e.g. var ε = γ0 + γ1x1) and then estimate β
using weighted least squares. Iterate until convergence.
 Use gls() in the nlme library – model the variance and
jointly estimate the regression and weighting
parameters using likelihood based method.



Chapter 6
Testing for Lack of Fit
 Estimate of σ2
 If the model is correct, then σ̂2 should be unbiased for σ2.
 If the model is too simple (not correct), then σ̂2 will
overestimate σ2.
 If the model is too complex and overfits the data, then σ̂2 will
underestimate σ2.
 Test statistic:
 σ2 known
Lack of fit if (n − p)σ̂2/σ2 > χ2n−p(1 − α)

Example – strongx
 Model 1: without a quadratic term  lack of fit (p-value = 0.0048)
 Model 2: with a quadratic term  well fit (p-value = 0.85363)



 σ2 unknown
The σ̂2 based on the regression model needs to be compared to some
model-free estimate.
 Repeat y independently for one or more values of x.
 To reflect both the within-subject variability and between-subject
variability, we use the "pure error" estimate of σ2, given by SSpe/dfpe,
where
SSpe = Σdistinct x Σgiven x (y − ȳ)2
dfpe = Σdistinct x (#replicates − 1)


 If you fit a model that assigns one parameter to each group of
observations with fixed x, then the σ̂2 from this model will be the
pure-error σ̂2. Use factor(x) instead of x directly! See the example
below.
 Comparing this model to the regression model amounts to the
lack-of-fit test -- the most convenient way.
 Alternative (ANOVA): partition the RSS into
• that due to lack of fit
• that due to pure error

Source        df           SS          MS                      F
Residual      n−p          RSS
Lack of Fit   n−p−dfpe     RSS−SSpe    (RSS−SSpe)/(n−p−dfpe)   Ratio of MS
Pure Error    dfpe         SSpe        SSpe/dfpe


Lack of fit if the F-statistic > F(1−α)n−p−dfpe, dfpe
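The factor(x) device makes the test a one-liner; in this constructed example (simulated, not from the text) the true curve is quadratic, so the straight line shows lack of fit:

```r
set.seed(11)
x <- rep(1:5, each = 4)                # 4 replicates at each x
y <- x^2 + rnorm(20, sd = 0.3)         # true relationship is quadratic

g  <- lm(y ~ x)                        # straight-line model
ga <- lm(y ~ factor(x))                # one parameter per group: pure error
anova(g, ga)                           # small p-value => lack of fit
```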

Example – corrosion:
Thirteen specimens of 90/10 Cu-Ni alloys with varying iron
content in percent. The specimens were submerged in sea
water for 60 days and the weight loss due to corrosion was
recorded in units of milligrams per square decimeter per
day.
• Model 1:
> g <- lm(loss ~ Fe, data=corrosion)
Both R2=97% and graph (Figure 6.3) show good fit to the data.
• Model 2:
(Reserve a parameter for each group of data with the same x.
The fitted values are the means in each group, see Figure 6.3.)
> ga <- lm(loss ~ factor(Fe),data=corrosion)



 Compare the two models:
> anova(g, ga)
• p-value = 0.0086 a lack of fit !

• Reason:
regression sd = 3.06, pure error sd = sqrt(11.8/6) = 1.4.
(Are the replicates genuine? Is the pure-error sd too low? This may
be caused by some correlation in the measurements, or by an
unmeasured third variable.)
• Another model other than a straight line (though not an obvious one!)
> gp <- lm(loss ~ Fe+I(Fe^2)+I(Fe^3)+I(Fe^4)+I(Fe^5)+I(Fe^6),
+ corrosion)
> summary(gp)$r.squared
[1] 0.99653
• R2 = 99.7%



• The fit of this model is excellent (see Figure 6.4)
but it is clearly ridiculous!
• This is a consequence of OVERFITTING the data.
 About R2 and goodness of fit.
• No need to become too focused on measures of fit like R2
• If a test fails to reject the null hypothesis, we cannot conclude
that we have the true model. After all, it
may be that we just did not have enough data to detect the
inadequacies of the model. All we can say is that the model
is not contradicted by the data.
• It is also possible to detect lack of fit by less formal,
graphical methods.



• A more general question is how good a fit do you really
want?
• By increasing the complexity of the model, it is possible to fit
the data more closely. By using as many parameters as
data points, we can fit the data exactly.
• Very little is achieved by doing this since we learn nothing
beyond the data itself and any predictions made using such
a model will tend to have very high variance.
• The question of how complex a model to fit is difficult and
fundamental.



Chapter 7
Diagnostics
Regression model building is often an iterative and
interactive process. The first model we try may
prove to be inadequate.
Regression diagnostics are used to detect problems
with the model and suggest improvements.
This is a hands-on process.



 Residuals and Leverage
 H – hat matrix: H = X(XTX)−1XT, thus
ŷ = Hy
ε̂ = y − ŷ = (I − H)y
  = (I − H)Xβ + (I − H)ε
  = (I − H)ε
var ε̂ = (I − H)σ2  (since var ε = σ2I)
 hi = Hii -- the leverages
var ε̂i = σ2(1 − hi): a large leverage hi will "force" the fit
to be close to yi.
 Some facts: Σ hi = p,  hi ≥ 1/n
 "Rule of thumb" – leverages of more than 2p/n should be
looked at more closely. Large values of hi are due to
extreme values in X. (hi corresponds to a Mahalanobis
distance defined by X.) Also notice that var ŷi = hiσ2.
 Example – savings (page 72)
• Index plot of residuals Chile and Zambia with largest or
smallest residual.
• Index plot of leverages Libya and United States with
highest leverage.
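Leverages are the diagonal of H and sum to p; hatvalues() computes them directly (simulated data, for illustration):

```r
set.seed(12)
x <- runif(25)
y <- rnorm(25)
g <- lm(y ~ x)

X <- model.matrix(g)
H <- X %*% solve(t(X) %*% X) %*% t(X)             # hat matrix

all.equal(unname(diag(H)), unname(hatvalues(g)))  # TRUE
sum(diag(H))                                      # equals p = 2
```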



 Studentized Residuals
 var ε̂i = σ2(1 − hi)  (internally) studentized residuals:
ri = ε̂i / (σ̂ √(1 − hi))
 If the model assumptions are correct, then
var ri = 1 and corr(ri, rj) is small.
 Thus studentization can correct for the non-constant
variance of residuals when the errors have constant
variance, but cannot correct for it when there is genuine
heteroscedasticity in the errors.
 Example – savings (page 75)
 Example – savings (page 75)



 An outlier test
 An outlier is a point that does not fit the current model.
Outliers may affect the fit. See Figure 7.2.
 An outlier test enables us to distinguish between truly
unusual points and residuals which are large but not
exceptional.
 Exclude point i and recompute the estimates to get
β̂(i) and σ̂2(i), and then ŷ(i) = xiTβ̂(i).
 If ŷ(i) − yi is large, then point i is an outlier.



 Test statistic (jackknife or externally studentized /
cross-validated residuals):
ti = ε̂i / (σ̂(i)√(1 − hi)) = ri √((n − p − 1)/(n − p − ri2))  ~  tn−p−1
 When all cases are to be tested we must adjust the level
of the test using Bonferroni correction. However it is
conservative!
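rstudent() returns exactly these jackknife residuals, and the identity lets us recover them from the internally studentized ri without refitting n models (simulated data):

```r
set.seed(13)
x <- runif(20)
y <- 1 + x + rnorm(20)
g <- lm(y ~ x)

n <- 20; p <- 2
r <- rstandard(g)                            # internally studentized
tjack <- r * sqrt((n - p - 1)/(n - p - r^2)) # the identity above
all.equal(tjack, rstudent(g))                # TRUE
```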
 Example – savings (page 76)
 Some notes about outliers and their treatment with an
example: Star (page 76-78)



 Influential Observations
 An influential point is one whose removal from the
dataset would cause a large change in the fit.
 An influential point may or may not be an outlier and
may or may not have large leverage but it will tend to
have at least one of those two properties.
 Measures of influence
• Change in the coefficients: β̂ − β̂(i)
• Change in the fit: ŷ − ŷ(i) = X(β̂ − β̂(i))



• Cook statistics – popular now

Di = (β̂ − β̂(i))T(XTX)(β̂ − β̂(i)) / (pσ̂2)
   = (1/p) ri2 hi/(1 − hi)

• The first term, ri2, is the residual effect and the second is the
leverage. The combination leads to the influence.
• An index plot of Di can be used to identify influential points.
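cooks.distance() matches the ri²-and-leverage form directly (simulated data):

```r
set.seed(14)
x <- runif(20)
y <- 1 + x + rnorm(20)
g <- lm(y ~ x)

p <- 2
r <- rstandard(g)
h <- hatvalues(g)
D <- r^2 * h / (p * (1 - h))       # (1/p) r_i^2 h_i/(1 - h_i)
all.equal(D, cooks.distance(g))    # TRUE
```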
 Example – savings (page 79)
• Full data
• Exclude one point -- helpful using lm.influence()



 Residual Plots
 Residuals vs. fitted values – to look for
• heteroscedasticity (non-constant variance)
• nonlinearity
 Residuals or abs(residuals) vs. one of the predictors xi –
to check how xi should enter the model.
 Example 1 – savings (pages 81-83, Figure 7.6)
The scatter plot of residuals vs. the predictor pop15 shows
non-constant variance in two groups.
 Example 2 – simulation example (page 83)



 Non-Constant Variance
 Two approaches to dealing with non-constant variance:
1) weighted least squares
2) a variance-stabilizing transformation y → h(y):
h(y) = ∫ dy/√(var y)
• var y ∝ (Ey)2  ⇒ h(y) = log y
• var y ∝ Ey  ⇒ h(y) = √y – appropriate for count response
data
Note: use graphical techniques (e.g. residual plots)
to detect problems or structures of an unsuspected
nature.
 Example – Galapagos (page 84)
 Non-Linearity
How do we check whether the systematic part Ey = Xβ is correct?
 As before, we may look at
• plots of residuals against fitted values and the predictors xi
• plots of y against each xi
What about the effect of the other x's on the y vs. xi plot?
 Partial regression plots (or added variable plots):
can isolate the effect of xi on y
• Regress y on all x except xi, get residuals δ̂;
regress xi on all x except xi, get residuals γ̂;
• plot δ̂ against γ̂.
http://my.mofile.com/tangyc886 R 语言与统计分析 – 华东师范大学 (2008 年 9 月 )
 Partial residual plots:
plot ε̂ + β̂ixi against xi.
 The slope of a fitted line is β̂i for both kinds of plot.


 Comparison: Partial residual plots are reckoned to be
better for non-linearity detection while added variable
plots are better for outlier/influential detection.
 Example – savings (pages 85-87)



 Assessing Normality
 Use QQ-plot
 Example 1 – savings (page 88, see Figure 7.9)
 Example 2 – simulated data from different distributions
(page 88, see Figure 7.10)
 Ways to treat non-normality (page 90)
 Half-normal plots
 Designed for the assessment of positive data
 Useful for leverages or Cook statistics – look for
outliers.
 Note: here there is no need to look for a straight-line relationship.
 Example – savings (page 91)
 Correlated Errors
 Plot residuals against time.
 Use formal tests like the Durbin-Watson test or the run test.
(If the errors are correlated, you can use GLS.)
 Example – airquality (pages 93-94)
• Variables: ozone
solar radiation
temperature
wind speed
• Non-transformed response
non-constant variance and nonlinearity
• Log transformed response
• Check for serial correlation



Chapter 8
Transformation
Transformations of the response and predictors can
improve the fit and correct violations of model
assumptions such as constant error variance.
We may also consider adding additional predictors
that are functions of the existing predictors like
quadratic or crossproduct terms.



 Transforming the response
 Example – Log response
log y=β 0+β 1 x +ε (linear regression)
In the original scale:
y=exp(β 0+β 1 x) exp(ε )
 The errors enter multiplicatively in the original scale.
Compare:
y=exp(β 0+β 1 x) +ε (non-linear regression)
 The errors enter additively in the original scale.
 Usual approach: Try different transformations and check
the residuals.
 Box-Cox method/transformation (for y > 0):
tλ(y) = (y^λ − 1)/λ for λ ≠ 0,  tλ(y) = log y for λ = 0

• Choose λ using maximum likelihood.

• Profile log-likelihood assuming normal errors:
L(λ) = −(n/2) log(RSSλ/n) + (λ − 1) Σ log yi

where RSSλ is the RSS when tλ(y) is the response.

• Usually L(λ) is maximized over {−2, −1, −1/2, 0, 1/2, 1, 2}
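The profile log-likelihood can be computed directly over a grid of λ (a sketch; MASS::boxcox automates this). The simulated response below is log-normal, so the maximizing λ should land near 0:

```r
set.seed(15)
x <- runif(60)
y <- exp(1 + x + rnorm(60, sd = 0.3))       # log scale is the right one

profll <- function(lam) {
  ty <- if (abs(lam) < 1e-8) log(y) else (y^lam - 1)/lam
  rss <- deviance(lm(ty ~ x))               # RSS with t_lambda(y) as response
  -(60/2)*log(rss/60) + (lam - 1)*sum(log(y))
}
lams <- seq(-1, 1, by = 0.1)
best <- lams[which.max(sapply(lams, profll))]
best                       # near 0, pointing to the log transform
```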



 Transforming the response can make the model harder
to interpret so we don’t want to do it unless it’s really
necessary.
One way to check this is to form a confidence interval
for λ. A 100(1 − α)% confidence interval for λ is
{λ : L(λ) > L(λ̂) − (1/2)χ21(1−α)}

 Example 1 – savings (page 96)
CI(λ) = (0.6, 1.4)  no need to transform
 Example 2 – gala
CI(λ) = (0.1, 0.5)  a cube-root transformation
 Other transformations
• Logit transformation log(y/(1-y)) for proportions
• Fisher’s Z transformation 0.5log((1+y)/(1-y)) for correlations
 Transforming the predictors
 Segmented regression (broken-stick regression)
 Example – savings (pages 98-100)
• Method 1: subset regression fits – discontinuous at the knot point.
• Method 2: with hockey-stick functions.
 Polynomials & orthogonal polynomials:
add powers of the predictor(s) until the added power is no
longer significant.



 Regression Splines
Polynomials have the advantage of smoothness but the
disadvantage that each data point affects the fit globally. This
is because the power functions used for the polynomials take
non-zero values across the whole range of the predictor.
In contrast, the broken stick regression method localizes the
influence of each data point to its particular segment which is
good but we do not have the same smoothness as with the
polynomials.
There is a way we can combine the beneficial aspects of both
these methods — smoothness and local influence — by using
B-spline basis functions.



 Cubic B-spline basis functions:
• A given basis function is non-zero on an interval defined by four
successive knots, and zero elsewhere.
This property ensures the local influence property.
• The basis function is a cubic polynomial for each sub-interval
between successive knots
• The basis function is continuous and continuous in its first and
second derivatives at each knot point.
This property ensures the smoothness of the fit.
• The basis function integrates to one over its support
 A constructed example: y = sin^2(2πx^3) + ε,  ε ~ N(0, 0.1^2)
1) Orthogonal polynomial regression
2) Regression splines
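A sketch of the spline fit for this constructed example, using bs() from the base splines package (the knot count of 12 is an arbitrary choice):

```r
library(splines)
set.seed(16)
x <- sort(runif(100))
y <- sin(2*pi*x^3)^2 + rnorm(100, sd = 0.1)

gs <- lm(y ~ bs(x, df = 12))        # cubic B-spline basis with 12 df
mean(resid(gs)^2)                   # close to the noise variance 0.01
```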
Chapter 9
Scale Changes, Principal Components and
Collinearity
 Changes of Scale (page 106)
Rescaling xi as (xi + a)/b leaves the t and F tests and σ̂2
and R2 unchanged, and β̂i → bβ̂i.

Rescaling y in the same way leaves the t and F tests and R2
unchanged, but σ̂2 and the β̂i will be rescaled by b.

 Principal Components Regression (PCR) (page 107)
-- Transform X to orthogonality
 Partial Least Squares (PLS) (page 114)
-- Find the best orthogonal linear combinations of X for
predicting Y


 PCR and PLS compared
PCR:
• PCR attempts to find linear combinations of the predictors
that explain most of the variation in these predictors using
just a few components.
• The purpose is dimension reduction.
• Because the principal components can be linear
combinations of all the predictors, the number of variables
used is not always reduced.
• Because the principal components are selected using only
the X-matrix and not the response, there is no definite
guarantee that the PCR will predict the response particularly
well although this often happens.

• If it happens that we can interpret the principal components
in a meaningful way, we may achieve a much simpler
explanation of the response. Thus PCR is geared more
towards explanation than prediction.

PLS:
• In contrast, PLS finds linear combinations of the predictors
that best explain the response. It is most effective when
there are large numbers of variables to be considered.
• If successful, the variability of prediction is substantially reduced.
• On the other hand, PLS is virtually useless for explanation
purposes.

 Collinearity (page 117)
 If XTX is singular, i.e. some predictors are linear
combinations of others, we have (exact) collinearity and
there is no unique least squares estimate of β . If XTX is
close to singular, we have (approximate) collinearity or
multicollinearity (some just call it collinearity). This
causes serious problems with the estimation of β and
associated quantities as well as the interpretation.
 Collinearity can be detected in several ways.
• Examination of the correlation matrix of the predictors will reveal large pairwise collinearities.
• A regression of x_i on all the other predictors gives R²_i. Repeat for all predictors; an R²_i close to one indicates a problem.

• If R²_i is close to one, then the variance inflation factor (VIF) 1/(1 − R²_i) will be very large, and hence var(β̂_i) will be large.
• Examine the eigenvalues of XᵀX: small eigenvalues indicate a problem. The condition number is defined as
κ = √(λ₁/λ_p),
where κ > 30 is considered large. Other condition numbers, √(λ₁/λ_i), are also worth considering because they indicate whether more than just one independent linear combination is to blame.
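Both diagnostics are easy to compute directly. A Python/NumPy sketch (the simulated collinear design is illustrative):

```python
import numpy as np

def vifs_and_condition(X):
    """Variance inflation factors and condition numbers for the columns of X
    (X is the predictor matrix, without the intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for i in range(p):
        # Regress x_i on all the other predictors (with an intercept).
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid = X[:, i] - A @ coef
        tss = np.sum((X[:, i] - X[:, i].mean()) ** 2)
        r2 = 1 - resid @ resid / tss
        vif[i] = 1 / (1 - r2)
    # Eigenvalues of X'X, largest first; kappa_i = sqrt(lambda_1 / lambda_i).
    eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]
    kappa = np.sqrt(eigvals[0] / eigvals)
    return vif, kappa

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=50), rng.normal(size=50)])
vif, kappa = vifs_and_condition(X)
```

For this design the collinear pair produces huge VIFs and a condition number well above 30, while the independent third column has a VIF near one.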

 Collinearity leads to
• imprecise estimates of β: the signs of the coefficients may be misleading,
• t-tests which fail to reveal significant factors,
• missed importance of predictors.
 Example – Longley (pages 118-120)

 Ridge Regression
 Ridge regression makes the assumption that the
regression coefficients (after normalization) are not
likely to be very large.
 It is appropriate for use when the design matrix is
collinear and the usual least squares estimates of β
appear to be unstable.
 The ridge regression estimates of β are then given by
β̂ = (XᵀX + λI)⁻¹ Xᵀy
 A suitable λ is chosen from the ridge trace plot.
 Example – Longley (page 121)
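The ridge estimator can be written in a few lines. A Python/NumPy sketch (predictors are normalised first, as the slide assumes; the simulated data and the value λ = 10 are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lambda*I)^(-1) X'y on centred, scaled predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # normalise the predictors first
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.02 * rng.normal(size=n)])   # collinear pair
y = x1 + rng.normal(scale=0.3, size=n)

beta_ols = ridge(X, y, 0.0)    # lambda = 0 recovers ordinary least squares
beta_r = ridge(X, y, 10.0)     # shrunken, more stable estimates
```

Increasing λ always shrinks the coefficient vector; in practice λ is read off the ridge trace plot, as the slide notes.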

 Ridge regression estimates of coefficients are biased. Bias is undesirable, but there are other considerations.
 MSE decomposition:
E(β̂ − β)² = (E(β̂) − β)² + E(β̂ − E(β̂))² = bias² + variance
 Sometimes a large reduction in the variance may be obtained at the price of an increase in the bias. If the MSE is reduced as a consequence, then we may be willing to accept some bias. This is the trade-off that ridge regression makes: a reduction in variance at the price of an increase in bias.
 This is a common dilemma.
Chapter 10
Variable Selection
 Select the best subset of predictors
 Prior to variable selection:
• Identify outliers and influential points - maybe exclude them
at least temporarily.
• Add in any transformations of the variables that seem
appropriate.
 Hierarchical Models
 When selecting variables, it is important to respect the
hierarchy. Lower order terms should not be removed
from the model before higher order terms in the same
variable.
There are two common situations where this arises:

• Polynomials models.
• Models with interactions.
 Stepwise Procedures
 Backward Elimination
 Forward Selection
 Stepwise Regression
 Example – statedata (page 126)
 Criterion-based Procedures
 The Akaike Information Criterion (AIC)
AIC = −2 log(likelihood) + 2p

and the Bayes Information Criterion (BIC)
BIC = −2 log(likelihood) + p log n
(for a Gaussian linear model, −2 log(likelihood) = n log(RSS/n) up to an additive constant)
Example – statedata
 Adjusted R² = R²_a
R² = 1 − RSS/TSS
R²_a = 1 − [RSS/(n−p)]/[TSS/(n−1)] = 1 − [(n−1)/(n−p)](1 − R²) = 1 − σ̂²_model/σ̂²_null

 Predicted Residual Sum of Squares (PRESS)
PRESS = Σ_i ε̂²_(i),
where the ε̂_(i) are the residuals calculated without using case i in the fit.
 Mallows' Cp statistic
C_p = RSS_p/σ̂² + 2p − n,
where σ̂² is from the model with all predictors and RSS_p indicates the RSS from a model with p parameters.


 Example – statedata (page 130)
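The criteria above can all be computed from the RSS of each candidate model. A Python/NumPy sketch (the simulated data are illustrative; note that for the full model C_p equals the number of parameters by construction):

```python
import numpy as np

def fit_rss(X, y):
    """RSS and parameter count of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return r @ r, A.shape[1]

def criteria(X, y, sigma2_full):
    """AIC, BIC (up to additive constants), adjusted R^2, and Mallows' Cp."""
    rss, p = fit_rss(X, y)
    n = len(y)
    aic = n * np.log(rss / n) + 2 * p
    bic = n * np.log(rss / n) + p * np.log(n)
    tss = np.sum((y - y.mean()) ** 2)
    r2adj = 1 - (rss / (n - p)) / (tss / (n - 1))
    cp = rss / sigma2_full + 2 * p - n
    return aic, bic, r2adj, cp

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
junk = rng.normal(size=(n, 2))           # predictors unrelated to y
y = 1 + 2 * x1 + rng.normal(size=n)

Xfull = np.column_stack([x1, junk])
rss_full, p_full = fit_rss(Xfull, y)
sigma2 = rss_full / (n - p_full)         # sigma-hat^2 from the full model
aic, bic, r2adj, cp = criteria(Xfull, y, sigma2)
```

BIC penalises extra parameters more heavily than AIC whenever log n > 2, so it tends to pick smaller models.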
Chapter 11
Statistical Strategy and Model Uncertainty
 Strategy
 Procedure:
Diagnostics → Transformation → Variable Selection → Diagnostics
There is a danger of doing too much analysis. The more transformations and permutations of leaving out influential points you try, the better-fitting model you will find. Torture the data long enough, and sooner or later it will confess.
Fitting the data well is no guarantee of good predictive
performance or that the model is a good
representation of the underlying population.
You should:
1. Avoid complex models for small datasets.
2. Try to obtain new data to validate your proposed
model. Some people set aside some of their existing
data for this purpose.
3. Use past experience with similar data to guide the
choice of model.
 Model multiplicity: the same data may support different models, and conclusions drawn from those models may differ quantitatively and qualitatively.
See the author's experiment and discussion (pages 135-137).

Chapter 12
A Complete Example

Chapter 13
Robust and Resistant Regression
 Errors ~ Normal → LSE
 Errors ~ long-tailed:
 LSE with outliers removed through outlier tests
 Least Trimmed Squares (LTS)
• Resistant regression
• ltsreg() in library lqs (see example -- Chicago on page 152)
 Robust regression, e.g. M-estimates: choose β̂ to minimize Σ_i ρ(y_i − x_i'β)
 Possible choices of ρ:

• ρ(x) = x² -- least squares regression
• ρ(x) = |x| -- least absolute deviations regression (LAD)
• Huber's method -- a compromise between LS and LAD

• rlm() in the library of MASS


• Example – Chicago insurance data (page 151)
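Huber's method is typically fitted by iteratively reweighted least squares, which is roughly what rlm() does internally. A Python/NumPy sketch (the tuning constant c = 1.345 is the conventional default; the simulated outlier data are illustrative, not the Chicago data):

```python
import numpy as np

def huber_m_estimate(X, y, c=1.345, iters=50):
    """M-estimate with Huber's rho, fitted by iteratively reweighted least
    squares.  A sketch of the idea behind rlm(), not its exact algorithm."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)        # start from LSE
    for _ in range(iters):
        r = y - A @ beta
        # Robust scale estimate via the median absolute deviation (MAD).
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = r / s
        # Huber weights: 1 for small residuals, c/|u| for large ones.
        w = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))
        Aw = A * w[:, None]
        beta = np.linalg.solve(A.T @ Aw, Aw.T @ y)      # weighted LS step
    return beta

rng = np.random.default_rng(4)
x = rng.normal(size=40)
y = 1 + 2 * x + rng.normal(scale=0.2, size=40)
y[:3] += 15                                  # three gross outliers

beta_ls, *_ = np.linalg.lstsq(np.column_stack([np.ones(40), x]), y, rcond=None)
beta_huber = huber_m_estimate(x.reshape(-1, 1), y)
```

The outliers drag the least-squares fit away from the true line, while the Huber fit downweights them and stays close to intercept 1, slope 2.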

Chapter 14
Missing Data

Chapter 15
Analysis of Covariance
 An example – a regression problem with a mixture of quantitative and qualitative predictors
 Purpose: to see the effect of a medication on cholesterol
level
 Two groups:
• 1) Treatment: receive the medication, age 50 ~ 70
• 2) Control: not receive the medication, age 30 ~ 50
 Figure 15 shows that the mean reduction in cholesterol
was 0% and 10% respectively.
Better not to be treated?
 Not a two sample problem as the two groups differ w.r.t.
age
 Taking age into account (at age 50), the difference between the two groups is again 10%, but in favor of the treatment!
 Analysis of covariance: adjust the groups for the age (a covariate) difference and examine the effect of the medication.
 Method: code the qualitative predictor(s) and incorporate them within the y = Xβ + ε framework.
• y – change in cholesterol level
• x – age
• d – 0 (or -1) for the control group and 1 for the treatment group (see 15.2 for coding qualitative predictors, page 164)

 Models:
• y = β₀ + β₁x + ε (in R: y ~ x)
• y = β₀ + β₁x + β₂d + ε (in R: y ~ x+d)
• y = β₀ + β₁x + β₂d + β₃x·d + ε (in R: y ~ x*d)
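The parallel-lines model y ~ x + d is just an ordinary regression once d is coded numerically. A Python/NumPy sketch on simulated data mimicking the cholesterol story (all numbers are illustrative, not from the text): the raw group means barely differ, yet the age-adjusted treatment effect is about −10.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data in the spirit of the example: treated patients are older
# (age 50-70 vs 30-50), and the treatment lowers cholesterol by 10 at any age.
n = 40
age = np.concatenate([rng.uniform(50, 70, n), rng.uniform(30, 50, n)])
d = np.concatenate([np.ones(n), np.zeros(n)])     # 1 = treatment, 0 = control
y = 0.5 * (age - 50) - 10 * d + rng.normal(scale=1.0, size=2 * n)

# y ~ x + d : the analysis-of-covariance (parallel lines) model.
A = np.column_stack([np.ones(2 * n), age, d])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Comparing the raw group means here suggests no treatment effect, while beta[2], the coefficient of d, recovers the adjusted effect of about −10: exactly the reversal the slide describes.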
 A two-level example (page 161)
(English medieval cathedrals)
 Variables
• X – nave height
• Y – total length
• Style – r: Romanesque, g: Gothic
Note that some cathedrals have parts in both styles.
 Purpose: to see how the length is related to height for the
two styles

 Coding qualitative predictors
 For a k-level predictor, k−1 dummy variables are needed for the representation. One parameter is used to represent the overall mean effect, or perhaps the mean of some reference level, and so only k−1 variables are needed for the k levels.
 Treatment coding
A 4-level factor will be coded using 3 dummy variables:

            dummy
            1  2  3
  level 1   0  0  0
  level 2   1  0  0
  level 3   0  1  0
  level 4   0  0  1
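Building the treatment-coded design columns is mechanical. A small Python/NumPy sketch (the factor values are illustrative; the first level in alphabetical order is taken as the reference, as in R):

```python
import numpy as np

def treatment_dummies(levels):
    """k-1 treatment-coded dummy columns; the first (alphabetical) level
    is the reference and gets an all-zero row."""
    levels = np.asarray(levels)
    uniq = np.unique(levels)              # sorted, mimicking R's default order
    D = np.column_stack([(levels == u).astype(float) for u in uniq[1:]])
    return D, uniq

f = np.array(["b", "a", "c", "a", "b", "d"])
D, uniq = treatment_dummies(f)            # uniq = ['a', 'b', 'c', 'd']
```

As the slide notes, the dummy columns are orthogonal to each other (each observation lies in exactly one level) but not to the intercept.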

• The first one is the reference level to which other levels are
compared.
• R assigns levels to a factor in alphabetical order by default.
• The columns are orthogonal and the corresponding
dummies will be too. The dummies won’t be orthogonal to
the intercept.
• Treatment coding is the default choice for R
 Helmert Coding (page 165)
• The coefficients are not as easy to interpret; it is the default choice in S-PLUS.
 The choice of coding does not affect R², σ̂², or the overall F-statistic. It does affect β̂, and you do need to know what the coding is before making conclusions about β̂.
 A three-level example (page 165)
 The data:
• IQ scores for identical twins, one raised by foster parents,
the other by the natural parents.
• social class of natural parents (high, middle or low).
 Purpose: Predict the IQ of the twin with foster parents
from the IQ of the twin with the natural parents and the
social class of natural parents.
 Separate lines model
• > g <- lm(Foster ~ Biological*Social, twins)
• > summary(g)

• The reference level is high class, being first alphabetically.
• We see that the intercept for the low-class line would be -1.872 + 9.0767, while the slope for the middle-class line would be 0.9776 - 0.005.
 Can the model be simplified to the parallel lines
model?
• > gr <- lm(Foster ~ Biological+Social, twins)
• > anova(gr,g)
 Further reduction to a single line model
• > gr <- lm(Foster ~ Biological, twins)
Conclusion: the single-line model is accepted
(p-value = 0.59423, so the null is not rejected)
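The anova(gr, g) comparison is a nested-model F-test, which is easy to reproduce by hand. A Python/NumPy sketch on simulated data (illustrative; only the statistic and its degrees of freedom are computed, not the p-value):

```python
import numpy as np

def rss(A, y):
    """Residual sum of squares of an OLS fit with design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return r @ r

def nested_f(A_small, A_big, y):
    """F statistic for comparing nested linear models, as anova() reports:
    F = [(RSS_small - RSS_big)/df_num] / [RSS_big/df_den]."""
    n = len(y)
    rss0, rss1 = rss(A_small, y), rss(A_big, y)
    df_num = A_big.shape[1] - A_small.shape[1]
    df_den = n - A_big.shape[1]
    return ((rss0 - rss1) / df_num) / (rss1 / df_den), df_num, df_den

rng = np.random.default_rng(6)
n = 30
x = rng.normal(size=n)
grp = (np.arange(n) % 3 == 0).astype(float)      # a grouping dummy
y = 1 + 2 * x + rng.normal(scale=0.5, size=n)    # the group has no real effect

one = np.ones(n)
f_stat, d1, d2 = nested_f(np.column_stack([one, x]),
                          np.column_stack([one, x, grp]), y)
```

A small F statistic (compared to the F(d1, d2) distribution) means the extra term can be dropped, which is how the single-line model was accepted above.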
Chapter 16
ANOVA
 Introduction
 Predictors are now all categorical/ qualitative.
 ANOVA is used to partition the overall variance in the response into that due to each of the factors and the error.
 Predictors are now typically called factors which have
some number of levels.
 The parameters are now often called effects.
 Two kinds of models
• fixed-effects models-- parameters are considered fixed
• random-effects models -- parameters are random

 One-Way Anova
 The model: y_ij = μ + α_i + ε_ij
 Estimation and testing
• Estimate the effects
• Test if there is a difference in the levels of the factor.
 Example (page 169)
• Response: blood coagulation times
• Factor: diets
• Sample size: 24 (animals)
 Diagnostics (page 171)
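The one-way ANOVA F statistic partitions the total sum of squares into between-group and within-group pieces. A Python/NumPy sketch (the toy numbers are in the spirit of the coagulation example, not the actual data):

```python
import numpy as np

def one_way_anova(y, groups):
    """Between/within sum-of-squares decomposition and the F statistic."""
    y = np.asarray(y, dtype=float)
    levels = np.unique(groups)
    n, k = len(y), len(levels)
    grand = y.mean()
    # SS between: group sizes times squared deviations of group means.
    ss_between = sum(len(y[groups == g]) * (y[groups == g].mean() - grand) ** 2
                     for g in levels)
    # SS within: squared deviations from each group's own mean.
    ss_within = sum(((y[groups == g] - y[groups == g].mean()) ** 2).sum()
                    for g in levels)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, ss_between, ss_within

# Toy data: 3 diets, 4 animals each, with clear differences between diets.
y = np.array([62, 60, 63, 59, 63, 67, 71, 64, 68, 66, 71, 67], dtype=float)
diet = np.repeat(["A", "B", "C"], 4)
f, ssb, ssw = one_way_anova(y, diet)
```

The two pieces add back up to the total sum of squares, and a large F (relative to F(k−1, n−k)) indicates a difference among the factor levels.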

 Multiple Comparisons (page 172)
• Bonferroni (good for a few comparisons, but conservative if there are many)
• Fisher's LSD (after the overall F-test shows a difference)
• Tukey's Honest Significant Difference (HSD) (for all pairwise comparisons)
 Contrasts
 Scheffé's theorem for multiple comparisons
 Testing for homogeneity of variance – Levene's test

 Two-Way Anova
 One observation per cell
 More than one observation per cell
 Interpreting the interaction effect
Example – rats (pages 181-184)
• 48 rats were allocated to 3 poisons (I,II,III) and 4 treatments
(A,B,C,D).
• The response was survival time in tens of hours.
• 4 replicates in each cell
• The reference level is I for poison and A for treat!

 Blocking designs (page 185)
 Latin Squares (page 191)
 Balanced Incomplete Block designs (page 195)
 Factorial experiments (page 200)
Postscript
Software for computation and statistical analysis
 Symbolic computation software
 Maple (v9.0)
 Mathematica (v5.0)
 Symbolic toolbox for Matlab (v7.0)
 Scientific WorkPlace (SWP)
 Maxima (derived from Macsyma)
 MuPAD
 Reduce
 ……

 Numerical computing software
 Matlab
 Scilab
 Statistical analysis software
• SPSS (v12)       • S-PLUS (v6)
• SAS (v9.1)       • R (v9)
• BMDP (v7)        • Minitab (v14)
• Statistica (v6)  • Systat (v10)
• Stata (v8)       • JMP (v5)
• Gauss            • MacAnova
 Bayesian statistical analysis
 WinBUGS (v1.4) (includes ARS)
 CODA (v0.5-1 for R)
 JAGS (v0.50)
 Econometric data analysis software
 TSP (v6.5)
 EViews (v4)
 Istm2000
 TSA
 Ox/OxMetrics packages
• PcGets, PcGive, Tsp, ARFIMA, GARCH, MSVAR
 Comprehensive comparison of data-analysis software

Comparison of mathematical programs for data analysis (Edition 4.4)
by Stefan Steinhaus (stefan@steinhaus-net.de)
http://www.scientificweb.com/ncrunch/
