
ACTL2002/ACTL5101 Probability and Statistics: Week 12

ACTL2002/ACTL5101 Probability and Statistics
© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au


First nine weeks
Introduction to probability;
Moments: (non)-central moments, mean, variance (standard
deviation), skewness & kurtosis;
Special univariate (parametric) distributions (discrete &
continuous);
Joint distributions;
Convergence, with applications: LLN & CLT;
Estimators (MME, MLE, and Bayesian);
Evaluation of estimators;
Interval estimation.

Last two weeks
Simple linear regression:
Idea;
Estimation using LSE (BLUE property & relation to MLE);
Partition of the variability of the dependent variable;
Testing:
i) Slope;
ii) Intercept;
iii) Regression line;
iv) Correlation coefficient.

Multiple linear regression:

Matrix notation;
LSE estimates;
Tests;
R-squared and adjusted R-squared.


Modelling with Linear Regression

Modelling assumptions in linear regression
Confounding effects
Collinearity
Heteroscedasticity

Special explanatory variables
Interaction of explanatory variables
Categorical explanatory variables

Model selection
Reduction of number of explanatory variables
Model validation


Confounding effects
Linear regression measures the effect of explanatory variables X_1, ..., X_n on the dependent variable Y.
The assumptions are:
Effects of the covariates (explanatory variables) must be
additive;
Homoskedastic (constant) variance;
Errors must be independent of the explanatory variables with
mean zero (weak assumptions);
Errors must be Normally distributed, and hence, symmetric
(strong assumptions).

But what about confounding variables?
Correlation does not imply causality!

Confounding effects

C is a confounder of the relation between X and Y if:
- C influences X and C influences Y,
- but X does not influence Y (directly).

How to correctly use (or not use) confounding variables?
- If the confounding variable is observable: add the confounding variable to the regression.
- If the confounding variable is unobservable: be careful with the interpretation.
- The predictor variable then has an indirect influence on the dependent variable, but no direct influence. Hence, the predictor variable works as a predictor, but action taken on the predictor itself will have no effect.
- Example: Age ⇒ Experience ⇒ Probability of a car accident. Experience cannot be measured, thus age can be a proxy for experience. Becoming older, however, does not make you a better driver.


Collinearity

Multicollinearity occurs when one explanatory variable is a (nearly) linear combination of the other explanatory variables.
If an explanatory variable is collinear, it provides no/little additional information; the variable is redundant.
Example: a perfect fit is obtained for y = −87 + x1 + 18x2, but also for y = −7 + 9x1 + 2x2:

i    y_i    x_i1   x_i2
1    23     2      6
2    83     8      9
3    63     6      8
4    103    10     10

Note that x2 = 5 + x1/2, thus β̃2 = β2 + c, β̃1 = β1 − c/2 and β̃0 = β0 − 5c give exactly the same fit for any c.

Collinearity:
- Does not influence the fit (and thus model adequacy), nor the predictions.
- Estimates of the error variance are still reliable.
- Standard errors of individual regression coefficients are higher, leading to small t-ratios.

Detecting collinearity:
i) Regress x_j on the other explanatory variables;
ii) Determine the coefficient of determination R_j²;
iii) Calculate the Variance Inflation Factor: VIF_j = (1 − R_j²)⁻¹. If it is large (> 10), severe collinearity exists.

When severe collinearity exists, often the only option is to remove one or more variables from the regression equation. A small sketch of this check is given below.
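The VIF recipe above can be sketched in a few lines of numpy (a minimal illustration, not part of the original slides; the simulated data and function name are mine):

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (n x k matrix of regressors)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y_j = X[:, j]                                                # treat x_j as the response
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])   # intercept + other x's
        beta, *_ = np.linalg.lstsq(Z, y_j, rcond=None)
        resid = y_j - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y_j - y_j.mean()) @ (y_j - y_j.mean()))
        out[j] = 1.0 / (1.0 - r2)                                    # VIF_j = (1 - R_j^2)^-1
    return out

# Illustrative data: x2 is nearly 5 + x1/2, so both VIFs should exceed 10.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = 5 + x1 / 2 + rng.normal(scale=0.05, size=200)
print(vif(np.column_stack([x1, x2])))
```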


Heteroscedasticity

We have assumed homoscedastic residuals. If the variance of the residuals differs across observations, we have heteroscedasticity.
- The least squares estimator is unbiased, even in the presence of heteroscedasticity.
- The least squares estimator might, however, not be the optimal estimator.
- Confidence intervals and hypothesis tests depend on homoscedastic residuals.

Graphical check: plot the estimated residuals against the endogenous variable and the explanatory variables.
[Figure: residuals ε_i plotted against y_i, x_1i and x_2i, and against y*_i = log(y_i).]
Solution: use a transformation of Y, i.e. y*_i = log(y_i).

Graphical check: plot the estimated residuals against the explanatory variables (LM = SSM/2 = 34.21; critical values χ²_{0.95}(2) = 5.99 and χ²_{0.99}(2) = 9.21).
[Figure: residuals ε_i plotted against y_i, x_1i and x_2i.]

Detecting heteroscedasticity
- F-test (using two groups of data);
- White (1980) test;
- Breusch and Pagan (1980) test: test H0: homoscedastic residuals vs. H1: Var(y_i) = σ² + z_i⊤γ, where z_i is a known vector of variables and γ is a p-dimensional vector of parameters.

Test procedure:
1. Fit the regression model and determine the residuals ε_i.
2. Calculate the squared standardized residuals ε*_i² = ε_i²/s².
3. Fit a regression model of ε*_i² on z_i (can be all X_i).
4. Test statistic: LM = SSM/2, where SSM = Σ_{i=1}^n (ε̂*_i² − ε̄*²)² is the model sum of squares of the auxiliary regression in step 3.
5. Reject H0 if LM > χ²_{1−α}(p).

A sketch of this procedure follows below.
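A minimal sketch of the Breusch-Pagan steps above in plain numpy/scipy (the simulated data and function name are illustrative; s² is taken as the ML-type estimate):

```python
import numpy as np
from scipy.stats import chi2

def breusch_pagan(y, X, Z):
    """LM = SSM/2 from regressing squared standardized residuals on Z; H0: homoscedasticity."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    eps = y - Xc @ beta
    s2 = eps @ eps / n                        # estimate of sigma^2
    u = eps**2 / s2                           # squared standardized residuals
    Zc = np.column_stack([np.ones(n), Z])
    gamma, *_ = np.linalg.lstsq(Zc, u, rcond=None)
    ssm = np.sum((Zc @ gamma - u.mean())**2)  # model sum of squares of the auxiliary regression
    lm = ssm / 2
    return lm, chi2.ppf(0.95, Z.shape[1])     # reject H0 at 5% if lm exceeds the critical value

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)     # variance grows with x: heteroscedastic
print(breusch_pagan(y, x.reshape(-1, 1), x.reshape(-1, 1)))
```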

GLS & WLS (OPTIONAL)

Sometimes you know how much variation there should be in the residual of each observation.
- Example: differences in exposure to risk.
- Applications: mortality modelling (exposures by age), proportion claiming (changing portfolio sizes).

Heteroscedasticity: E[εε⊤] = σ²Ω with Ω⁻¹ ≠ I_n. Write Ω⁻¹ = P⊤P; P can be found using (for example) the Cholesky decomposition.
Find the OLS estimates of the regression model pre-multiplied by P:
Py = PXβ + Pε  ⇒  ỹ = X̃β + ε̃.

Note that we can apply OLS to the model pre-multiplied by P, since
E[ε̃ε̃⊤] = E[Pεε⊤P⊤] = P E[εε⊤] P⊤ = σ² PΩP⊤ = σ² I_n.

Hence, we have the Generalized Least Squares estimator:
β̂ = (X̃⊤X̃)⁻¹ X̃⊤ỹ
  = (X⊤P⊤PX)⁻¹ X⊤P⊤Py
  = (X⊤Ω⁻¹X)⁻¹ X⊤Ω⁻¹y,
Var(β̂) = σ² (X̃⊤X̃)⁻¹ = σ² (X⊤Ω⁻¹X)⁻¹.

Weighted least squares: each observation is weighted by 1/√ω_i:
E[εε⊤] = σ²Ω = σ² diag(ω_1, ..., ω_n)
⇒ Ω⁻¹ = diag(1/ω_1, ..., 1/ω_n) = P⊤P, with P = diag(1/√ω_1, ..., 1/√ω_n).

Only applicable when you know the relative variances ω_1, ..., ω_n.
Can we also estimate the variance-covariance matrix Ω?

EGLS/FGLS (OPTIONAL)

Feasible GLS (or Estimated GLS) does not impose the structure of the heteroskedasticity, but estimates it from the data.
Estimation procedure (sketched in code below):
1. Estimate the regression using OLS.
2. Regress the squared residuals on the explanatory variables.
3. Determine the expected squared residuals ⇒ ω̂_i ⇒ Ω̂ = diag(ω̂_1, ..., ω̂_n).
4. Use WLS with weights ω̂_i to find the EGLS/FGLS estimate.
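A sketch of the WLS and FGLS steps above (all names and the simulated data are mine; modelling the squared residuals by a linear regression on X is only one possible choice for step 2):

```python
import numpy as np

def wls(y, X, w):
    """Weighted least squares: observation i weighted by 1/sqrt(w_i), i.e. GLS with Omega = diag(w)."""
    p = 1.0 / np.sqrt(w)                                # pre-multiplication by P = diag(1/sqrt(w_i))
    Xt, yt = X * p[:, None], y * p
    beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return beta

def fgls(y, X):
    """Feasible GLS: OLS -> model the squared residuals -> WLS with the fitted variances."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    eps2 = (y - X @ beta_ols)**2
    gamma, *_ = np.linalg.lstsq(X, eps2, rcond=None)    # regress squared residuals on X
    w_hat = np.clip(X @ gamma, 1e-8, None)              # fitted variances, floored at a small value
    return wls(y, X, w_hat)

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 500)
X = np.column_stack([np.ones(500), x])
y = 1 + 2 * x + rng.normal(scale=x)                     # standard deviation proportional to x
print(fgls(y, X))
```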


Interaction of explanatory variables

In linear regression the covariates are additive; multiplicative relations are non-additive.
Example: y_i = β0 + β1·X_1 + β2·X_2 + β3·X_1X_2 + ε_i, where β1 and β2 are the main effects and β3 is the interaction effect.
The marginal effect of X_1 given X_2 = x_2 is ∂y_i/∂x_1 = β1 + β3·x_2.
Example: liquidity of stocks can be explained by Price, Volume, and Value (= Price × Volume).
Note that interaction terms might lead to high collinearity. For a symmetric distribution there is no correlation between the interaction term and the main effects if the explanatory variables are centered. Thus, center the variables to reduce collinearity issues.

Interaction of random variables

A moderator variable is a predictor (e.g. X_1) that interacts with another predictor (e.g. X_2) in explaining variance in a predicted variable (e.g. Y). Moderating effects are equivalent to interaction effects.
Rearranging, we get:
y_i = (β0 + β1·X_1) + (β2 + β3·X_1)·X_2 + ε_i,
where the intercept (β0 + β1·X_1) and the slope (β2 + β3·X_1) both depend on X_1.
Always include the marginal (main) effects in the regression; a centering sketch follows below.
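A small illustration of why centering helps: the same interaction model is fitted on raw and on centered regressors (simulated data; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x1, x2 = rng.normal(2, 1, n), rng.normal(5, 2, n)
y = 1 + 0.5 * x1 + 0.8 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

def fit(a, b):
    """Fit y on an intercept, the main effects and the interaction; also report corr(a, a*b)."""
    X = np.column_stack([np.ones(n), a, b, a * b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, np.corrcoef(a, a * b)[0, 1]

print(fit(x1, x2))                               # raw: interaction strongly correlated with x1
print(fit(x1 - x1.mean(), x2 - x2.mean()))       # centered: correlation near zero, same fitted surface
```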

Example: interaction of random variables

Regression output (see excel file): coefficients and t-statistics for the intercept, X_1, X_2 and X_1·X_2 under four specifications: main effects only, the full model with the interaction, the model excluding X_2, and the model with centered variables.
Correlations between y, x_1, x_2 and x_1·x_2 are reported for the non-centered and the centered variables (where x̃_i = x_i − E[X_i]).
[Slide table: regression output and correlation matrix from the excel file.]

Always include the marginal effects (here: heteroscedastic variance):
[Figure: scatter plots of the residuals ε_i against x_1i and against x_1i·x_2i; the variance of ε_i is a decreasing function of x_1i·x_2i.]


Binary explanatory variables

Categorical variables provide a numerical label for measurements of observations that fall in distinct groups or categories.
A binary variable is a variable which can only take two values, namely zero and one.
Example: gender (male = 1, female = 0), years of education (1 if more than 12 years, 0 otherwise).
Regression (see next slide):
LnFace_i = β0 + β1·LnIncome_i + β2·Single_i + ε_i
Estimated: LnFace_i = −0.42 + 1.12·LnIncome_i − 0.51·Single_i (standard errors in parentheses: 0.56, 0.05, 0.16).

[Figure: LnFace against LnIncome for non-singles and singles.]

Interpretation of the coefficients:
- β0: intercept for non-singles;
- β1: marginal effect of LnIncome;
- β2: difference in intercept between singles and non-singles, i.e. β0 + β2 is the intercept for singles.

[Figure: LnFace against LnIncome for non-singles and singles, with the fitted regression of the previous slide (common slope, different intercepts).]

[Figure: residuals against LnIncome for non-singles and singles.]

Binary explanatory variables

Regression (see next slide):
LnFace_i = β0 + β1·LnI_i + β2·S_i + β3·S_i×LnI_i + ε_i
Estimated: LnFace_i = −0.11 + 1.07·LnI_i − 3.28·S_i + 0.27·S_i×LnI_i (standard errors in parentheses: 0.61, 0.06, 1.41, 0.14).

Interpretation of the coefficients:
- β0: intercept for non-singles;
- β1: marginal effect of LnIncome for non-singles;
- β2: difference in intercept between singles and non-singles, i.e. β0 + β2 is the intercept for singles;
- β3: difference in the marginal effect of LnIncome between singles and non-singles, i.e. β1 + β3 is the marginal effect of LnIncome for singles.
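A sketch in the spirit of the LnFace example, with a simulated data set rather than the lecture's insurance data (all numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
ln_income = rng.uniform(7, 12, n)
single = rng.integers(0, 2, n)                     # binary dummy: 1 = single, 0 = non-single
ln_face = (1.0 + 1.1 * ln_income - 3.0 * single + 0.25 * single * ln_income
           + rng.normal(scale=0.5, size=n))

X = np.column_stack([np.ones(n), ln_income, single, single * ln_income])
b0, b1, b2, b3 = np.linalg.lstsq(X, ln_face, rcond=None)[0]
print("intercept non-singles:", b0, " intercept singles:", b0 + b2)
print("slope non-singles:", b1, " slope singles:", b1 + b3)
```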

[Figure: LnFace against LnIncome for non-singles and singles, with the fitted regression including the interaction term (different intercepts and slopes).]

[Figure: residuals against LnIncome for non-singles and singles, for the model with the interaction term.]

Binary explanatory variables

Now consider the example where we try to explain the hourly wage of individuals in a random sample. As explanatory variable we have years of education.
Question: Why are you here? Solution: To earn more money later on?
Regression (see next slide):
HW_i = β0 + β1·YE_i + ε_i
Estimated: HW_i = −406 + 33·YE_i (reported in parentheses: 8.97, 0.65, 12).
Question: Explain the extent of correlation and causality in this case.

[Figure: hourly wage against years of education (12 to 16 years) with the fitted regression.]

[Figure: residuals against years of education for the simple regression.]

Regression (see next slide):
HW_i = β0 + β1·YE_i + β2·(YE_i − 14)×D_i + ε_i, where D_i = 1 if YE_i > 14 and zero otherwise.
Estimated: HW_i = −151 + 13.7·YE_i + 35·(YE_i − 14)×D_i (reported in parentheses: 3.47, 2.30, 0.91, 0.37).

Interpretation of the coefficients:
- β0: intercept for the hourly wage;
- β1: marginal effect of years of education before 14 years;
- β2: difference in the marginal effect of years of education before and after 14 years, i.e. β1 + β2 is the marginal effect of years of education for years > 14.
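The (YE_i − 14)×D_i term acts as a linear spline with a knot at 14 years of education; a brief sketch with simulated wages (the coefficients used to generate the data are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
ye = rng.uniform(12, 16, 500)                       # years of education
knot = 14.0
hinge = np.where(ye > knot, ye - knot, 0.0)         # (YE - 14) * D, with D = 1{YE > 14}
hw = -150 + 14 * ye + 34 * hinge + rng.normal(scale=3, size=500)

X = np.column_stack([np.ones_like(ye), ye, hinge])
b0, b1, b2 = np.linalg.lstsq(X, hw, rcond=None)[0]
print("slope below 14 years:", b1, " slope above 14 years:", b1 + b2)
```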

[Figure: hourly wage against years of education with the fitted piecewise regression (kink at 14 years).]

[Figure: residuals against years of education for the piecewise regression.]

Categorical explanatory variables

Now consider the example where we try to explain the yearly income of individuals in a random sample. As explanatory variable we have the highest degree: high school, college, university, PhD. Define:
C_i = 0 if the highest degree is high school;
C_i = 1 if the highest degree is college;
C_i = 2 if the highest degree is a university degree;
C_i = 3 if the highest degree is a PhD.
Regression (see next slide):
YI_i = β0 + β1·C_i + ε_i
Estimated: YI_i = 33.0 + 8.00·C_i (reported in parentheses: 1.91, 15.36, 0.7).

[Figure: empirical yearly income and residuals against education level for the regression on the categorical variable C.]

[Figure: empirical yearly income and residuals against education level for the regression on the dummy variables.]

Comparing categorical and dummy explanatory variables

Change the categorical variable into dummy variables. Regression (see previous slide):
YI_i = β0 + β1·D_{1,i} + β2·D_{2,i} + β3·D_{3,i} + ε_i
Estimated: YI_i = 33.4 + 5.32·D_{1,i} + 14.07·D_{2,i} + 18.50·D_{3,i} (reported in parentheses: 1.02, 4.53, 2.05, 15.12, 2.5).

Interpretation:
- β0: average income with high school as highest degree;
- β1: average additional income of college relative to high school;
- β2: average additional income of university relative to high school;
- β3: average additional income of PhD relative to high school.

Conclusion: only use the single categorical variable if the marginal effects of all categories are equal!
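A sketch contrasting the single 0-3 coding with separate dummy variables (simulated incomes; the group means are made up). The single coding forces equal steps between adjacent education levels, the dummy coding does not:

```python
import numpy as np

rng = np.random.default_rng(6)
c = rng.integers(0, 4, 800)                              # 0 = HS, 1 = college, 2 = university, 3 = PhD
group_means = np.array([33.0, 38.0, 47.0, 52.0])         # unequal steps between levels
yi = group_means[c] + rng.normal(scale=5, size=800)

# (a) single categorical regressor: one common slope for every step up in education
Xc = np.column_stack([np.ones_like(yi), c])
print(np.linalg.lstsq(Xc, yi, rcond=None)[0])

# (b) one dummy per non-baseline level: a separate increment relative to high school
D = (c[:, None] == np.arange(1, 4)).astype(float)
Xd = np.column_stack([np.ones_like(yi), D])
print(np.linalg.lstsq(Xd, yi, rcond=None)[0])
```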


Reduction of number of explanatory variables

There are many possible combinations of explanatory variables:

number of X:   1  2  3  4   5   6   7    8    9    10
combinations:  1  3  7  15  31  63  127  255  511  1023

You do not want to check them all. How to decide what explanatory variables to include?

Stepwise regression algorithm

(i) Consider all possible regressions using one explanatory variable. For each of these regressions calculate the t-ratio. Select the explanatory variable with the highest absolute t-ratio (if larger than the critical value).
(ii) Add a variable to the model from the previous step. The variable to enter is the one that makes the largest significant contribution; its t-ratio must be above the critical value.
(iii) Delete a variable from the model from the previous step. The variable to be removed is the one that makes the smallest contribution; its t-ratio must be below the critical value.
(iv) Repeat steps (ii) and (iii) until all possible additions and deletions are performed.

A forward-selection sketch follows below.
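A minimal forward-selection sketch of steps (i)-(ii) using the t-ratio criterion (the deletion step (iii) is omitted for brevity; the critical value and the simulated data are illustrative):

```python
import numpy as np

def t_ratios(y, X):
    """OLS t-ratios for each coefficient in the design matrix X (intercept included in X)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta / se

def forward_select(y, X_all, cv=2.0):
    """Greedily add the variable with the largest |t|, as long as it exceeds the critical value."""
    n, p = X_all.shape
    selected, remaining = [], list(range(p))
    while remaining:
        best_j, best_t = None, 0.0
        for j in remaining:
            X = np.column_stack([np.ones(n)] + [X_all[:, m] for m in selected + [j]])
            t = abs(t_ratios(y, X)[-1])          # t-ratio of the newly added variable
            if t > best_t:
                best_j, best_t = j, t
        if best_t < cv:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(7)
X_all = rng.normal(size=(200, 6))
y = 2 + 3 * X_all[:, 0] - 2 * X_all[:, 3] + rng.normal(size=200)
print(forward_select(y, X_all))                  # expected: columns 0 and 3
```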

Stepwise regression algorithm: pros and cons
+ Useful algorithm that quickly searches through a number of candidate models.
− No guarantee that the selected model is the best.
− The algorithm uses only one criterion, namely the t-ratio, and does not consider other criteria such as s, R², R_a², and so on (note that s decreases if the absolute value of the t-ratio is larger than one).
− The algorithm does not take into account the joint effect of explanatory variables.
− The procedure "snoops" through a large number of candidate models and may fit the data "too well".
− Purely automatic procedures may not take into account an investigator's special knowledge.


Comparing models

How to compare two regression models with the same number of explanatory variables:
- R-squared;
- the variability of the residual (s);
- F-statistic.

How to compare two regression models with unequal numbers of explanatory variables:
- adjusted R-squared;
- the variability of the residual (s);
- likelihood ratio test: −2·(ℓ_p − ℓ_{p+q}) ~ χ²_q; reject that the model with p + q variables is as good as the model with p variables if −2·(ℓ_p − ℓ_{p+q}) > χ²_{1−α}(q);
- (Optional:) information criteria ⇒ select the model with the lowest AIC = 2k − 2ℓ.

A sketch of the likelihood-ratio and AIC comparison follows below.
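A sketch of the comparison for nested Gaussian linear models (it assumes normal errors and plugs the ML variance estimate into the log-likelihood; k is counted here as the number of regression coefficients; data are simulated):

```python
import numpy as np
from scipy.stats import chi2

def gaussian_loglik(y, X):
    """Maximised Gaussian log-likelihood of an OLS fit (sigma^2 at its ML value)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / n
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

rng = np.random.default_rng(8)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)              # x2 is irrelevant

X_p = np.column_stack([np.ones(n), x1])          # small model: p parameters
X_pq = np.column_stack([np.ones(n), x1, x2])     # large model: p + q parameters (q = 1)
l_p, l_pq = gaussian_loglik(y, X_p), gaussian_loglik(y, X_pq)

lr = -2 * (l_p - l_pq)
print("LR statistic:", lr, " critical value:", chi2.ppf(0.95, 1))
print("AIC small:", 2 * 2 - 2 * l_p, " AIC large:", 2 * 3 - 2 * l_pq)   # AIC = 2k - 2*loglik
```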

Out-of-Sample Validation Procedure (OPTIONAL)

(i) Begin with a sample of size n; divide it into two subsamples of sizes n_1 and n_2.
(ii) Using the model development subsample, fit a candidate model to the data i = 1, ..., n_1.
(iii) Using the model in step (ii), predict ŷ_i in the validation subsample.
(iv) Assess the proximity of the predictions to the held-out data. One measure is the sum of squared prediction errors:
SSPE = Σ_{i=n_1+1}^{n} (y_i − ŷ_i)².
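A short sketch of the split-sample procedure (the 70/30 split and the simulated data are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

n1 = int(0.7 * n)                                # model-development subsample
beta, *_ = np.linalg.lstsq(X[:n1], y[:n1], rcond=None)

y_hat = X[n1:] @ beta                            # predictions on the held-out validation subsample
sspe = np.sum((y[n1:] - y_hat)**2)
print("SSPE:", sspe)
```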

PRESS Validation Procedure (OPTIONAL)

Predicted residual sum of squares:
(i) For the full sample, omit the i-th point and use the remaining n − 1 observations to compute the regression coefficients.
(ii) Use the regression coefficients in (i) to compute the predicted response for the i-th point, ŷ_(i).
(iii) Repeat steps (i) and (ii) for i = 1, ..., n, and define:
PRESS = Σ_{i=1}^{n} (y_i − ŷ_(i))².

Note: one can rewrite this into a computationally less intensive procedure:
y_i − ŷ_(i) = ε_i / (1 − x_i⊤(X⊤X)⁻¹x_i).
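A sketch of PRESS computed with the leverage shortcut above and, as a check, by literally refitting without observation i (simulated data; names are mine):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
eps = y - X @ beta
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)   # leverages h_ii = x_i' (X'X)^{-1} x_i
press_fast = np.sum((eps / (1 - h))**2)

press_slow = 0.0
for i in range(n):                               # brute-force leave-one-out check
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    bi, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    press_slow += (y[i] - X[i] @ bi)**2

print(press_fast, press_slow)                    # the two agree
```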

Data analysis and modelling

Box (1980): examine the data; hypothesize a model; compare the data to the candidate model; formulate an improved model.
1. Examine the data graphically and use prior knowledge of relationships (economic theory, industry practice) to hypothesize a model.
2. Compare the data to the candidate model, based on the assumptions in the model. Diagnostic checks (data and model criticism): data and model must be consistent with one another before inferences can be made.
3. Formulate an improved model, which must be consistent with the data.