
Introduction to Data Analysis with Stata
Sara Godoy.

+

Nonparametric Analysis

+

Non-Parametric tests: Summary

Nature of dependent variable | One-sample                         | Two-sample, related/matched | Two-sample, independent    | K-sample, independent
Categorical/nominal          | Binomial test                      | McNemar test                | Fisher's exact test        | Chi-square test
Ordinal/interval             | Kolmogorov-Smirnov one-sample test | Wilcoxon signed ranks test  | Wilcoxon-Mann-Whitney test | Kruskal-Wallis test

+

Non-parametric correlation
A Spearman correlation is used when one or both of the variables are not assumed to be normally distributed and interval (but are assumed to be ordinal). The values of the variables are converted to ranks and then correlated.

Syntax: spearman [varlist] [if] [in] [, options]

    spearman read write

     Number of obs =     200
    Spearman's rho =       0.6167

    Test of Ho: read and write are independent
        Prob > |t| =       0.0000

The results suggest that the relationship between read and write is statistically significant (rho = 0.6167, p < 0.0001).
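The rank-then-correlate idea can be checked by hand. The sketch below is illustrative only: it assumes Stata's bundled auto dataset (with mpg and weight) as a stand-in for the read/write data, which is not distributed with these slides.

```stata
* Assumption: the bundled auto dataset stands in for the slide's data
sysuse auto, clear

* Spearman's rho computed directly
spearman mpg weight

* Same idea by hand: convert each variable to ranks (ties get average ranks),
* then compute the ordinary Pearson correlation of the ranks
egen rank_mpg = rank(mpg)
egen rank_weight = rank(weight)
correlate rank_mpg rank_weight
```

The correlation of the two rank variables should match the rho reported by spearman.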

+

P-values meaning
A p-value is a measure of how much evidence we have against the null hypothesis (H0).

The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.

One often "rejects the null hypothesis" when the p-value is less than the significance level. Common significance levels are:

    p < 0.10 (10%)
    p < 0.05 (5%)
    p < 0.01 (1%)

When the null hypothesis is rejected, the result is said to be statistically significant.

+

Binomial probability test
Tests whether the proportion of successes on a two-level categorical dependent variable differs significantly from a hypothesized value (suitable for small samples).

Syntax: bitest varname == #p

    bitest female == .5

The results indicate that there is no statistically significant difference (p = .2292). In other words, the proportion of females does not significantly differ from the hypothesized value of 50%.
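When only counts are available, the immediate form of the command may be handy. This is a minimal sketch; the counts in the first command are hypothetical and are not taken from the slide's dataset, and the second part uses the bundled auto dataset as a stand-in.

```stata
* Immediate form: bitesti #N #succ #p
* Hypothetical counts: 200 observations, 109 successes, tested against p = 0.5
bitesti 200 109 0.5

* Variable form on a 0/1 variable from the bundled auto dataset (a stand-in)
sysuse auto, clear
bitest foreign == 0.5
```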

+ One- and two-sample tests of proportions
prtest performs tests on the equality of proportions using large-sample statistics.

Syntax:

One-sample test of proportion: tests that varname has a proportion of #p.

    prtest varname == #p

Two-sample test of proportion: tests that varname1 and varname2 have the same proportion.

    prtest varname1 == varname2

Example 1: One-sample test of proportion

Assume that we have a sample of 74 automobiles. We wish to test whether the proportion of automobiles that are foreign is different from 40%.

    prtest foreign == .4

The test indicates that we cannot reject the hypothesis that the proportion of foreign automobiles is 0.40 at the 5% significance level.
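Example 1 can probably be reproduced with Stata's bundled auto dataset, which has 74 cars and a 0/1 variable named foreign. That this is the dataset behind the slide is an assumption.

```stata
* Assumption: the slide's 74 automobiles are Stata's bundled auto dataset
sysuse auto, clear

* How many cars are domestic vs. foreign
tabulate foreign

* One-sample test: is the proportion of foreign cars equal to 0.40?
prtest foreign == .4
```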

Example 2: Two-sample test of proportion

We have two headache remedies that we give to patients. Each remedy's effect is recorded as 0 for failing to relieve the headache and 1 for relieving the headache. We wish to test the equality of the proportion of people relieved by the two treatments.

    prtest cure1 == cure2

We find that the proportions are statistically different from each other at any significance level greater than 3.9%.

+ Kolmogorov-Smirnov one- and two-sample tests

ksmirnov performs one- and two-sample Kolmogorov-Smirnov tests of the equality of distributions. In the first syntax, varname is the variable whose distribution is being tested, and exp must evaluate to the corresponding (theoretical) cumulative distribution.

Syntax: ksmirnov varname = exp

Example: One-sample test

Let's now test whether x is distributed normally. Kolmogorov-Smirnov is not a particularly powerful test in testing for normality, and we do not endorse such use of it. In any case, we will test against a normal distribution with the same mean and standard deviation:

    ksmirnov x = normal((x-r(mean))/r(sd))

In practice this takes two steps:

    1. summarize x
    2. ksmirnov x = normal((x-4.571429)/3.457222)

The results indicate that the data cannot be distinguished from normally distributed data.

Example: Two-sample test

The first line of the output tests the hypothesis that x for group 1 contains smaller values than for group 2. The largest difference between the distribution functions is 0.5. The approximate p-value for this is 0.424, which is not significant.

The second line tests the hypothesis that x for group 1 contains larger values than for group 2. The largest difference between the distribution functions in this direction is 0.1667. The approximate p-value for this small difference is 0.909.

Finally, the approximate p-value for the combined test is 0.785, corrected to 0.735.
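Both forms of the test can be tried on a small dataset typed in by hand. This is a sketch only: the ten observations below are hypothetical and are not the data behind the numbers quoted above, so the output will differ.

```stata
* Hypothetical data: two groups of five observations each
clear
input group x
1 2
1 1
1 3
1 3
1 5
2 4
2 7
2 1
2 5
2 9
end

* Two-sample test: compare the distribution of x across the two groups
ksmirnov x, by(group)

* One-sample test against a normal distribution with the sample mean and SD;
* summarize leaves them in r(mean) and r(sd)
summarize x
ksmirnov x = normal((x - r(mean)) / r(sd))
```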

+ McNemar test
You would perform McNemar's test if you were interested in the marginal frequencies of two binary outcomes. These binary outcomes may be the same outcome variable on matched pairs (like a case-control study) or two outcome variables from a single group.

Example: Consider two questions, Q1 and Q2, from a test taken by 200 students. Suppose 172 students answered both questions correctly, 15 students answered both questions incorrectly, 7 answered Q1 correctly and Q2 incorrectly, and 6 answered Q2 correctly and Q1 incorrectly. These counts can be considered in a two-way contingency table. The null hypothesis is that the two questions are answered correctly or incorrectly at the same rate (or that the contingency table is symmetric).

We can enter these counts into Stata using mcci, a command from Stata's epidemiology tables. The outcome is labeled according to case-control study conventions.

McNemar's chi-square statistic suggests that there is not a statistically significant difference in the proportions of correct/incorrect answers to these two questions.
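The counts above can be entered as follows. The order of the four cells passed to mcci is an assumption about how the 2x2 table was laid out on the slide; the statistic itself depends only on the two discordant counts (7 and 6), so swapping them does not change it.

```stata
* Immediate McNemar test; cells are given row by row (a b c d),
* where b and c are the discordant pairs
mcci 172 6 7 15

* The chi-squared statistic by hand: (b - c)^2 / (b + c) with 1 degree of freedom
display "McNemar chi2 = " (7 - 6)^2 / (7 + 6)
display "p-value      = " chi2tail(1, (7 - 6)^2 / (7 + 6))
```

Here the statistic is 1/13, about 0.08, which is far from significant and agrees with the conclusion above.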

+ Wilcoxon signed ranks test
The Wilcoxon signed rank sum test is the non-parametric version of a paired samples t-test. You use the Wilcoxon signed rank sum test when you do not wish to assume that the difference between the two variables is interval and normally distributed (but you do assume the difference is ordinal). We will use the same example as above, but we will not assume that the difference between read and write is interval and normally distributed.

The results suggest that there is not a statistically significant difference between read and write.

If you believe the differences between read and write were not ordinal but could merely be classified as positive and negative, then you may want to consider a sign test in lieu of the signed rank test. Again, we will use the same variables in this example and assume that this difference is not ordinal.

This output gives both of the one-sided tests as well as the two-sided test. Assuming that we were looking for any difference, we would use the two-sided test and conclude that no statistically significant difference was found (p = .5565).
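The two commands involved are signrank and signtest. The sketch below runs them on simulated scores, since the slide's data file is not named here. The variables read and write are hypothetical stand-ins, so the p-values will not match the ones quoted above.

```stata
* Hypothetical paired scores for 200 students
clear
set seed 1234
set obs 200
generate read  = round(rnormal(52, 10))
generate write = round(rnormal(53, 9))

* Wilcoxon signed rank test: assumes the paired differences are at least ordinal
signrank read = write

* Sign test: uses only the sign (positive/negative) of each difference
signtest read = write
```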

+ Fisher exact test
The Fisher's exact test is used when you want to conduct a chi-square test, but one or more of your cells has an expected frequency of five or less. Remember that the chi-square test assumes that each cell has an expected frequency of five or more, but the Fisher's exact test has no such assumption and can be used regardless of how small the expected frequency is. In the example below, we have cells with observed frequencies of two and one, which may indicate expected frequencies that could be below five, so we will use Fisher's exact test with the exact option on the tabulate command.

These results suggest that there is not a statistically significant relationship between race and type of school (p = 0.597). Note that the Fisher's exact test does not have a "test statistic", but computes the p-value directly.

+ Two independent samples t-test
An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. For example, we wish to test whether the mean for write is the same for males and females.

The results indicate that there is a statistically significant difference between the mean writing score for males and females (t = -3.7341, p = .0002). In other words, females have a statistically significantly higher mean score on writing (54.99) than males (50.12).
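As a sketch of the exact option, the immediate command tabi accepts a table typed in directly. The counts below are hypothetical and chosen so that some cells are small. With variables in memory, the equivalent would be tabulate rowvar colvar, exact.

```stata
* Hypothetical 2 x 3 table with small cells; rows are separated by "\"
tabi 2 1 20 \ 22 10 145, exact

* Adding chi2 shows the Pearson chi-squared result next to the exact p-value
tabi 2 1 20 \ 22 10 145, chi2 exact
```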

+ Wilcoxon-Mann-Whitney test
The Wilcoxon-Mann-Whitney test is a non-parametric analog to the independent samples t-test and can be used when you do not assume that the dependent variable is a normally distributed interval variable (you only assume that the variable is at least ordinal). You will notice that the Stata syntax for the Wilcoxon-Mann-Whitney test is almost identical to that of the independent samples t-test. We will use the same variables in this example as we did in the independent t-test example above and will not assume that write, our dependent variable, is normally distributed.

The results suggest that there is a statistically significant difference between the underlying distributions of the write scores of males and the write scores of females (z = -3.329, p = 0.0009). You can determine which group has the higher rank by looking at how the actual rank sums compare to the expected rank sums under the null hypothesis. The sum of the female ranks was higher while the sum of the male ranks was lower. Thus the female group had higher rank.

+ Chi-square test
A chi-square test is used when you want to see if there is a relationship between two categorical variables. In Stata, the chi2 option is used with the tabulate command to obtain the test statistic and its associated p-value.

Example: let's see if there is a relationship between the type of school attended (schtyp) and students' gender (female). Remember that the chi-square test assumes the expected value of each cell is five or higher.

These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom = 0.0470, p = 0.828).
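The commands for these comparisons can be sketched on simulated data. Everything generated below (female, write, schtyp and their distributions) is hypothetical, so the statistics will differ from those reported above.

```stata
* Hypothetical data: 200 students
clear
set seed 1234
set obs 200
generate female = runiform() < 0.55
generate write  = round(rnormal(50 + 5*female, 9))
generate schtyp = 1 + (runiform() < 0.16)

* Parametric comparison of means, then its rank-based analog
ttest write, by(female)
ranksum write, by(female)

* Chi-square test of association between two categorical variables
tabulate schtyp female, chi2
```

Note how ttest and ranksum share the same varname, by(groupvar) layout.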

+ Chi-square test
Let's look at another example, this time looking at the relationship between gender (female) and socio-economic status (ses). The point of this example is that one (or both) variables may have more than two levels, and that the variables do not have to have the same number of levels. In this example, female has two levels (male and female) and ses has three levels (low, medium and high).

Again we find that there is no statistically significant relationship between the variables (chi-square with two degrees of freedom = 4.5765, p = 0.101).

+ Kruskal-Wallis
The Kruskal-Wallis test is used when you have one independent variable with two or more levels and an ordinal dependent variable.

If some of the scores receive tied ranks, then a correction factor is used, yielding a slightly different value of chi-squared. With or without ties, the results indicate that there is a statistically significant difference among the three types of programs.
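A minimal sketch of the Kruskal-Wallis command on simulated data follows. The grouping variable prog (three program types) and the score write are hypothetical stand-ins.

```stata
* Hypothetical data: 200 students in three program types
clear
set seed 1234
set obs 200
generate prog  = 1 + floor(3*runiform())
generate write = round(rnormal(48 + 3*prog, 9))

* Kruskal-Wallis test; the output reports chi-squared with and without the tie correction
kwallis write, by(prog)
```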

+ Linear Regression

+ Regression: A practical approach
We use regression to estimate the unknown effect of changing one variable over another. Technically, linear regression estimates how much Y changes when X changes one unit.

Before running a regression it is recommended to have a clear idea of what you are trying to estimate (i.e. which are your outcome and predictor variables).

Steps:
    1. Examine descriptive statistics
    2. Look at relationship graphically and test correlation(s)
    3. Run and interpret regression
    4. Test regression assumptions

+ An example: SAT scores and Education Expenditures
Are SAT scores higher in states that spend more money on education, controlling for other factors?*

Outcome (Y) variable: SAT scores, variable csat in dataset

Predictor (X) variables:
    Per pupil expenditures primary & secondary (expense)
    % HS graduates taking SAT (percent)
    Median household income (income)
    % adults with HS diploma (high)
    % adults with college degree (college)
    Region (region)

* Source: Data and examples come from the book Statistics with Stata (updated for version 9) by Lawrence C. Hamilton (chapter 6). Use the file states.dta (educational data for the U.S.).

+ Regression: Check the variables

    describe csat expense percent income high college region
    summarize csat expense percent income high college region

+ Regression: View relationship graphically

    twoway scatter csat expense

[Figure: scatterplot titled "Relationship Between Education Expenditure and SAT Scores"; y-axis: Mean composite SAT score; x-axis: Per pupil expenditures prim&sec]

    twoway (scatter csat expense) (lfit csat expense)

    twoway lfitci csat expense

+ Regression: Correlation test

    pwcorr csat expense, star(.05)

+ Regression: what to look for
Let's run the regression: SAT scores and education expenditures

    regress csat expense, robust

Here csat is the outcome variable (Y), expense is the predictor variable (X), and the robust option requests robust standard errors (to control for heteroscedasticity).

The coefficient on expense shows how a state's mean SAT changes if its expenditure increases one unit. For each one-point increase in expense, SAT scores decrease by 0.022.

Constant (intercept): a state's mean SAT score if its expenditure is $0.

Significance of individual predictors: Is there a statistically significant relationship between SAT scores and per pupil expenditures?

Two-tail p-values test the null hypothesis that each coefficient equals 0. To reject it, the p-value has to be lower than 0.05 (you could also choose an alpha of 0.10). In this case, expense is statistically significant in explaining SAT.

The t-values test the same hypothesis. To reject it, you need a t-value greater than 1.96 in absolute value (for 95% confidence). You can get the t-values by dividing the coefficient by its standard error. The t-values also show the importance of a variable in the model.
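The relationship between coefficient, standard error, t-value and p-value can be verified from the stored results. This sketch assumes the bundled auto dataset (price and mpg) in place of states.dta, which is not shipped with Stata.

```stata
* Assumption: auto dataset as a stand-in for states.dta
sysuse auto, clear
regress price mpg, vce(robust)

* t-value = coefficient / standard error
display "t for mpg    = " _b[mpg] / _se[mpg]

* Two-tailed p-value from the t distribution with the residual degrees of freedom
display "two-tailed p = " 2 * ttail(e(df_r), abs(_b[mpg] / _se[mpg]))
```

Both displayed values should agree with the t and P>|t| columns of the regression table.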

Significance of overall equation:

    regress csat expense, robust

The Prob > F value is the p-value of the model. It tests whether R2 is different from 0. Usually we need a p-value lower than 0.05 to show a statistically significant relationship between X and Y.

R-square shows the amount of variance of Y explained by X. In this case expense explains 22% of the variance in SAT scores.

Adding the rest of the predictor variables:

    regress csat expense percent income high college, robust

+ Regression: adding dummies (I)
Region is entered here as a dummy variable. First, generate dummies:

    tab region, g(reg)

    Geographical region |  Freq.   Percent     Cum.
    --------------------+--------------------------
                   West |     13     26.00    26.00
                N. East |      9     18.00    44.00
                  South |     16     32.00    76.00
                Midwest |     12     24.00   100.00
    --------------------+--------------------------
                  Total |     50    100.00

    regress csat expense percent income high college reg2 reg3 reg4, robust

    Linear regression                      Number of obs =      50
                                           F(  8,    41) =   69.82
                                           Prob > F      =  0.0000
                                           R-squared     =  0.9111
                                           Root MSE      =  21.492

                 |              Robust
            csat |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
         expense |  -.002021   .0035883    -0.56   0.576   -.0092676   .0052256
         percent | -3.007647   .2358047   -12.75   0.000   -3.483864   -2.53143
          income | -.1674421   1.196409    -0.14   0.889   -2.583638   2.248754
            high |  1.814731    1.02694     1.77   0.085   -.2592168   3.888679
         college |  4.670564   1.599798     2.92   0.006    1.439705   7.901422
            reg2 |  69.45333   17.99933     3.86   0.000    33.10295   105.8037
            reg3 |  25.39701   12.52558     2.03   0.049     .101086   50.69293
            reg4 |  34.57704    9.44989     3.66   0.001     15.4926   53.66149
           _cons |  808.0206   67.86418    11.91   0.000    670.9661   945.0751

+ Regression: adding dummies (II)
Let Stata do your dirty work with the xi command:

    xi: regress csat expense percent income high college i.region, robust

    i.region          _Iregion_1-4        (naturally coded; _Iregion_1 omitted)

    Linear regression                      Number of obs =      50
                                           F(  8,    41) =   69.82
                                           Prob > F      =  0.0000
                                           R-squared     =  0.9111
                                           Root MSE      =  21.492

                 |              Robust
            csat |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
         expense |  -.002021   .0035883    -0.56   0.576   -.0092676   .0052256
         percent | -3.007647   .2358047   -12.75   0.000   -3.483864   -2.53143
          income | -.1674421   1.196409    -0.14   0.889   -2.583638   2.248754
            high |  1.814731    1.02694     1.77   0.085   -.2592168   3.888679
         college |  4.670564   1.599798     2.92   0.006    1.439705   7.901422
      _Iregion_2 |  69.45333   17.99933     3.86   0.000    33.10295   105.8037
      _Iregion_3 |  25.39701   12.52558     2.03   0.049     .101086   50.69293
      _Iregion_4 |  34.57704    9.44989     3.66   0.001     15.4926   53.66149
           _cons |  808.0206   67.86418    11.91   0.000    670.9661   945.0751

NOTE: By default xi excludes the first value. To select a different value, before running the regression type:

    char region[omit] 4
    xi: regress csat expense percent income high college i.region, robust

This will select Midwest (4) as the reference category for the dummy variables.

+ Regression: correlation matrix
Below is a correlation matrix for all variables in the model. Numbers are Pearson correlation coefficients, which go from -1 to 1. Values closer to -1 or 1 mean a stronger correlation. A negative value indicates an inverse relationship (roughly, when one goes up the other goes down). The second line for each variable gives the significance level.

    pwcorr csat expense percent income high college, star(0.05) sig

                 |     csat   expense   percent    income      high   college
    -------------+------------------------------------------------------------
            csat |   1.0000
                 |
         expense |  -0.4663*   1.0000
                 |   0.0006
         percent |  -0.8758*   0.6509*   1.0000
                 |   0.0000    0.0000
          income |  -0.4713*   0.6784*   0.6733*   1.0000
                 |   0.0005    0.0000    0.0000
            high |   0.0858    0.3133*   0.1413    0.5099*   1.0000
                 |   0.5495    0.0252    0.3226    0.0001
         college |  -0.3729*   0.6400*   0.6091*   0.7234*   0.5319*   1.0000
                 |   0.0070    0.0000    0.0000    0.0000    0.0001
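In Stata 11 and later, factor-variable notation offers an alternative to xi and char for creating dummies and choosing the reference category. The sketch below uses the bundled auto dataset (rep78 as the categorical predictor) as a stand-in, since states.dta is not shipped with Stata.

```stata
* Assumption: auto dataset as a stand-in for states.dta
sysuse auto, clear

* i.rep78 creates the indicators on the fly; the lowest level is the base by default
regress price mpg i.rep78, vce(robust)

* ib#. selects a different base (reference) category, here level 3
regress price mpg ib3.rep78, vce(robust)
```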

+ Regression: graph matrix
The command graph matrix produces a graphical representation of the correlation matrix by presenting a series of scatterplots for all variables:

    graph matrix csat expense percent income high college, half maxes(ylabel(none) xlabel(none))

+ Regression: Managing all this output
Usually when we're running regressions, we'll be testing multiple models at a time, and it can be difficult to compare results.

Stata offers several user-friendly options for storing and viewing regression output from multiple models:

    Store output: eststo / esttab
    Outputting into Excel: outreg2

+ Regression: eststo/esttab
We can store this info in Stata, just type:

    regress csat expense, robust
    eststo model1
    regress csat expense percent income high college, robust
    eststo model2
    xi: regress csat expense percent income high college i.region, robust
    eststo model3

Now Stata will hold your output in memory until you ask to recall it:

    esttab model1 model2 model3

                          (1)           (2)           (3)
                         csat          csat          csat
    expense           -0.0223***     0.00335      -0.00202
                      (-6.07)        (0.70)       (-0.56)
    percent                          -2.618***     -3.008***
                                    (-11.44)      (-12.75)
    income                            0.106        -0.167
                                     (0.09)       (-0.14)
    high                              1.631         1.815
                                     (1.73)        (1.77)
    college                           2.031         4.671**
                                     (0.96)        (2.92)
    _Iregion_2                                      69.45***
                                                   (3.86)
    _Iregion_3                                      25.40*
                                                   (2.03)
    _Iregion_4                                      34.58***
                                                   (3.66)
    _cons              1060.7***      851.6***      808.0***
                      (43.55)        (14.86)       (11.91)
    N                      51            51            50

    t statistics in parentheses
    * p<0.05, ** p<0.01, *** p<0.001

+ Regression: eststo/esttab
Some options (type help eststo and help esttab for more options):

    esttab model1 model2 model3, r2 ar2 se label

                                  (1)             (2)             (3)
                         Mean compo~e    Mean compo~e    Mean compo~e
    Per pupil expendit~c    -0.0223***       0.00335        -0.00202
                            (0.00367)       (0.00478)       (0.00359)
    % HS graduates tak~T                     -2.618***       -3.008***
                                            (0.229)         (0.236)
    Median household~000                      0.106          -0.167
                                            (1.207)         (1.196)
    % adults HS diploma                       1.631           1.815
                                            (0.943)         (1.027)
    % adults college d~e                      2.031           4.671**
                                            (2.114)         (1.600)
    region==2                                                 69.45***
                                                            (18.00)
    region==3                                                 25.40*
                                                            (12.53)
    region==4                                                 34.58***
                                                            (9.450)
    Constant                 1060.7***        851.6***        808.0***
                            (24.35)         (57.29)         (67.86)
    Observations                 51              51              50
    R-squared                 0.217           0.824           0.911
    Adjusted R-squared        0.201           0.805           0.894

    Standard errors in parentheses
    * p<0.05, ** p<0.01, *** p<0.001

+ Regression: outreg2
Avoid human error when transferring coefficients into tables:

    regress csat expense, robust
    outreg2 using prediction.doc
    regress csat expense percent income high college, robust
    outreg2 using prediction.doc, append
    xi: regress csat expense percent income high college i.region, robust
    outreg2 using prediction.doc, append
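eststo, esttab and outreg2 are community-contributed commands, so they have to be installed before first use. A minimal sketch, assuming an internet connection and using the bundled auto dataset with hypothetical model names m1 and m2:

```stata
* One-time installation from the SSC archive (requires internet access)
ssc install estout, replace
ssc install outreg2, replace

* Assumption: auto dataset as a stand-in for states.dta
sysuse auto, clear
regress price mpg, vce(robust)
eststo m1
regress price mpg weight, vce(robust)
eststo m2

* Side-by-side table with standard errors, R-squared and variable labels
esttab m1 m2, se r2 ar2 label

* Drop the stored estimates when done
eststo clear
```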

+ Getting predicted values
How good the model is will depend on how well it predicts Y, the linearity of the model and the behavior of the residuals.

Using predict immediately after running the regression:

    xi: regress csat expense percent percent2 income high college i.region, robust
    predict csat_predict
    label variable csat_predict "csat predicted"

    scatter csat csat_predict

[Figure: scatterplot of observed against predicted values; y-axis: Mean composite SAT score; x-axis: csat predicted]

We should expect a 45 degree pattern in the data. The y-axis is the observed data and the x-axis the predicted data (Yhat). In this case the model seems to be doing a good job in predicting csat.

+ Linear Regression Assumptions

Assumption 1: Normal distribution
    The dependent variable is normally distributed
    The errors of the regression equation are normally distributed

Assumption 2: Homoscedasticity
    The variance around the regression line is the same for all values of the predictor variable (X)

Assumption 3: Errors are independent
    The size of one error is not a function of the size of any previous error

Assumption 4: Relationships are linear
    AKA the relationship can be summarized with a straight line
    Keep in mind that you can use alternative forms of regression to test non-linear relationships

+ Testing for Normality

    predict resid, residual
    label var resid "Residuals of pp expend and SAT"
    histogram resid, normal

The Shapiro-Wilk test of normality tests the null hypothesis that the data are normally distributed.
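The residual checks above can be sketched end to end as follows. The bundled auto dataset is an assumption, since states.dta is not shipped with Stata. swilk is Stata's Shapiro-Wilk command.

```stata
* Assumption: auto dataset as a stand-in for states.dta
sysuse auto, clear
regress price mpg weight

* Save the residuals and compare their histogram with a normal curve
predict resid, residuals
histogram resid, normal

* Shapiro-Wilk test; a small p-value is evidence against normality
swilk resid
```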

+ Regression: testing for homoscedasticity
Note: the rvfplot command needs to be entered after the regression equation is run. Stata uses estimates from the regression to create this plot.

A non-graphical way to detect heteroskedasticity is the Breusch-Pagan test. The null hypothesis is that residuals are homoskedastic. In the example below we fail to reject the null at 95% and conclude that residuals are homogeneous. However, at 90% we reject the null and conclude that residuals are not homogeneous.
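A minimal sketch of both checks follows, again assuming the bundled auto dataset. The regression is run without the robust option here because estat hettest is intended for use after an ordinary (non-robust) regress.

```stata
* Assumption: auto dataset as a stand-in for states.dta
sysuse auto, clear
regress price mpg weight

* Residual-versus-fitted plot; a fan or funnel shape suggests heteroskedasticity
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test; H0: constant variance
estat hettest
```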

+ Logit/Probit Regression

+ Logit model
Use logit models whenever your dependent variable is binary (also called dummy), taking the values 0 or 1.

Logit regression is a nonlinear regression model that forces the output (predicted values) to lie between 0 and 1.

Logit models estimate the probability of your dependent variable being 1 (Y=1). This is the probability that some event happens.

Logit and probit models are basically the same; the difference is in the distribution:

    Logit: cumulative standard logistic distribution (F)
    Probit: cumulative standard normal distribution (Φ)

Both models provide similar results.
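The similarity of the two models can be seen by comparing their predicted probabilities. This sketch assumes the bundled auto dataset, with the 0/1 variable foreign as the outcome. The names p_logit and p_probit are hypothetical.

```stata
* Assumption: auto dataset; foreign is a 0/1 outcome
sysuse auto, clear

* Logit model and its predicted probabilities Pr(foreign = 1)
logit foreign mpg weight
predict p_logit

* Probit model with the same predictors
probit foreign mpg weight
predict p_probit

* Both sets of predictions lie between 0 and 1 and are usually very close
summarize p_logit p_probit
correlate p_logit p_probit
```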

+ Logit: predicted probabilities

+ Logit: Odds ratio

+ Logit: adjust

+ Ordinal logit
When a dependent variable has more than two categories and the values of each category have a meaningful sequential order where a value is indeed 'higher' than the previous one, then you can use ordinal logit. Here is an example of the type of variable:

+ Ordinal logit: the setup
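A minimal ordinal logit sketch follows, assuming the bundled auto dataset, where rep78 (repair record, coded 1 to 5) serves as an ordered outcome. The names p1 to p5 are hypothetical.

```stata
* Assumption: auto dataset; rep78 is an ordered 1-5 rating
sysuse auto, clear
tabulate rep78

* Ordered logit of the rating on two predictors
ologit rep78 mpg weight

* One predicted probability per outcome category (five categories here)
predict p1 p2 p3 p4 p5
summarize p1 p2 p3 p4 p5
```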

+ Ordinal logit: predicted probabilities

+ Predicted probabilities: using prvalue

+ Panel Data (fixed and random effects)

+ Panel Data Analysis

Panel data allows you to control for variables you cannot observe or measure, like cultural factors or differences in business practices across companies, or variables that change over time but not across entities (e.g. national policies, federal regulations, international agreements, etc.). That is, it accounts for individual heterogeneity.

With panel data you can include variables at different levels of analysis (e.g. students, schools, districts, states) suitable for multilevel or hierarchical modeling.

Some drawbacks are data collection issues (e.g. sampling design, coverage), non-response in the case of micro panels or cross-country dependency in the case of macro panels (i.e. correlation between countries).

In this document we focus on two techniques used to analyze panel data:

    Fixed effects
    Random effects

+ Setting panel data: xtset

+ Exploring panel data
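Declaring and exploring a panel can be sketched on simulated data. The panel below (5 countries observed over 10 years, with variables x and y) is entirely hypothetical.

```stata
* Hypothetical balanced panel: 5 countries x 10 years
clear
set seed 12345
set obs 5
generate country = _n
expand 10
bysort country: generate year = 1999 + _n
generate x = rnormal()
generate y = 1 + 0.5*x + country + rnormal()

* Declare the panel structure: entity variable first, then time variable
xtset country year

* Describe the panel pattern and split variation into between and within
xtdescribe
xtsum y x
```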

+ FIXED-EFFECTS MODEL (Covariance Model, Within Estimator, Individual Dummy Variable Model, Least Squares Dummy Variable Model)

+ Fixed effects
Use fixed-effects (FE) whenever you are only interested in analyzing the impact of variables that vary over time.

FE explore the relationship between predictor and outcome variables within an entity (country, person, company, etc.). Each entity has its own individual characteristics that may or may not influence the predictor variables (for example being a male or female could influence the opinion toward a certain issue, or the political system of a particular country could have some effect on trade or GDP, or the business practices of a company may influence its stock price).

When using FE we assume that something within the individual may impact or bias the predictor or outcome variables and we need to control for this. This is the rationale behind the assumption of the correlation between the entity's error term and predictor variables. FE remove the effect of those time-invariant characteristics from the predictor variables so we can assess the predictors' net effect.

Another important assumption of the FE model is that those time-invariant characteristics are unique to the individual and should not be correlated with other individual characteristics. Each entity is different, therefore the entity's error term and the constant (which captures individual characteristics) should not be correlated with the others. If the error terms are correlated then FE is not suitable, since inferences may not be correct and you need to model that relationship (probably using random-effects). This is the main rationale for the Hausman test (presented later on in this document).

+ Fixed effects: Heterogeneity across countries (or entities)

+ Fixed effects: Heterogeneity across years

+ OLS regression

+ Fixed Effects using least squares dummy variable model (LSDV)

+ Fixed effects: n entity-specific intercepts (using xtreg)

+ Another way to estimate fixed effects: n entity-specific intercepts (using areg)

+ Another way to estimate fixed effects: common intercept and n-1 binary regressors (using dummies and regress)
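The three estimation routes named in these slide titles can be sketched on a simulated panel. All data below are hypothetical, and factor-variable notation (i.country, Stata 11 or later) is used for the dummies.

```stata
* Hypothetical balanced panel: 5 countries x 10 years
clear
set seed 12345
set obs 5
generate country = _n
expand 10
bysort country: generate year = 1999 + _n
generate x = rnormal()
generate y = 1 + 0.5*x + country + rnormal()
xtset country year

* 1) Within estimator
xtreg y x, fe

* 2) Absorbing the entity dummies
areg y x, absorb(country)

* 3) Least squares dummy variables: common intercept plus n-1 dummies
regress y x i.country
```

The coefficient on x should be identical across the three commands. What differs is how the intercepts are reported.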

+ Fixed effects: comparing xtreg (with fe), regress (OLS with dummies) and areg

+ A note on fixed-effects
"The fixed-effects model controls for all time-invariant differences between the individuals, so the estimated coefficients of the fixed-effects models cannot be biased because of omitted time-invariant characteristics [like culture, religion, gender, race, etc.]

One side effect of the features of fixed-effects models is that they cannot be used to investigate time-invariant causes of the dependent variables. Technically, time-invariant characteristics of the individuals are perfectly collinear with the person [or entity] dummies. Substantively, fixed-effects models are designed to study the causes of changes within a person [or entity]. A time-invariant characteristic cannot cause such a change, because it is constant for each person." (Underline is mine)

Kohler, Ulrich, Frauke Kreuter, Data Analysis Using Stata, 2nd ed., p.245

+ RANDOM-EFFECTS MODEL (Random Intercept, Partial Pooling Model)

+ Random effects
Random effects assume that the entity's error term is not correlated with the predictors, which allows for time-invariant variables to play a role as explanatory variables.

In random-effects you need to specify those individual characteristics that may or may not influence the predictor variables. The problem with this is that some variables may not be available, therefore leading to omitted variable bias in the model.

RE allows you to generalize the inferences beyond the sample used in the model.

+ FIXED OR RANDOM?

+ Fixed or Random: Hausman test
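The usual sequence for the Hausman test is to fit both models, store each set of estimates, and then compare them. The sketch below uses a simulated panel in which the predictor is deliberately correlated with the entity effect. The data and the names fixed and random are hypothetical.

```stata
* Hypothetical panel: 30 countries x 10 years, x correlated with the entity effect
clear
set seed 12345
set obs 30
generate country = _n
generate alpha = rnormal()
expand 10
bysort country: generate year = 1999 + _n
generate x = 0.8*alpha + rnormal()
generate y = 1 + 0.5*x + alpha + rnormal()
xtset country year

* Fit and store the fixed-effects and random-effects estimates
xtreg y x, fe
estimates store fixed
xtreg y x, re
estimates store random

* H0: the difference in coefficients is not systematic (random effects is adequate)
hausman fixed random
```

A small p-value favors fixed effects.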

+ OTHER TESTS/DIAGNOSTICS

+ Testing for time-fixed effects

+ Testing for random effects: Breusch-Pagan Lagrange multiplier (LM)

+ Testing for cross-sectional dependence/contemporaneous correlation: using Breusch-Pagan LM test of independence

+ Testing for cross-sectional dependence/contemporaneous correlation: using Pesaran CD test
Source: Hoechle, Daniel, "Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence", http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf

+ Testing for heteroskedasticity
NOTE: Use the option 'robust' to control for heteroskedasticity (in both fixed and random effects).
(Slide annotation: presence of heteroskedasticity.)

+ Testing for serial correlation

+ Testing for unit roots/stationarity

+ Robust standard errors

+ Summary of basic models (FE/RE)