Uploaded by ReneeXu

Statistical analysis assumptions


Always mention removing outliers so that they do not skew the dataset.

Levene’s Test

- An inferential statistic used to assess the equality of variances of a variable calculated for two or more groups.

- If the p-value of Levene's test is less than the chosen significance level (typically 0.05), the observed differences in sample variances are unlikely to have occurred by random sampling from a population with equal variances. The null hypothesis of equal variances is therefore rejected, and we conclude that the population variances differ.

- After running Levene’s test, we will assume that it came out non-significant, implying homogeneity of variance.
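A minimal sketch of how Levene's statistic is computed, assuming plain Python 3 with only the standard library; `levene_w` is a hypothetical helper, not a library function. It runs a one-way ANOVA on each score's absolute deviation from its own group's center. Centering on the median (the Brown-Forsythe variant) is what `scipy.stats.levene` uses by default. The p-value comes from an F distribution with (k - 1, N - k) degrees of freedom, which this sketch does not compute.

```python
from statistics import mean, median


def levene_w(*groups, center="median"):
    """Levene's W statistic for equality of variances across k groups.

    center="median" is the Brown-Forsythe variant; center="mean" is
    Levene's original version.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    locate = median if center == "median" else mean
    # Absolute deviation of every score from its own group's center.
    z = [[abs(x - locate(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    grand = mean([v for zi in z for v in zi])
    # One-way ANOVA on the deviations: between- vs within-group spread.
    ss_between = sum(len(zi) * (mean(zi) - grand) ** 2 for zi in z)
    ss_within = sum((v - mean(zi)) ** 2 for zi in z for v in zi)
    if ss_within == 0:
        raise ValueError("deviations do not vary within groups")
    return (n_total - k) / (k - 1) * ss_between / ss_within


a = [4, 5, 7, 8, 6]
b = [x + 10 for x in a]   # same spread as a, shifted mean
c = [1, 10, 3, 12, 0]     # visibly wider spread (made-up data)
print(levene_w(a, b))     # 0.0: identical spreads
print(levene_w(a, c))     # compare with the F(1, 8) critical value
```

For `a` and `c`, W is about 3.13, below the .05 critical value of F(1, 8) (about 5.32), so with only five scores per group even this wider spread would not be declared significant.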

Effect Size

- The magnitude of the difference between groups, sometimes called the magnitude of the experimental effect/phenomenon

- Suggests practical significance: whether the effect is large enough to warrant action in the real world

- Regression:

o For a regression power analysis, specify the effect size (f squared: small .02, medium .15, large .35; the difference between your null hypothesis and the alternative hypothesis that you hope to detect), an alpha of .05, and the desired power (.8)
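For a whole regression model, Cohen's f squared can be obtained from R squared as f² = R² / (1 - R²). A minimal sketch using only the Python standard library; `cohens_f2` and `label_f2` are hypothetical helper names that convert R squared and label the result with the benchmarks above.

```python
def cohens_f2(r2):
    """Cohen's f squared for a whole regression model: R^2 / (1 - R^2)."""
    if not 0 <= r2 < 1:
        raise ValueError("R squared must be in [0, 1)")
    return r2 / (1 - r2)


def label_f2(f2):
    """Label f squared with Cohen's benchmarks: .02 small, .15 medium, .35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


for r2 in (0.02, 0.20, 0.50):
    f2 = cohens_f2(r2)
    print(f"R^2 = {r2:.2f} -> f^2 = {f2:.3f} ({label_f2(f2)})")
```

The resulting f squared, together with alpha = .05 and power = .8, is what a power-analysis tool such as G*Power asks for when computing the required sample size.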

Statistical Power

- Ranges from 0 to 1

- is the probability that a statistical test will detect differences when they truly exist

- If statistical power is high, the probability of making a Type II error, or concluding there

is no effect when, in fact, there is one, goes down.

For a given statistical test, the sample size is calculated from statistical power, effect size, and

significance level.

Type I error: false positive (rejecting a true null); Type II error: false negative (failing to reject a false null).
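To make the link between power, effect size, and significance level concrete, here is a minimal sketch of the common normal-approximation formula for comparing two group means: n per group = 2 * ((z_(1 - alpha/2) + z_(power)) / d)², where d is Cohen's d and power = 1 - beta. It assumes Python 3.8+ for `statistics.NormalDist`; `n_per_group` is a hypothetical helper. Because it uses z rather than t, it slightly underestimates n: exact t-based tools such as G*Power give 64 per group for d = 0.5, where this gives 63.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed, two-sample test of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_(power)) / d) ** 2,
    where d is Cohen's d and power = 1 - beta (beta = Type II error rate).
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):  # Cohen's small, medium, large
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Under this approximation a small effect needs about 393 participants per group, a medium effect about 63, and a large effect about 25.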

One-Sample T-test

- Assumptions

o Independent random sampling

o Normal distribution

o SD of the sampled population equals that of the comparison population

- Overview: A one-sample t-test (two-tailed) compares the sample mean with a known population mean to determine whether the difference between the two means is statistically significant. The null hypothesis is that the two means are equal (the sample comes from a population with the known mean); the alternative hypothesis is that they differ.
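A minimal sketch of the one-sample t statistic, t = (M - mu) / (s / sqrt(n)) with df = n - 1, using only the Python standard library. `one_sample_t` is a hypothetical helper; `scipy.stats.ttest_1samp` reports the same t together with a p-value. The sketch also returns the one-sample Cohen's d, (M - mu) / s.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """Return (t, df, d) for H0: the population mean equals mu."""
    n = len(sample)
    s = stdev(sample)                  # sample SD (n - 1 denominator)
    diff = mean(sample) - mu
    t = diff / (s / math.sqrt(n))      # difference in standard-error units
    return t, n - 1, diff / s          # d: difference in SD units


t, df, d = one_sample_t([2, 4, 6], mu=2)
print(round(t, 3), df, d)  # 1.732 2 1.0
```

With df = 2 the two-tailed .05 critical value of t is 4.303, so t = 1.732 is not significant even though d = 1.0 is large: a three-score sample has very little power.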

Independent-Samples T-test

- Requirements

- Dichotomous (two-group) independent variable

- Continuous dependent variable:

- DV measured on interval/ratio scale (although many treat ordinal scales

like interval scales as well)

- Use the t-test when your IV consists of two distinct groups (group 1 vs. group 2). If your IV is a continuous variable, a correlational design is usually better, because splitting it into two groups loses information.

- Assumptions:

- Independent random sampling

- Normal distribution

- Central Limit Theorem can be generalized to imply that even when two

populations are not normally distributed, the distribution of sample means

will approach the normal distribution as the sample size increases

- If one or both groups' data are not approximately normally distributed and group sizes differ greatly, you have two options:

- Transform your data so that they become approximately normally distributed

- Run the Mann-Whitney U test, a non-parametric alternative

- Homogeneity of variance (HOV), tested with Levene’s test; unequal variances (a significant Levene’s test) can distort the Type I error rate

- Generally not a problem if:

- Large sample sizes (at least 100 subjects in each)

- Both samples are the same size

- One sample variance is no more than twice as large as the other

- How to interpret results

- T value should be significant at 0.05 level

- Effect size, d: 0.2 - small, 0.5 - medium, 0.8 - large (Cohen, 1988)
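A minimal sketch of Student's independent-samples t (pooled variance, so it assumes HOV) and Cohen's d computed with the pooled SD, using only the Python standard library. `independent_t` is a hypothetical helper and the scores are made up.

```python
import math
from statistics import mean, variance


def independent_t(group1, group2):
    """Student's t (pooled variance, assumes HOV), df, and Cohen's d."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its df.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    diff = mean(group1) - mean(group2)
    t = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    d = diff / math.sqrt(pooled)  # Cohen's d with the pooled SD
    return t, df, d


t, df, d = independent_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df, d)  # -3.674 4 -3.0
```

If Levene's test is significant, Welch's t (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`) drops the pooled-variance assumption.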

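Dependent-Samples T-test

- Requirements

- Dichotomous independent variable (two conditions or time points, measured on the same or matched participants)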

- Continuous dependent variable

- DV measured on interval/ratio scale (although many treat ordinal scales

like interval scales as well)

- Assumptions

- Independent Random Sampling

- Normal distribution

- How to interpret results

- T value should be significant at the 0.05 level

- Effect size, Cohen’s d (for dependent measures) - essentially same as for

independent samples (but best to specify that you’re doing it for

matched/dependent samples)

- 0.2 - small, 0.5 - medium, 0.8 - large (Cohen, 1988)
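A minimal sketch of the paired t-test on difference scores, using only the Python standard library. `paired_t` is a hypothetical helper and the data are made up. For the effect size it assumes d_z (mean difference divided by the SD of the differences), which is one of several versions of d for dependent samples; others divide by the average SD of the two conditions, which is why it is worth stating which one you report.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t, df, and Cohen's d_z (mean diff / SD of diffs)."""
    if len(before) != len(after):
        raise ValueError("paired samples need equal length")
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)
    t = d_z * math.sqrt(n)  # same as mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, d_z


t, df, d_z = paired_t([1, 2, 3, 4], [2, 4, 4, 6])
print(round(t, 3), df, round(d_z, 3))  # 5.196 3 2.598
```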

Chi-square

- Overview: tests whether there is a relationship between two categorical variables, or whether the distributions of categorical variables differ from each other.

- statistical independence means that the frequency distribution of a variable is the

same for all levels of some other variable.

- Expected frequencies are the frequencies we expect in our sample if the null

hypothesis holds.

- Observed frequencies are the frequencies actually counted in our sample.

- One-way Chi Square/Goodness of fit test

- Only one IV (one variable, like a one sample t-test)

- determine whether or not the relative frequencies in the observed

categories are similar to, or statistically different from, the hypothesized

relative frequencies within those same categories

- Alternative - the observed frequencies differ from the hypothesized frequencies in at least one level of the IV

- Assumptions

- Nominal/ordinal (categorical) data for DV

- Independence of observations

- Groups of the categorical IV should be mutually exclusive (e.g. a male

employee will be counted only under the male level, and cannot be

counted under the female level)

- An expected frequency of at least 5 in each group of your categorical IV

- How to interpret results

- Chi square value should be significant at the 0.05 level - implying that the observed frequencies across the levels of the IV differ from the hypothesized frequencies (it does not show that every level differs from every other level)

- Effect size - Cramér’s phi

- 0.1 - small, 0.3 - medium, 0.5 - large
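A minimal sketch of the goodness-of-fit statistic, chi square = sum((O - E)² / E) with df = k - 1, that also enforces the expected-frequency-of-5 rule above. It uses only the Python standard library; `chi_square_gof` is a hypothetical helper and the counts are made up. `scipy.stats.chisquare` computes the same statistic along with its p-value.

```python
def chi_square_gof(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic and df.

    expected_props: hypothesized proportion per category (default: equal).
    """
    n, k = sum(observed), len(observed)
    if expected_props is None:
        expected = [n / k] * k
    else:
        if abs(sum(expected_props) - 1) > 1e-9:
            raise ValueError("expected proportions must sum to 1")
        expected = [n * p for p in expected_props]
    if min(expected) < 5:
        raise ValueError("each category needs an expected frequency of at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


stat, df = chi_square_gof([20, 10, 30])
print(stat, df)  # 10.0 2
```

The .05 critical value of chi square with 2 df is 5.991, so these made-up counts would depart significantly from equal frequencies.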

- Two-way Chi Square/Test of Independence

- Two categorical variables (e.g., a 2x2 design), similar to testing an interaction effect in an ANOVA

- Null - the two IVs are independent of each other

- Is the outcome in one variable related to the outcome in some other

variable

- Assumptions

- Both IVs should be measured as categorical (nominal/ordinal) data

- Each IV should have at least two levels that are independent of each

other/mutually exclusive

- How to interpret results

- Pearson chi square should be statistically significant at the 0.05 level - implying a statistically significant association between the two IVs (they are not independent of each other)

- Effect size - Cramér’s phi

- 0.1 - small, 0.3 - medium, 0.5 - large
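A minimal sketch of the test of independence for an r x c table, computing expected counts as (row total x column total) / N, plus Cramér's V (also written phi_c, i.e. the Cramér's phi above) = sqrt(chi square / (N x (min(r, c) - 1))). It uses only the Python standard library; `chi_square_independence` is a hypothetical helper and the table is made up. It applies no continuity correction, whereas `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its statistic will be smaller there.

```python
import math


def chi_square_independence(table):
    """Pearson chi square (no continuity correction), df, and Cramer's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count if independent
            stat += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(table), len(table[0])
    df = (n_rows - 1) * (n_cols - 1)
    cramers_v = math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))
    return stat, df, cramers_v


stat, df, v = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df, round(v, 3))  # 6.667 1 0.333
```

With 1 df the .05 critical value is 3.841, so this association is significant, with a medium effect size by the benchmarks above.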

One-way ANOVA

- Overview: compare the means of several sample groups and determine whether any of those means are statistically significantly different from each other.

- One IV multiple levels

- Independent

- Assumptions

- HOV (Levene’s test for HOV)

- Independent Random Sampling

- Normal Distribution

- How to interpret results

- F ratio should be significant at the 0.05 level - only tells you that a

statistically significant difference exists but not where it exists

- Use post-hoc tests

- If the IV has only two levels - use an independent-samples t-test

- If the IV has three levels - use Fisher’s LSD (with only three pairwise comparisons, Fisher’s protected LSD holds the familywise Type I error rate at alpha)

- If the IV has more than three levels - use Tukey’s HSD

Note: Can use a Bonferroni adjusted alpha level for post-hoc tests

- Effect size - ETA SQUARED

- Small: .01

- Medium: .06

- Large: .14

- http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize

- Repeated Measures

- Assumptions

- Sphericity (Mauchly’s W) - the variances of the differences between all pairs of levels of the IV are equal

- Independent Random Sampling

- Normal Distribution

- How to interpret results

- F ratio should be significant at the 0.05 level - only tells you that a

statistically significant difference exists but not where it exists

- Use post-hoc tests

- If the IV has only two levels - use a dependent-samples (paired) t-test

- If the IV has three levels - use Fisher’s LSD (with only three pairwise comparisons, Fisher’s protected LSD holds the familywise Type I error rate at alpha)

- If the IV has more than three levels - use Tukey’s HSD

Note: Can use a Bonferroni adjusted alpha level for post-hoc tests

- Effect size - ETA SQUARED

- Small: .01

- Medium: .06

- Large: .14

- http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize
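A minimal sketch of the independent-groups one-way ANOVA: F = (SS_between / df_between) / (SS_within / df_within), with eta squared = SS_between / SS_total. It uses only the Python standard library; `one_way_anova` is a hypothetical helper and the scores are made up. It covers the independent design only; a repeated-measures ANOVA additionally removes between-subject variance from the error term, which this sketch does not do. `scipy.stats.f_oneway` reports the independent-groups F with its p-value.

```python
from statistics import mean


def one_way_anova(*groups):
    """F ratio, its dfs, and eta squared for independent groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # SS_between / SS_total
    return f, df_between, df_within, eta_sq


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6, 0.9)
```

A significant F here still only says that some difference exists; the post-hoc tests listed above locate it.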

Two-way ANOVA

- Overview: determine whether the two independent variables interact in their effect on the dependent variable, in addition to testing the main effect of each

- Multiple IVs with multiple levels

Note: For fully independent or fully repeated-measures designs, follow the same protocol but check the assumptions of the respective one-way designs.

Mixed design:

- Null: the means of IV1 conditions are equal; the means of IV2 conditions are

equal; no interaction between IV1 & IV2

- Assumptions

- HOV

- Homogeneity of Covariance across groups (Box’s M)

- Sphericity

- Independent Random Sampling

- Normal Distribution

- How to interpret results

- You will have two main effects and one interaction effect (3 F ratios)

- If main effects significant, use post-hoc tests for follow up (same as One-

Way ANOVAs)

- If the interaction effect is significant, interpret it first and give it more weight than any significant main effects

- If an IV has two levels - the post-hoc test will be a t-test

- If an IV has more than two levels - test simple main effects: keep one level of IV1 constant and run a one-way ANOVA on the other IV

- Again, this will tell you that a difference exists, but you

don’t know where, so maybe follow up with a Fisher’s

LSD/Tukey’s HSD

- Check whether the interaction is ordinal or disordinal (crossover)

- Effect size

- Eta squared for main effects of each of the two IVs

- 0.01 - small, 0.06 - medium, 0.14 - large

- Partial eta squared for the interaction effect between the two IVs

- 0.01 - small, 0.09 - medium, 0.25 - large
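The difference between the two effect sizes lies in the denominator: eta squared divides an effect's sum of squares by the total SS, while partial eta squared divides it by that effect's SS plus error SS only, so it is never smaller. A minimal sketch in plain Python; the helper names and the sums of squares are hypothetical, and it assumes a single shared error term (a fully between-subjects design), whereas mixed designs use separate error terms for between- and within-subject effects.

```python
def eta_squared(ss_effect, ss_total):
    """Share of the total variance explained by one effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Effect variance relative to effect + error only (other effects removed)."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares for a two-way design (not from any real study).
ss = {"IV1": 30.0, "IV2": 10.0, "IV1 x IV2": 20.0, "error": 40.0}
ss_total = sum(ss.values())  # 100.0
for effect in ("IV1", "IV2", "IV1 x IV2"):
    print(effect,
          round(eta_squared(ss[effect], ss_total), 3),
          round(partial_eta_squared(ss[effect], ss["error"]), 3))
```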

Correlation

- Requirements:

- Each variable should be continuous

- Each observation/participant should have a pair of values (related variables)

- Assumptions:

- Linearity

- Independent random sampling

- Normal distribution

- Absence of influential outliers

- Homoscedasticity (look at the scatterplot: the spread of data points around the straight line should be roughly equal across all values of the IV)

- Used for:

- Test-retest reliability

- Internal consistency (split-half)

- Interrater reliability

- Criterion validity

- Strength of correlation, Pearson’s r: small .1, medium .3, large .5 (Cohen, 1988)
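A minimal sketch of Pearson's r, its significance test t = r * sqrt(n - 2) / sqrt(1 - r²) with df = n - 2, and Cohen's benchmarks, using only the Python standard library. The helper names and the data are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's r: covariation of x and y relative to their spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with df = n - 2 for H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def label_r(r):
    """Cohen's (1988) benchmarks for |r|: .1 small, .3 medium, .5 large."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"


x, y = [1, 2, 3, 4, 5], [1, 3, 2, 5, 4]
r = pearson_r(x, y)
print(round(r, 3), round(r_to_t(r, len(x)), 3), label_r(r))  # 0.8 2.309 large
```

With df = 3 the two-tailed .05 critical value of t is 3.182, so this large r is still not significant with five pairs, which is why effect size and significance are reported separately.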

Linear Regression

- Assumptions:

- Linearity

- Independent random sampling

- Normal distribution

- Absence of influential outliers

- Homoscedasticity (look at the scatterplot: the spread of data points around the straight line should be roughly equal across all values of the IV)

- How to interpret results

- R squared - the proportion of variance in the DV accounted for by the IV

- Small - 0.01, Medium - 0.09, Large - 0.25

- Check the beta weight: the change in the DV per one-unit change in the IV; the sign of the beta weight gives the direction of the relationship
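A minimal least-squares sketch for one IV, returning the slope (change in DV per one-unit change in IV), the intercept, and R squared = 1 - SS_residual / SS_total. It uses only the Python standard library; `simple_regression` is a hypothetical helper and the data are made up (Python 3.10+ also offers `statistics.linear_regression`, which returns the slope and intercept). The slope here is the unstandardized weight b; the standardized beta equals b x SD_x / SD_y, which for a single IV equals Pearson's r.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares slope, intercept, and R squared for one IV."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx                 # change in DV per one-unit change in IV
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


slope, intercept, r2 = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 3), round(intercept, 3), round(r2, 3))  # 0.6 2.2 0.6
```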

Logistic Regression

- Overview:

- predicts how the likelihood (odds) of a dichotomous outcome changes with every one-unit increase in an IV

- model the probability of an event occurring depending on the values of the

independent variables, which can be categorical or numerical

- estimate the probability that an event occurs for a randomly selected observation

versus the probability that the event does not occur

- predict the effect of a series of variables on a binary response

- classify observations by estimating the probability that an observation is in a

particular category

- Assumptions (http://www.statisticssolutions.com/assumptions-of-logistic-regression/)

- NOT REQUIRED: First, logistic regression does not require a linear relationship

between the dependent and independent variables. Second, the error terms

(residuals) do not need to be normally distributed. Third, homoscedasticity is not

required. Finally, the dependent variable in logistic regression is not measured on

an interval or ratio scale. However, some other assumptions still apply.

- The dependent variable must be binary

- Observations must be independent of each other - in other words, the observations should not come from repeated measurements or matched data.

- Little or no multicollinearity among the independent variables - this means

that the independent variables should not be too highly correlated with each other

- Linearity of independent variables and log odds - although this analysis does not require the dependent and independent variables to be related linearly, it requires that the independent variables are linearly related to the log odds.

- Logistic regression typically requires a large sample size. A general guideline is that you need a minimum of 10 cases with the least frequent outcome for each independent variable in your model. For example, if you have 5 independent variables and the expected probability of your least frequent outcome is .10, then you would need a minimum sample size of 500 (10 * 5 / .10).

- How to interpret results

- Nagelkerke R squared - a pseudo R squared that approximates the amount of variance in the DV accounted for by the predictors in the regression model

- Odds ratio - the multiplicative change in the odds of the DV for a one-unit change in one IV, holding all other variables constant

- Example (with reference to expatriate training) - if position grade was found significant, a one-unit change in grade would multiply the odds of return by the odds ratio (an odds ratio above 1 increases the odds; below 1 decreases them)
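A minimal sketch of two calculations from this section, using only the Python standard library: the 10-cases-per-predictor sample-size guideline (N = 10 x k / p) and converting a logistic coefficient b into an odds ratio, exp(b). The helper names and the coefficient are hypothetical. Note that the odds change by a constant factor per unit, but the change in probability depends on the starting probability.

```python
import math
from fractions import Fraction


def min_sample_size(n_predictors, p_least, cases_per_predictor=10):
    """Rule of thumb: N >= cases_per_predictor * n_predictors / p_least."""
    p = Fraction(str(p_least))  # exact arithmetic avoids float round-off
    return math.ceil(cases_per_predictor * n_predictors / p)


def odds_ratio(b):
    """Odds ratio for a one-unit increase in a predictor with coefficient b."""
    return math.exp(b)


def probability_after(p, b, units=1):
    """Event probability after `units` one-unit increases, others held constant."""
    odds = p / (1 - p) * odds_ratio(b) ** units  # odds multiply by OR per unit
    return odds / (1 + odds)


print(min_sample_size(5, 0.10))             # 500
b = math.log(2)  # hypothetical coefficient, giving OR = 2
print(round(odds_ratio(b), 3))              # 2.0
print(round(probability_after(0.5, b), 3))  # 0.667
```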

Multiple Regression

- Assumptions of Multiple Regression (Cohen, 2013)

- Minimal sample size - 41 + number of predictors (accounts for

- Independent Random Sampling – individual cases should be selected

independently of each other

- Normal Distributions – all variables involved in the multiple regression are

normally distributed

- Homoscedasticity – errors from the regression surface (e.g. line, plane, etc.) have

the same variance in all locations

- Multivariate Outliers – combinations of values on three or more variables that are

unusual and may indicate measurement errors or psychological phenomena

- Measuring Leverage, Residuals and Influence – we probably want to do an outlier

analysis to ensure that these are in check

- Leverage – outliers that can easily rotate the regression line

- Residuals – a point’s observed value minus its predicted value on the regression line

- Influence – outliers that have leverage and large residual values

- Dichotomous Predictors – all categorical variables have been coded into numeric

values

- Categorical variables with more than two levels have been dummy-coded into dichotomous variables – because these are IVs rather than the DV, multiple logistic regression is not needed

- Problems with Multiple Regression that will be addressed prior/post-test

- Multicollinearity – two or more predictors are perfectly or highly correlated, or one predictor is predicted by a combination of the others; this should be ruled out

- Shrinkage – the drop in predictive accuracy (R squared) when a regression model built on one sample is applied to another

- Cross-validation to address shrinkage – build the regression model on one half of the sample, apply its beta weights to the other half, and check how well it predicts

- Having too many predictors - use a Bonferroni-adjusted alpha for your regression model, and meet the minimal sample size of 41 + number of predictors (cite the stats textbook: Cohen, 2013)

- How to interpret results

- Check R squared value for amount of variance explained by predictors in the

regression model

- Check beta weights to see the change in the DV per one-unit change in each IV - the sign of each beta weight indicates a positive or negative relationship

NOTE: We will only include in the regression model predictors that have statistically significant correlations with the DV - these relationships can be positive or negative and weak, moderate, or strong.
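A minimal sketch of three checks from the multiple regression section, using only the Python standard library; all helper names and data are hypothetical. `dummy_code` turns a categorical IV with k levels into k - 1 dichotomous columns. `vif_two_predictors` computes the variance inflation factor for a two-predictor model, 1 / (1 - r²); in general VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all other predictors, and values above about 10 (some use 5) are a common rule-of-thumb flag. `adjusted_r2` is a shrinkage-corrected estimate of R squared; it complements, rather than replaces, the cross-validation described above.

```python
import math
from statistics import mean


def dummy_code(values, reference):
    """k - 1 dummy (0/1) columns for a categorical IV; `reference` is all zeros."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def vif_two_predictors(x1, x2):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    m1, m2 = mean(x1), mean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r = s12 / math.sqrt(s11 * s22)  # Pearson correlation between predictors
    return 1 / (1 - r ** 2)


def adjusted_r2(r2, n, k):
    """R squared adjusted for n cases and k predictors (a shrinkage estimate)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(dummy_code(["low", "mid", "high", "low"], reference="low"))
# {'high': [0, 0, 1, 0], 'mid': [0, 1, 0, 0]}
print(round(vif_two_predictors([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]), 3))  # 2.778
print(round(adjusted_r2(0.50, n=45, k=4), 3))                           # 0.45
```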

EFFECT SIZE CHART (WE DON’T KNOW ABOUT ETA SQUARED - TBD)

Effect size: small / medium / large

- r (correlation): .1 / .3 / .5

- R squared: .01 / .09 / .25

- f squared (regression): .02 / .15 / .35

- Cohen’s f (one-way AN(C)OVA): .1 / .25 / .4

- Cohen’s d (t-tests): .2 / .5 / .8

- Cramér’s phi (chi square): .1 / .3 / .5

- Partial eta squared: TBD
