
SD= sqrt (SS/df)

How do you calculate a Z score


(X- mean )/sd
Power = 1 - Type 2 error rate
Power depends on: effect size, sample size, random variation
Power analysis: EDS, Type 1 & Type 2 exp. Sd & effect size of interest
Cohen's d
Standardised effect size for the difference between means
d = diff means/ pooled sd
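A minimal sketch of the Cohen's d formula above, assuming the pooled SD is weighted by each group's degrees of freedom (the usual definition; the notes do not spell it out). The function name and the example data are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    # Pooled SD: variances weighted by their degrees of freedom (assumed definition)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Made-up illustration data, not from the notes
treated = [5.1, 6.0, 5.8, 6.4, 5.5]
control = [4.2, 5.0, 4.8, 5.3, 4.6]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")
```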
Cook's distance > 1 = bad.
Overall influence of a data point on overall model.
Leverage Values
influence of the data point on predicted values
> 2(K+1)/n

DFFIT= change in fit of model


DFBeta= change of slope
>1= substantial influence
Size of the residual = distance from regression line
Leverage = the distance from the overall mean of the predictor
Increased collinearity = SE of B coefficients increases
VIF> 10 = bad
Tolerance = 1/VIF
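A small sketch of how VIF and tolerance relate. It assumes the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the others. That definition is not stated in the notes, and the function name is hypothetical.

```python
def vif_and_tolerance(r_squared):
    """Return (VIF, tolerance) for one predictor.

    r_squared: R^2 from regressing this predictor on the other predictors
    (assumed standard definition, not given in the notes).
    """
    tolerance = 1 - r_squared
    vif = 1 / tolerance
    return vif, tolerance


# A predictor that is 90% explained by the others sits right at the VIF = 10 rule of thumb
vif, tolerance = vif_and_tolerance(0.9)
print(f"VIF = {vif:.1f}, tolerance = {tolerance:.2f}")
```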
Automatic variable selection = increases type 1 error
Don't use Shapiro-Wilk if n > 100
Too powerful
One group test single proportion = Z test for proportion
2 groups, 2 equal proportions, paired data = McNemar's
2 Groups, 2 equal proportions, unpaired data
Expected values >5 in 80% cells= Chi squared
Expected values <5 in 20% of cells or more = Fisher's Exact
2 Groups, testing for trends, independent data
Linear by linear association aka Chi-squared trend
One-sample t-Test for the mean t= (sample mean- hypothesized mean)/SEM
Unpaired t-Test t = (difference in means - 0) / standard error of the difference
df = (N1 + N2 - 2)
Paired t-Test for the mean t = (mean difference between pairs - 0) / SEM
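As a rough illustration of the unpaired formula, the sketch below computes t and df by hand. It assumes equal variances, so the standard error of the difference uses the pooled variance. The function name and data are hypothetical.

```python
import math
import statistics


def unpaired_t(sample_1, sample_2):
    """Return (t, df) for an equal-variance unpaired t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Pooled variance, assuming both groups share the same population variance
    pooled_var = ((n1 - 1) * statistics.variance(sample_1)
                  + (n2 - 1) * statistics.variance(sample_2)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample_1) - statistics.mean(sample_2) - 0) / se_diff
    return t, df


# Made-up data for illustration only
t_stat, dof = unpaired_t([1, 2, 3], [2, 3, 4])
print(f"t = {t_stat:.3f}, df = {dof}")
```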
Equality of variance
Levene's; P > 0.05 = assumption met
P < 0.05; use Welch's
Anovas- pretty robust
Independence of residuals - Durbin Watson 1-3 = ok
Repeated measures = sphericity (Mauchly's test) - basically equality of variance
Greenhouse-Geisser correction
Corrects df and sig values

Tukey
sample sizes equal + good trade-off between type 1 and type 2 errors
Bonferroni
Corrects P-value by the number of tests being performed
V. conservative
high level of confidence
Dunnett's
Groups vs a single control group
enables 1-tailed test
Games-Howell
Variance not equal
Gabriel Test
Slightly different sample sizes
Hochberg GT2
Sample sizes very different
Planned contrast
Reduce type 1
Helmert = Orthogonal
Each category except the last is compared to the mean of subsequent categories.
Difference Contrast= Reverse Helmert
Non-orthogonal =bad
Inflated P-value
Higher type 1
Polynomial Contrasts
Orthogonal already run- no need for weighting
Trends in data- which cannot be obtained directly using post-Hoc tests
Data must be ordered
Repeated factors- Bonferroni
Independent factors- Sidak
Interaction plots
Parallel = no sig. Interaction
Horizontal - no sig. Effect of factor 1 e.g dose
Overlapping = no sig effect of factor 2 e.g sex
Pearson correlation coefficient
degree of association between two paired variables.
Scatter plot to visualize
R misleading with outliers or non-linear
R=0.1 small effect
R=0.3 medium effect
R=0.5 large effect
Non parametric= Spearman
Standardised residuals >3 worth checking as an outlier
>5% of Residuals >2 indicates model is poor fit
Q & A with Shirshah Brennan

1. Sample data is unimodal but not normally distributed, what would be the most important
summary statistic?
a. Median and interquartile range
b. SD, SEM & V are all calculated in relation to the mean- mean and range are more
sensitive to outliers than median
2. Joe has a BMI of 15.62 kg/m^2, what is the percentage of population that has a lower BMI
than him if the mean for a normally distributed population is 23.16 and SD = 7.55?
a. Calculate Z score -> Z = (x - u) / sd
Z = (value - population mean) / standard deviation
(15.62 - 23.16) / 7.55 = -1 (approximately)
68% of the population lie within +/- 1 SD, so (100 - 68)/2 = 16% lie in each tail
One tailed, as we want "less than" = 16%
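The same calculation can be checked with Python's standard library. This is a sketch that uses the exact normal CDF rather than the 68% rule of thumb, so the result is close to 16% but not identical. The function name is hypothetical.

```python
from statistics import NormalDist


def z_score(value, population_mean, sd):
    """Z = (value - population mean) / standard deviation."""
    return (value - population_mean) / sd


z = z_score(15.62, 23.16, 7.55)
# Proportion of a standard normal population lying below z
proportion_below = NormalDist().cdf(z)
print(f"Z = {z:.2f}, about {proportion_below * 100:.0f}% have a lower BMI")
```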
3. Sample of 30 males have a glucose level of 6.0mM with a variance of +/- 1.9mM, what is the
standard deviation?
a. SD = sqrt(variance)
b. sqrt(1.9) = 1.4
4. 1000 samples are drawn randomly from a normal population; in how many samples would you
expect the population mean to lie within the range of 2.58 standard errors of the sample
mean?
a. 990
b. 95% = +/- 1.96 SEM
c. 99% = +/- 2.58 SEM
d. 99.9% = +/- 3.29 SEM
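The multipliers above come from the standard normal distribution. A short sketch of recovering them, assuming a two-tailed interval (the function name is hypothetical):

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-tailed standard normal multiplier for a given confidence level."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.95, 0.99, 0.999):
    print(f"{level:.1%} -> +/- {z_multiplier(level):.2f} SEM")

# 99% of 1000 samples are expected to capture the population mean
expected_samples = round(1000 * 0.99)
```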
5. How would data on performance be best presented graphically to compare performance of
students in groups A and B
a. Boxplots
b. Shows ranges and position of outlying data points
6. Interpretation of P <0.05
a. P> 0.05 means there is insufficient evidence for an effect to be notable
7. What would cause an increase in the probability of committing a type one error
a. Using 5% instead of 1% sig level
b. P = 0.01 means there is a 1% chance of making a type 1 error
c. Using a two tailed hypothesis will decrease the chance of making a type 1 error
d. Using an unpaired t-test rather than a paired t-test on paired data will result in a
decrease in power and an increase in type 2 error rate but not affect type 1 error rate
e. Increasing the sample size increase the power of the test and decreases type 2 error
rate but no effect on the type 1 error rate
f. Using a nonparametric test on data that meets the assumptions for a parametric test
will result in a loss of power and therefore increase type 2 error rate but will have no
effect on type 1 error rate

8. 64 students with a mean mark of 62.7, sd = +/- 8; historical average is 60. Where does the P
value lie for the hypothesis new exam > old exam?
a. Find SEM = sd / sqrt(n); 8 / sqrt(64) = 1
95% CL = 62.7 - (1 x 1.96) = 60.74
99% = 62.7 - (1 x 2.6) = 60.1
99.9% = 62.7 - (1 x 3.3) = 59.4 = includes the old mark (60)
So two-tailed P lies between 0.01 and 0.001 -> because it is one tailed, P lies between 0.005 and 0.0005
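A hedged sketch of the same reasoning in code. It uses the normal approximation (a Z statistic) rather than the t distribution, which is an assumption that is reasonable for n = 64 but gives only an approximate P value. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_tailed_p_greater(sample_mean, hypothesised_mean, sd, n):
    """Return (SEM, z, one-tailed P) for H1: sample mean > hypothesised mean.

    Uses the standard normal distribution as an approximation to t.
    """
    sem = sd / math.sqrt(n)
    z = (sample_mean - hypothesised_mean) / sem
    p_one_tailed = 1 - NormalDist().cdf(z)
    return sem, z, p_one_tailed


sem, z, p = one_tailed_p_greater(62.7, 60, 8, 64)
print(f"SEM = {sem:.2f}, z = {z:.2f}, one-tailed P = {p:.4f}")
```

The one-tailed P value from this approximation falls between 0.0005 and 0.005.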
9. What does Levene's test measure?
a. Equality of variance; the assumption is met if P > 0.05
10. What will occur if you used multiple t-tests at p<0.05 to assess the differences between
means of 4 groups ?
a. Overall type 1 error rate will increase
b. Multiple t-testing inflates alpha (more false positives)
c. Conversely ANOVA has a higher type 2 error rate
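To see how quickly the overall type 1 error rate grows, here is a rough sketch. It assumes the pairwise tests are independent, which is not strictly true when the same groups are reused, so the figure is approximate. The function name is hypothetical.

```python
from math import comb


def familywise_error(n_groups, alpha=0.05):
    """Return (number of pairwise tests, approximate overall type 1 error rate).

    Assumes independent tests: 1 - (1 - alpha) ** number_of_tests.
    """
    n_tests = comb(n_groups, 2)
    return n_tests, 1 - (1 - alpha) ** n_tests


n_tests, overall_alpha = familywise_error(4)
# Bonferroni-style per-test threshold that would keep the overall rate near 5%
per_test_alpha = 0.05 / n_tests
print(f"{n_tests} tests, overall type 1 error rate of about {overall_alpha:.0%}")
```

With 4 groups there are 6 pairwise comparisons, so the overall rate under this assumption is roughly 26% rather than 5%.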
11. Skin infection was treated in 40 out of 60 patients treated with an antifungal cream,
compared to another treatment with cream containing steroids where 45 out of 48 were
treated. What is the standard error of proportion of patients treated with antifungal cream ?
a. Find proportion 40/60 = 0.67
b. SEP = sqrt((P(1 - P))/n) = sqrt((40/60 x (1 - 40/60))/60) = 0.061
12. Same Q as 11 but what is approximate 99% confidence interval for percentage of patients
treated with antifungal cream ?
a. 99% = +/- 2.6 x SEP = 2.6 x 0.061 = 0.159 = 15.9%
b. 40/60 = 66.67%
c. 66.67 +/- 15.9 = range about 51 to 83 %
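A minimal sketch of the standard error of a proportion and its confidence interval. It assumes the simple normal-approximation interval used in the notes, with the rounded 99% multiplier of 2.6. The function name is hypothetical.

```python
import math


def proportion_ci(successes, n, z=2.6):
    """Return (proportion, SEP, lower limit, upper limit).

    Normal-approximation interval: p +/- z * sqrt(p * (1 - p) / n).
    """
    p = successes / n
    sep = math.sqrt(p * (1 - p) / n)
    return p, sep, p - z * sep, p + z * sep


p, sep, lower, upper = proportion_ci(40, 60)
print(f"p = {p:.2%}, SEP = {sep:.3f}, 99% CI = {lower:.1%} to {upper:.1%}")
```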
13. Normally distributed variables x and y are significantly correlated, with a p-value of 0.006 and a
Pearson's correlation coefficient of 0.325; approximately how much of the variability is explained?
a. Pearson's correlation coefficient = r = 0.325
b. r^2 = 0.325^2 = 0.106; x 100 = 11%
14. What does a Pearson's correlation coefficient of -0.27 between variables A and B with a p-value
of p = 0.044 indicate?
a. An increase in A is associated with a decrease in B with small effect size.
b. 0.1 - 0.3 = small; negative means as A increases B decreases
15. A stats test has a significance level of 5 % and a power of 80% ; what is the probability of
incorrectly accepting the null hypothesis when an effect actually exists ?
a. Power = 1- type 2 error rate
b. If power= 80% type 2 error rate = 20%
16. Standard Error of sample mean is 0.4 ; SD = 8 how many data points are there in the sample ?
a. 400
b. SEM = sd / sqrt(n); 0.4 = 8 / sqrt(n); sqrt(n) = 20; n = 400
17. An extraneous variable that affects the variables being studied - so that the results obtained
fail to reflect the actual relationship between the studied variables is known as?
a. A confounding variable
18. Frequency histogram for a sample of data is found to be positively skewed- what would be an
appropriate measure of central tendency for the data ?
a. Median
19. Assuming 95% confidence intervals for the mean of a normally distributed sample was 193 to
252 what was the standard error of the mean of the sample?
a. 15
b. The 95% interval spans about 4 SEM (+/- 1.96 SEM): (252 - 193)/4 = 59/4 = 14.75 ~ 15
20. Significant relationship for resting heart rate to increase as reaction rate decreased in a group
of elderly patients. 20% of variability in reaction rate could be explained by its association with
resting heart rate, what is the pearson's correlation coefficient ?
a. r^2 = 0.2; Pearson's correlation coefficient = r
r = sqrt(0.2) = 0.447 ~ 0.45; there is a negative correlation so r = -0.45
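The two directions of the r and r^2 relationship used in these questions can be sketched as below. The function names are hypothetical, and the sign of r has to be supplied from the direction of the relationship, since r^2 alone does not carry it.

```python
import math


def variance_explained(r):
    """Proportion of variability shared by two variables: r squared."""
    return r ** 2


def r_from_variance_explained(r_squared, negative=False):
    """Recover r from r^2; the sign must come from the direction of the association."""
    r = math.sqrt(r_squared)
    return -r if negative else r


shared = variance_explained(0.325)                        # correlation of 0.325
r = r_from_variance_explained(0.2, negative=True)         # 20% explained, inverse relationship
print(f"r = 0.325 explains about {shared:.0%}; r^2 = 0.2 gives r = {r:.3f}")
```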
21. If unpaired t-test is used to analyse the difference between 2 samples of paired data - what is
the most adverse consequence
a. Increased type 2 error rate
22. 20 randomly selected UK universities - how many in the sample lie within the range of +/- 1 sd
?
a. 20*0.68 = 13.6
23. Kruskal Wallis test has little power when analysing sample sizes of 7 or less to detect a
significant difference at p<0.05 increasing the likelihood of what?
a. Accepting the null hypothesis when there really is a difference among groups
24. A one-way ANOVA had to be applied using welch's correction - what does this indicate?
a. Equality of variance for residuals cannot be assumed
25. If a real difference between groups exists but a statistical test fails to reject the null
hypothesis at 5% sig what type of error has occurred ?
a. Type 2 error
26. Why is excessive multicollinearity an issue when analysing data using linear regression ?
a. It increases the standard errors of the regression coefficients
27. 2 randomly selected samples of patients either treated with melatonin or a placebo. Which
test should be used to test the null hypothesis that there is no difference in the two groups
a. A two tailed unpaired t-test
28. Which stats test should be used to test the claim that a new course gives better exam results
than an old course ?
a. One-tailed one sample t-test
29. For Q28 what is the p-value that the new course is better than the old course ? (16 students ,
new course mean mark = 65 ; sd = 8; old course mark = 60 )
a. Find SEM = sd / sqrt(n) = 8 / sqrt(16) = 2
b. 95% = 65 - (2 x 1.96) = 61.08
c. 99% = 65 - (2 x 2.6) = 59.8
d. Two-tailed P value is approximately 0.01; one tailed is approximately 0.005
30. Why is automatic variable selection a bad thing in multiple linear regression ?
a. Increase type 1 error
31. What is the Durbin Watson test in regression ?
a. Tests independence of residuals; values between 1 and 3 are OK
32. Sample of 36 inbred mice had a mean weight of 25g sd = 3g , how large a sample of outbred
mice have a mean weight of 30 g with a sd of 5g would be needed to have a similar 99% cl for
the mean?
a. SEM = 3/root 36 = 0.5
b. Condition 1 99% cl = condition 2 cl
c. Therefore 3/ sqrt 36 = 5/ sqrt n ; 0.5 = 5 / sqrt n -> n =100
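The matching-SEM argument above can be written as a small helper. This is a sketch assuming the same confidence multiplier applies to both samples, so equal interval widths mean equal SEMs. The function name is hypothetical.

```python
import math


def n_for_matching_sem(sd_1, n_1, sd_2):
    """Sample size for group 2 that gives the same SEM as group 1.

    Solves sd_1 / sqrt(n_1) = sd_2 / sqrt(n_2) for n_2.
    """
    target_sem = sd_1 / math.sqrt(n_1)
    return (sd_2 / target_sem) ** 2


n_outbred = n_for_matching_sem(sd_1=3, n_1=36, sd_2=5)
print(f"Outbred sample size needed: {round(n_outbred)}")
```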
33. What is the non-parametric test for a one way repeated measures ANOVA
a. Friedman's
34. What is the best way of increasing power in an independent 2 way ANOVA used to test dose
dependent effect of 4 different drug doses in 2 different inbred strains of mice ?
a. Increase numbers in each group
35. MLR investigating BMI height and weight - what invalidates this analysis?
a. Entanglement (multicollinearity: BMI is calculated from height and weight)
