INSTITUTE OF NEUROLOGY

Saiful Islam

Chris Hardy

Feedback from you about past 2 lectures

Standard Error (SE) and Confidence Interval (CI)

•Useful examples

•Bad timing: the lecture was in the afternoon and everyone was tired / sleepy.

•Class quizzes are useful

Standard Deviation (SD) vs Standard Error (SE)

• The SE quantifies the typical error or difference between the mean measured in a sample and the theoretical mean in the population from which the sample was drawn

• The SE indicates how accurately the sample mean estimates the population mean

• The standard deviation (SD) of a sample of observations measures how a typical observation in the sample differs (deviates) from the sample mean
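
A minimal sketch of the SD vs SE distinction, assuming the usual formula SE = SD / sqrt(n) for the standard error of the mean. The readings are made up for illustration and are not the lecture data.

```python
import math
import statistics

# Hypothetical readings, made up for illustration (not the lecture data).
sample = [120, 125, 130, 135, 140]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)    # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)           # SE of the mean = SD / sqrt(n)

print(f"n = {n}, mean = {mean}")
print(f"SD = {sd:.2f}  (how far a typical observation is from the sample mean)")
print(f"SE = {se:.2f}  (how far the sample mean typically is from the population mean)")
```

The SE shrinks as n grows, while the SD does not: collecting more data makes the mean more precise but does not make individual patients less variable.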

Confidence interval – an example

• There is a 95% probability that this interval contains the unknown but true value of the population mean

• Assume we have a large sample (n > 30), say of the patients' low blood pressure, and we found that these data are normally distributed; then

• We need to obtain a lower and an upper limit for the interval, and say:

95% CI for mean is $(\bar{x} - 2\,\mathrm{SE})$ to $(\bar{x} + 2\,\mathrm{SE})$

• The multiplier 1.96 is approximated by 2
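
The interval above can be sketched in a few lines. This is an illustration only: the function name `mean_ci95` and the readings are hypothetical, and the SE is taken as SD / sqrt(n).

```python
import math
import statistics

def mean_ci95(values, z=1.96):
    """Return (lower, upper) limits of the interval mean - z*SE to mean + z*SE."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * se, mean + z * se

# Hypothetical sample of 35 readings (n > 30), made up for illustration.
readings = [60 + (i % 7) for i in range(35)]

lower, upper = mean_ci95(readings)
print(f"mean = {statistics.mean(readings)}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")

# Using the rounded multiplier 2 instead of 1.96 widens the interval only slightly.
lower2, upper2 = mean_ci95(readings, z=2)
print(f"with multiplier 2: {lower2:.2f} to {upper2:.2f}")
```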

Learning Outcomes for today’s lecture

•When do you need these tests?

•Difference between paired and unpaired data

•How to compare data using:

•Mann-Whitney test (two independent samples)

•Kruskal-Wallis test for > 2 independent samples.

•Wilcoxon-Signed Rank test (paired data)

•How to interpret results from STATA output

Current research at UCL

primary progressive aphasia (lvPPA)

struggle to understand speech?

Logopenic Variant PPA

•Rare variant of Alzheimer’s disease

•Young onset (<65)

•Impaired repetition of phrases and

sentences

•Word finding difficulties

understanding speech in noisy places

[video]

Speech perception in lvPPA

Current research at UCL


17 healthy controls

7 patients with lvPPA

(9 patients with nfvPPA)


Paradigm

[Figure: frequency (freq) against time (msec), time axis from 0 to 1800]

Research question

understand degraded speech?

List of variables

Diagnosis (Categorical)

SinewaveScore (SWS) (Continuous)

Bin 1 (Trials 1-10 SWS; Continuous)

Bin 2 (Trials 11-20 SWS; Continuous)

Bin 3 (Trials 21-30 SWS; Continuous)

Bin 4 (Trials 31-40 SWS; Continuous)

Study objectives

understand degraded speech

2. Compare controls with nfvPPA

3. Compare controls, lvPPA, and nfvPPA

4. Compare SWS score between two time points (Bin 1 and Bin 4) in the lvPPA group

What are non-parametric tests?

and assume that the distribution of sample means is ‘normally’ distributed (to be covered in lecture 4 on 23 Oct 2018).

•Often data do not follow a Normal distribution, e.g. number of cigarettes smoked, cost to the NHS, etc.

•Positively skewed distributions

[Figure: histogram of a positively skewed variable (y-axis: Frequency; x-axis from 0 to 50). Mean = 8.03, Std. Dev. = 12.952, N = 30]

What are non-parametric tests?

situations where fewer assumptions have to be made

• Sometimes called Distribution-free tests

• NP tests STILL have assumptions, but they are less stringent

• NP tests can be applied to Normal data, but parametric tests have greater power IF their assumptions are met

Ranks

•The practical difference between parametric and NP methods is that NP methods use the ranks of the values rather than the actual values

•E.g.

actual: 1, 2, 3, 4, 5, 7, 13, 22, 38, 45

rank:   1, 2, 3, 4, 5, 6,  7,  8,  9, 10
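
A minimal sketch of how ranks are assigned, assuming the common convention that tied values share the average of the ranks they would otherwise occupy (midranks). The helper name `midranks` is made up for this example.

```python
def midranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1   # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks

actual = [1, 2, 3, 4, 5, 7, 13, 22, 38, 45]   # the values from the slide
print(midranks(actual))

tied = [3, 1, 3, 2]                            # hypothetical values with a tie
print(midranks(tied))
```

Note that the large values (22, 38, 45) get ranks 8, 9, 10: ranking removes the influence of how extreme they are, which is why rank-based tests cope well with skewed data.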

Median

• The median is the value above and below which 50% of the data lie.

• If the data are ranked in order, it is the middle value

• In symmetric distributions the mean and median are the same

• In skewed distributions, the median is the more appropriate summary
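
A short sketch of finding the median by hand, assuming the usual rule that with an even number of values the median is the average of the two middle values. All numbers are hypothetical and deliberately differ from the class exercise.

```python
import statistics

def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_sample = [7, 3, 9, 1, 5]       # hypothetical, n = 5
even_sample = [4, 8, 1, 10]        # hypothetical, n = 4
skewed_sample = [1, 2, 2, 3, 40]   # hypothetical, one very large value

print(median(odd_sample))
print(median(even_sample))
# In the skewed sample the mean is pulled up by the value 40; the median is not.
print(median(skewed_sample), statistics.mean(skewed_sample))
```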

Class exercise: Find the median (1 min)

• Blood Pressure measures of 7 patients:

135, 138, 140, 140, 141, 142, 143

Median= ?

0, 1, 2, 2, 2, 3, 5, 5, 8, 10

Median=

Paired And Not Paired Comparisons

occasions then this is a paired comparison

• Two independent samples is not a paired comparison

• Different samples which are ‘matched’ by age and gender are paired

Non parametric tests:

Wilcoxon tests

• Frank Wilcoxon was a chemist in the USA who developed the rank-sum test and the signed-rank test

Please note that parametric tests will be discussed in the next lecture (lecture 4) on 23rd October 2018.

•Histogram

[Figure: histogram of numbersSWS (y-axis: Frequency; x-axis from 0 to 150)]

•Histogram by group

[Figure: histograms of numbersSWS by Group (y-axis: Density; x-axis from 0 to 150), with a normal curve overlaid; "Graphs by Group"]

•0 vs 1 not similar

•0 vs 2 not similar

•1 vs 2 similar

•Null hypothesis: there is no difference in the distribution of SWS score between the control and lvPPA groups

•Alternative hypothesis: there is some difference in the distribution of SWS score between the control and lvPPA groups.

•Now we check the quantile-quantile (q-q) plot to check normality for group = 1 (control group).

•Data points away from the straight line suggest the data are not normal.

•Now we check the quantile-quantile (q-q) plot to check normality for group = 2 (lvPPA).

•Data points away from the straight line suggest the data are not normal. There are very few data points as well.

•The control group and the lvPPA group are independent, the samples are very small and neither is normally distributed, so the assumptions of a non-parametric test are met.

•We should choose the non-parametric version of the two independent sample test, called the Mann-Whitney test, to compare SWS scores

•Stata output

STATA code:

. ranksum numbersSWS1 if Group==0 | Group==1, by(Group)

STATA output:

Group   Obs   Rank sum   Expected
0       17    264        212.5
1       7     36         87.5

adjustment for ties   -1.19

z = 3.279

Prob > |z| = 0.0010

• The output gives us a handy table displaying the two groups, their Obs (number of observations), the observed rank sums and the rank sums that would be expected if the null hypothesis were true (if there were no difference).

• Tied ranks can be an issue, so below the table there is a variance adjustment to account for these ties.

• Then you are reminded of the null hypothesis, and given the z-statistic (3.279) and p-value (0.001), which suggest that there is a significant difference in the distribution (medians) of SWS between the control and experimental groups.
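
The z-statistic in this output can be checked by hand from the table. The sketch below assumes the standard large-sample formulas for the rank-sum test (expected rank sum n1(N+1)/2, unadjusted variance n1·n2·(N+1)/12, with the tie adjustment added to the variance); the function name `ranksum_z` is made up for illustration.

```python
import math

def ranksum_z(n1, n2, rank_sum1, tie_adjustment=0.0):
    """Normal-approximation z and two-sided p-value for the rank-sum test."""
    total = n1 + n2
    expected1 = n1 * (total + 1) / 2                  # expected rank sum for group 1 under H0
    variance = n1 * n2 * (total + 1) / 12 + tie_adjustment
    z = (rank_sum1 - expected1) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))              # two-sided normal tail probability
    return expected1, z, p

# Values read from the Stata output: controls (n = 17, rank sum 264),
# lvPPA (n = 7), adjustment for ties -1.19.
expected, z, p = ranksum_z(n1=17, n2=7, rank_sum1=264, tie_adjustment=-1.19)
print(f"expected rank sum = {expected}")
print(f"z = {z:.3f}, p = {p:.4f}")
```

The expected rank sum of 212.5 matches the "Expected" column, and the observed 264 is well above it: controls tend to have the higher SWS ranks.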

Class quiz (1 min, in pairs)

and experimental lvPPA are not

independent?

•Null hypothesis: there is no difference in the distribution of SWS score between the control and nfvPPA groups

•Alternative hypothesis: there is some difference in the distribution of SWS score between the control and nfvPPA groups.

•Now we check the quantile-quantile (q-q) plot to check normality for nfvPPA.

•Data points away from the straight line suggest the data are not normal. There are very few data points as well.

•The control group and the nfvPPA group are independent, the samples are very small and neither is normally distributed, so the assumptions of a non-parametric test are met.

•We should choose the non-parametric version of the two independent sample test, called the Mann-Whitney test, to compare SWS scores.

•Stata output

STATA code:

. ranksum numbersSWS1 if Group==0 | Group==2, by(Group)

STATA output:

Group   Obs   Rank sum   Expected
0       17    286        221
2       8     39         104

adjustment for ties   -1.25

z = 3.795

Prob > |z| = 0.0001

• The output gives us a handy table displaying the two groups, their Obs (number of observations), the observed rank sums and the rank sums that would be expected if the null hypothesis were true (if there were no difference).

• Tied ranks can be an issue, so below the table there is a variance adjustment to account for these ties.

• Then you are reminded of the null hypothesis, and given the z-statistic (3.795) and p-value (0.0001), which suggest that there is a significant difference in the distribution (medians) of SWS between the control and experimental groups.

•Null hypothesis: there is no difference in the distribution of SWS scores among the control, lvPPA and nfvPPA groups

•Alternative hypothesis: there is a difference in the distribution of SWS scores between at least one pair of groups.

1. We have a quantitative measure for each outcome

2. The overall outcome is not normally distributed

3. The shapes of at least one pair of groups are not similar (e.g. control vs lvPPA)

4. The groups are independent of each other

•As we have more than two groups and the overall outcome is not normally distributed, a non-parametric test is preferred

•We have more than two independent groups.

•We will consider a non-parametric test called the Kruskal-Wallis equality-of-populations rank test

•STATA output

STATA command:

. kwallis numbersSWS1, by(diagnosis1)

STATA output:

Kruskal-Wallis equality-of-populations rank test

diagnosis1   Obs   Rank Sum
Control      17    397.00
lvPPA        7     63.50
nfvPPA       8     67.50

probability = 0.0001

chi-squared with ties = 19.37 with 2 d.f.

probability = 0.0001

• We had ties in our data, so we consult the "chi-squared with ties" line of the Kruskal-Wallis H test results.

• The top line (i.e., "chi-squared with ties = 19.37 with 2 d.f.") reports the chi-squared value and the degrees of freedom of the test.

• The line below this one (i.e., "probability = 0.0001") indicates the statistical significance of the Kruskal-Wallis H test (i.e., the p-value).

• We can see that the p-value is 0.0001, which is below 0.05; therefore, there is a statistically significant difference in the median SWS score between the three diagnosis groups (i.e., control vs lvPPA vs nfvPPA)
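
As a rough check, the Kruskal-Wallis statistic can be recomputed from the rank sums in the output. The sketch assumes the textbook formula H = 12 / (N(N+1)) × Σ(R²/n) − 3(N+1) without any correction for ties (the individual tied values are not available in these slides), and uses the fact that for exactly 2 degrees of freedom the chi-squared upper tail is exp(−H/2). The function name is made up.

```python
import math

def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from group rank sums and group sizes (no tie correction)."""
    total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (total * (total + 1)) * weighted - 3 * (total + 1)

# Rank sums and group sizes read from the Stata output (Control, lvPPA, nfvPPA).
rank_sums = [397.0, 63.5, 67.5]
sizes = [17, 7, 8]

h = kruskal_wallis_h(rank_sums, sizes)
df = len(sizes) - 1
p = math.exp(-h / 2)   # valid only because df == 2 here

print(f"H = {h:.2f} with {df} d.f., p = {p:.4f}")
```

The statistic comes out at about 19.37, in line with the value quoted on the slide. A significant result only says that at least one group differs; it does not say which one.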

• There are only 7 patients in this group, so a non-parametric test is appropriate

time1 and time4

• First take the differences of SWS between the two time points:

Table: SWS score at two time points for the patients with lvPPA

Time1   Time4   diff = Time4 - Time1
2       8       6
12      17      5
2       6       4
20      27      7
30      28      -2
1       0       -1
23      30      7

• Almost all the data points in the q-q plots are away from the straight line, so we apply a non-parametric Wilcoxon signed-rank test (an alternative to the paired t-test) to test the hypothesis that there is no difference in SWS score between the two time points.

• Use STATA command: gen diff=Time4-Time1 if Diagnosis==2

• Stata output

STATA command:

. signrank diff=0

STATA output:

Wilcoxon signed-rank test

sign       obs   sum ranks   expected
positive   5     25          14
negative   2     3           14
zero       0     0           0
all        7     28          28

unadjusted variance    35.00
adjustment for ties    -0.13
adjustment for zeros    0.00

Ho: diff1 = 0

z = 1.863

Prob > |z| = 0.0625
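
The signed-rank output can be reproduced by hand from the Time1 / Time4 table. The sketch below assumes the standard normal approximation (expected sum n(n+1)/4, variance n(n+1)(2n+1)/24, reduced by Σ(t³ − t)/48 for ties) and assumes there are no zero differences, which holds for this table. The function name `signed_rank` is made up for illustration.

```python
import math

# SWS scores for the 7 lvPPA patients, taken from the table in these slides.
time1 = [2, 12, 2, 20, 30, 1, 23]
time4 = [8, 17, 6, 27, 28, 0, 30]
diffs = [after - before for before, after in zip(time1, time4)]

def signed_rank(diffs):
    """Wilcoxon signed-rank sums, z and two-sided p (sketch: no zero differences allowed)."""
    if any(d == 0 for d in diffs):
        raise ValueError("this sketch does not handle zero differences")
    abs_sorted = sorted(abs(d) for d in diffs)
    # Rank the absolute differences; tied values share their average rank.
    rank_of = {}
    tie_sizes = []
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        rank_of[value] = first + (count - 1) / 2
        tie_sizes.append(count)
    positive_sum = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative_sum = sum(rank_of[abs(d)] for d in diffs if d < 0)
    n = len(diffs)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    variance -= sum(t ** 3 - t for t in tie_sizes) / 48   # adjustment for ties
    z = (positive_sum - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return positive_sum, negative_sum, z, p

positive_sum, negative_sum, z, p = signed_rank(diffs)
print(diffs)
print(f"positive rank sum = {positive_sum}, negative rank sum = {negative_sum}")
print(f"z = {z:.3f}, p = {p:.4f}")
```

With only 7 pairs the normal approximation is rough, so the p-value should be read as approximate.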

• Stata output

Variable   Obs   Percentile   Centile   [95% Conf. Interval] (Binom. Interp.)
diff       7     50           5         -1.685714   7

•The test gives a p-value of 0.0625, suggesting that there is not enough evidence of a difference in median SWS score between the two time points for the patients with lvPPA.

• The 95% confidence interval around the median difference runs from -1.69 to 7. This confidence interval includes 0, which is consistent with there being no difference in sinewave score for lvPPA patients between the two time points.

Take home: What statistical methods should I use to analyze my data?

• Choose appropriate statistical methods/tests (to be covered in the next lecture, 23rd October 2018)

Suggested Reading

Martin Bland (4th edition): page 117-191

Sterne : page 344-350

•Any questions?

