A. Investigating probability

• To start StatsDirect go to Start > All Programs > Core > Statistics > StatsDirect2006

• When StatsDirect opens, click Cancel.

The mean FEV1 in 57 male medical students is 4.06 litres with a standard deviation of 0.67 litres.

What is the probability of one of the students having an FEV1 of 3.57 litres or less?

z = (x - µ) / σ, where x = 3.57, µ = 4.06 and σ = 0.67

Constructing a formula in StatsDirect: The formula will use one or more of these mathematical operators:

^ exponentiation (raise to the power of)

* multiplication

/ division

+ addition

- subtraction

• Type (3.57-4.06)/0.67 into the Expression to evaluate box.

• Click Calculate

• Then click Save > Close. Click Yes to copy saved results to report.

You need to convert the answer for z into a percentage to find the probability:

• Click Analysis > Distributions > Normal

• Type the z value you’ve calculated into the Normal deviate (z) box

• Click Calculate. The lower tail P value gives the probability of a medical student having an FEV1 of 3.57 litres or less.
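As a cross-check outside StatsDirect, the same two steps (calculate z, then look up the lower tail probability) can be sketched in Python. This is only an illustrative sketch: the function name lower_tail_probability is made up for this example, and it assumes FEV1 follows a normal distribution with the mean and standard deviation given above.

```python
from statistics import NormalDist

mean_fev1 = 4.06  # mean FEV1 in litres (from the worksheet)
sd_fev1 = 0.67    # standard deviation in litres (from the worksheet)


def lower_tail_probability(x, mu, sigma):
    """Return P(X <= x) for a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma          # same formula as typed into the calculator
    return NormalDist().cdf(z)    # lower tail area of the standard normal


z = (3.57 - mean_fev1) / sd_fev1
p_low = lower_tail_probability(3.57, mean_fev1, sd_fev1)
print(f"z = {z:.3f}, lower tail P = {p_low:.1%}")
```

The printed lower tail P should agree with the value StatsDirect reports for the same z.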

Question: What is the probability (as a percentage) that the student will have an FEV1 of

4.55 litres or less?

• Click Save > Close > Select to send the result of the analysis to the report.

3. Use the z score to calculate the interquartile range of the FEV1 values (the middle

half of the values)

To find the z value at the lower end of the interquartile range:

• Type 0.25 in the lower tail P value box and click Calculate

• Click Save > Close > Select

http://www-users.york.ac.uk/~mb55/intro/refint.htm

• Click Tools > Calculator and type the values for µ + z*σ into the Expression to

evaluate box, using the z value for the lower 25% and the mean and standard deviation in

the equation above.

• Remember to save your result to the report.

• The result is the FEV1 in litres at the lower end of the interquartile range (ie 25% of the

medical students will have this FEV1 or lower).

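• Click Tools > Calculator and type the values for µ - lower 25% + µ to find the FEV1 at the upper end of the interquartile range (the upper end lies as far above the mean as the lower end lies below it).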
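The interquartile range calculation can also be sketched in Python as a check on the calculator steps. This is a minimal sketch that assumes a normal distribution with the worksheet's mean and standard deviation; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 4.06, 0.67  # mean and SD of FEV1 in litres (from the worksheet)

# z value that has 25% of a standard normal distribution below it
z_lower = NormalDist().inv_cdf(0.25)

# Lower end of the interquartile range: µ + z*σ
lower_quartile = mu + z_lower * sigma

# Upper end by symmetry about the mean: µ - lower 25% + µ
upper_quartile = mu - lower_quartile + mu

print(f"z for lower 25% = {z_lower:.4f}")
print(f"interquartile range = {lower_quartile:.2f} to {upper_quartile:.2f} litres")
```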

4. Use the z score to find the number of asthmatics with obstructive airway disease

The lung function in a sample of 34 similarly aged male asthmatics was investigated. The mean FEV1 was 3.78 litres with a standard deviation of 0.74 litres. The obstructive airway FEV1 is set at 2.85 litres or less.

a) Use the equation above to work out the proportion of asthmatics with an FEV1

that indicates an obstructive airway in this group.
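A short Python sketch of the same calculation is given here for checking your StatsDirect answer. It assumes the asthmatics' FEV1 values are normally distributed; multiplying the proportion by the sample size to get an expected number is an illustrative step, not something the worksheet prescribes.

```python
from statistics import NormalDist

n_asthmatics = 34        # sample size (from the worksheet)
mu, sigma = 3.78, 0.74   # mean and SD of FEV1 in litres (from the worksheet)
threshold = 2.85         # obstructive airway cut-off in litres

z = (threshold - mu) / sigma              # z score of the cut-off
proportion = NormalDist().cdf(z)          # lower tail P = proportion at or below it
expected_number = proportion * n_asthmatics

print(f"z = {z:.3f}")
print(f"proportion = {proportion:.1%}, expected number = {expected_number:.1f} of {n_asthmatics}")
```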

B. Investigating normal distribution and confidence intervals

Here you are going to work with data from your lung function practical class, to investigate

height and peak expiratory flow rate. The sex has been coded: male = M; female = F.

Go to Blackboard > Discussions > Lung function practical and download the file Lung

function 2009.sdw to My Documents.

• In StatsDirect, click on File > Open File > Look in > My Documents > Lung function

2009.sdw. Save the workbook to My Documents.

• Plot a histogram for height: click on B at the top of the height column then click on Graphics

> Histogram. If you approve the title, click OK or change the title. Click OK on Histogram Bin

Setup then Yes to overlay a normal curve, OK on Histogram unless you want to change the

axis title. Select output destination.

• The data should now look something like this:

• Warning!!! If you try to print the data file when any part of the workbook is highlighted this will

jam up the printer.

• Split the data for height and PEFR by sex: Click Data > Grouping > Split. Select column

A (sex) and click OK. To select the data you want to split click on column B (Height). Click

OK. You will now be asked if it is 2 groups. Click OK. Put the output in a new sheet. Split

the data for PEFR and put it on the same new sheet.

Questions:

- Do you think the histograms show that the data for height are normally distributed?

- Are there any outliers? Could these prevent the data showing a normal distribution?

2. Is there a difference in male and female PEFR?

What test should you use to compare the means of male and female PEFR?

Clues: Is the data parametric or non-parametric; which Student’s t test should you use? (Look

at the audio lecture in Case 3 for help here.)

• Click Analysis and select the correct test then highlight the column containing the PEFR

values for females, hold down the control key and highlight the column for male PEFR.

Click OK and the test will be carried out. Click Select to send the results to the report.

The report should look like this:

Mean of PEFR~Sex=F =

Mean of PEFR~Sex=M =

Assuming equal variances

Combined standard error =

df =

t =

One sided P

Two sided P

95% confidence interval for difference between means =

Power (for 5% significance) >

Assuming unequal variances

Combined standard error =

df =

t(d) =

One sided P

Two sided P

95% confidence interval for difference between means =

Comparison of variances

How can you tell if there is a significant difference in male and female PEFR?

The first two lines of the report show the means, followed by 2 alternative outputs of the data:

o Assuming equal variance

o Assuming unequal variance

Choose the output after looking at the bottom of the report, Comparison of variances, which will show one of those below:

Question:

What does the 95% confidence interval for difference between means tell you?
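To see where the two alternative outputs come from, here is a rough Python sketch of an unpaired t statistic calculated both ways. The data are hypothetical values invented for illustration, the function name unpaired_t is made up, and the unequal-variance version uses the Welch approach with Satterthwaite degrees of freedom, which is a common choice; that StatsDirect's t(d) is calculated in exactly this way is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Return ((t, df) assuming equal variances, (t, df) assuming unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)

    # Equal variances: pool the two variances, df = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se_equal = sqrt(pooled * (1 / na + 1 / nb))
    equal = (diff / se_equal, na + nb - 2)

    # Unequal variances (Welch): keep the variances separate, Satterthwaite df
    se_unequal = sqrt(va / na + vb / nb)
    df_unequal = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    unequal = (diff / se_unequal, df_unequal)
    return equal, unequal


# Hypothetical PEFR values, for illustration only (not the class data)
female = [400, 420, 380, 450, 410, 430]
male = [560, 600, 520, 640, 580, 610]

(t_eq, df_eq), (t_uneq, df_uneq) = unpaired_t(female, male)
print(f"equal variances:   t = {t_eq:.3f}, df = {df_eq}")
print(f"unequal variances: t = {t_uneq:.3f}, df = {df_uneq:.1f}")
```

With equal group sizes the two t values coincide and only the degrees of freedom differ; with unequal group sizes and unequal variances the two outputs can diverge, which is why the comparison of variances matters.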

3. Are height and PEFR related?

• Click on the male height column for the x axis, click OK

• In Chart settings, change the title to Male height v PEFR, click OK, and send it to the

report.

Question: Are there any outliers in either male or female samples? What effect could they

have on the result?

Correlation is a method to establish whether there is a relationship between the two variables (eg

height and PEFR).

• Click Analysis > Regression and correlation > Simple Linear & Correlation.

• Input:

º Outcome variable = PEFR column

º Predictor variable = height column

• Click Plot, then Regression, then Plot to produce the regression line on a scatter plot.

Simple linear regression (example)

Equation: PEFR~Sex=M = 4.255232 Height~Sex=M -172.162076

Standard Error of slope = 1.071842

95% CI for population value of slope = 2.133586 to 6.376879

Correlation coefficient (r) = 0.337022 (r² = 0.113584)

95% CI for r (Fisher's z transformed) = 0.171569 to 0.483986

Two sided P = 0.0001

Power (for 5% significance) = 97.12%

Correlation coefficient is significantly different from zero

Analysing the results:

• The slope measures the relationship between the variables.

• Standard Error of slope measures the slope’s variability.

• The 95% CI for the slope shows a linear relationship between the variables unless it includes zero (0), which means no relationship.

• The correlation coefficient (r) lies between –1 and +1. Zero = no linear relationship, minus = negative relationship.
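The quantities in this report can be illustrated with a small Python sketch. The height and PEFR values below are hypothetical and are not the class data, the function names are made up for the example, and the confidence interval uses the standard Fisher z approximation, which may differ in small details from StatsDirect's own calculation.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def simple_linear_regression(x, y):
    """Return slope, intercept and Pearson r for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r using Fisher's z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit / sqrt(n - 3)
    centre = atanh(r)
    return tanh(centre - half_width), tanh(centre + half_width)


# Hypothetical heights (cm) and PEFR values, for illustration only
height = [165, 170, 172, 175, 178, 180, 183, 185]
pefr = [520, 540, 600, 560, 610, 590, 640, 630]

slope, intercept, r = simple_linear_regression(height, pefr)
low, high = fisher_ci(r, len(height))
print(f"PEFR = {slope:.3f} * height + {intercept:.3f}")
print(f"r = {r:.3f}, 95% CI for r = {low:.3f} to {high:.3f}")
```

If the interval for r excludes zero, the data are consistent with a linear relationship, which mirrors the rule for the slope's confidence interval.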

Question: Why does the analysis calculate a two-sided probability (P) rather than a one-sided P?

• Plot the regression for both the male and female data of height v PEFR

Questions:

Do the 95% confidence intervals indicate that there is a linear relationship between height

and PEFR for the males and females?

What does the correlation coefficient tell us about the type of relationship? Is it a positive or a

negative correlation?

• In the simple linear regression report for females, click on the button Interpolate x to y,

type in the height 162 and click Calculate.

• The predicted PEFR will appear in the output.

• Repeat this to calculate the predicted PEFR for males of the same height.

Question:

Do the 162cm tall males and females in the lung function practical have identical

predicted PEFR?
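Interpolating x to y simply substitutes the height into the fitted regression equation. As a sketch, the coefficients below are taken from the example male report shown earlier in this worksheet; your own reports will give different slopes and intercepts, and the function name predict_pefr is made up for this illustration.

```python
# Coefficients from the worksheet's example report (PEFR~Sex=M on Height~Sex=M)
SLOPE = 4.255232
INTERCEPT = -172.162076


def predict_pefr(height_cm, slope=SLOPE, intercept=INTERCEPT):
    """Predicted PEFR from the regression line: slope * height + intercept."""
    return slope * height_cm + intercept


predicted = predict_pefr(162)
print(f"Predicted PEFR at 162 cm = {predicted:.1f}")
```

To compare males and females, call predict_pefr with the slope and intercept from each of your own reports.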

6. Are height and PEFR normally distributed?

• Use the Shapiro-Wilk test to check for normal distribution✸

• Click Analysis > Parametric > Shapiro-Wilk

• Highlight the column, click OK, and click OK to send output to the report.

• Check all your samples (male, female, height, PEFR) for evidence of non-normality

Student’s t-test is very robust and can cope with non-normal distributions for independent, unpaired samples unless there is a significant difference in the variance of the two samples. StatsDirect tests for this and gives an appropriate warning.

✸ The Shapiro-Wilk test cannot tell you if a distribution is normal; it can only indicate that a sample does not have a normal distribution.

Questions: Look at your analysis of the difference in male and female PEFR.

1. Was there a significant difference in the variance?

2. If there was a difference, which test would StatsDirect suggest that you should

try instead of the t-test?

3. What is the type of test that you would use if you could not use the t-test?

recorded before and after the use of a salbutamol inhaler with or without a

spacer, or a placebo.

- P: Placebo inhaler

- I: Salbutamol inhaler

- S: Salbutamol inhaler with spacer

• Download and open the Salbutamol 2009.sdw file from Blackboard > Discussions > Lung function practical.

• Split the PEFR before data into the three groups. Do the same for the PEFR after data

and put them into the same worksheet.

• Choose an appropriate t-test to compare the difference before and after treatment,

and use it for each of the three groups. (After highlighting the first column, hold down the

control key and click on the second column.)

Your results should look something like this:

For differences between PEFR Before~Type=S and PEFR After~Type=S:

Mean of differences = -22.777778 (n = 36)

Standard deviation = 41.238755

Standard error = 6.873126

df = 35

t = -3.314035

Two sided P = 0.0021
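The figures in this example report are linked by the paired t-test formulas, which can be checked with a few lines of Python. This sketch only reproduces the standard error, df and t from the mean, standard deviation and n printed above; it does not calculate the P value.

```python
from math import sqrt

# Summary values from the example report (PEFR Before~Type=S vs PEFR After~Type=S)
mean_diff = -22.777778
sd_diff = 41.238755
n = 36

standard_error = sd_diff / sqrt(n)  # SD of the differences divided by sqrt(n)
df = n - 1                          # degrees of freedom for a paired test
t = mean_diff / standard_error      # mean difference in standard error units

print(f"Standard error = {standard_error:.6f}")
print(f"df = {df}")
print(f"t = {t:.6f}")
```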

Questions:

Is there any evidence that the salbutamol

had an effect on PEFR?

What is the effect of using the placebo?

• First, you must subtract the before results from the after results:

o Click Data > Apply Function and highlight the after column and then the before

column

o Type v1-v2 into the Apply function to data box.

o Do this for all three groups, P, S and I.

• Choose an appropriate t-test to compare the difference between salbutamol with spacer

and the salbutamol inhaler alone.

Question:

Is there any evidence for a difference between the salbutamol with spacer and the salbutamol inhaler alone?

