31 views

Uploaded by stald

Contin Educ Anaesth

Contin Educ Anaesth

© All Rights Reserved

- EC303 – Assignment 3
- Basic mathematics
- Measures of Central Tendency and Variability
- Quiz 1 Solution
- ECON1203 Hw Solution week04
- ma12lt3
- dpm-sigma
- Pandas Summarized Visually in 8
- Practical No.2
- Math 3
- probabilityfound00galtuoft
- Fredda- Lesson Plan
- Sample Size Calculator
- Data Handling 2
- 1st Sem Feb 2013
- Frank h. Allen j. Chem. Soc Perkin Trans. II 1987
- diagnostic test
- 36_1
- show20you20know201
- (2) Stat-324 (1)

You are on page 1of 4

spread of data

Anthony McCluskey BSc MB ChB FRCA

Abdul Ghaaliq Lalkhen MB ChB FRCA

Central tendency

Examining the raw data is an essential rst step

before proceeding to statistical analysis.

Thereafter, two key sample statistics that may

be calculated from a dataset are a measure of

the central tendency of the sample distribution

and of the spread of the data about this central

tendency. Inferential statistical analysis is

dependent on a knowledge of these descriptive

statistics. In the rst article of this series, types

of data and correlations were discussed.

1

Different measures of central tendency

attempt to determine what might variously be

termed the typical, normal, expected or average

value of a dataset. Three of them are in general

use for most types of data: the mode, median,

and mean.

The mode

Literally, the mode is strictly a measure of the

most popular (frequent) value in a dataset and

is often not a particularly good indicator of

central tendency. Despite its limitations, the

mode is the only means of measuring central

tendency in a dataset containing nominal catego-

rical values. For example, in a survey of 10

senior house ofcers (SHOs) asked which form

of continual professional development (CPD)

activity they preferred in preparation for a

forthcoming examination, the following

responses were obtained: viva practice 5, tutor-

ial 3, in-theatre teaching 2, lecture 0. The mode

of this dataset is viva practice as it is the

largest (most popular) category. We might say

that a typical SHO prefers viva practice and

plan the CPD time accordingly.

The mode may also be used for ordinal cate-

gorical data and for interval data, although the

median or mean are more useful in these circum-

stances. For example, suppose a pilot study is

undertaken to determine the severity of pain on

injection of propofol in 10 patients and an ordinal

verbal pain score system between 03 is used.

The pain scores observed are: 0, 1, 1, 2, 2, 2, 2, 3,

3, 3. The mode of this dataset is a pain score of 2.

The median

The median is dened as the central datum

when all of the data are arranged (ranked) in

numerical order. As such, it is a literal measure

of central tendency. When there are an even

number of data, the mean (see below) of the

two central data points is taken as the median.

For the distribution of pain scores described

above, the median pain score is again 2.

The median may be used for ordinal categ-

orical data and for interval data. When analysing

interval data, the median is preferred to the

mean when the data are not normally (symmetri-

cally) distributed, as it is less sensitive to the

inuence of outliers.

The mean

The mean is used to summarize interval data.

As the mean may be inuenced by outlying

data points, it is best used as a measure of

central tendency when the data is normally

(symmetrically) distributed. Although several

different means are dened, the arithmetic

mean is most commonly used. The arithmetic

mean is calculated by adding all the individual

datum values in a dataset (x

1

x

2

. . . x

n

)

and dividing by the number of values (n) in the

dataset.

The mean of ordinal categorical data is

often reported in the literature (together with its

associated measure of data spread, SD). For

example, in the sample of verbal pain scores

above, the mean score is 1.9. The use of mean

and SD for ordinal data is controversial. For

example, what does a pain score of 1.9 actually

mean when using a categorical scale?

Measurement of spread of

data (variability)

Once again, the rst step in assessing spread of

data is to examine it in either a table or an

appropriate graphical form. A graph often

makes clear any symmetry (or lack of it) in the

spread of data, whether there are obvious atypi-

cal values (outliers) and whether the data is

Key points

Two key sample statistics

are measures of central

tendency and the spread of

data about this central

tendency.

The mode, median, and

mean are used to describe,

respectively, the central

tendency of categorical

data, interval data that is

not normally distributed,

and normally distributed

interval data.

The most important

distribution in stastistical

analysis is the normal

(Gaussian) distribution,

dened by its mean and SD.

The normal distribution is

characterized by its

unimodal, symmetrical, bell-

shaped frequency curve.

Anthony McCluskey BSc MB ChB

FRCA

Consultant

Department of Anaesthesia

Stockport NHS Foundation Trust

Stepping Hill Hospital

Stockport SK2 7JE

UK

Tel 44: 0161 419 5869,

Fax: 0161 419 5045,

E-mail: a.mccluskey4@ntlworld.com

(for correspondence)

Abdul Ghaaliq Lalkhen MB ChB FRCA

Specialist Registrar

Department of Anaesthesia

Royal Lancaster Inrmary

Ashton Road

Lancaster LA1 4RP

UK

127

doi:10.1093/bjaceaccp/mkm020

Continuing Education in Anaesthesia, Critical Care & Pain | Volume 7 Number 4 2007

& The Board of Management and Trustees of the British Journal of Anaesthesia [2007].

All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

b

y

g

u

e

s

t

o

n

J

u

l

y

3

0

,

2

0

1

4

h

t

t

p

:

/

/

c

e

a

c

c

p

.

o

x

f

o

r

d

j

o

u

r

n

a

l

s

.

o

r

g

/

D

o

w

n

l

o

a

d

e

d

f

r

o

m

skewed in one direction or the other (a tendency for more values

to fall in the upper or lower tail of the distribution).

Range

The simplest measure of data variability is the range, dened

simply as the interval between the highest and lowest values in a

distribution. It is of limited practical use in statistical analysis as it

is obviously profoundly inuenced by extreme outliers.

Percentiles

When a dataset is arranged in order of magnitude, it may be

divided into 100 separate cut-off points ( percentiles). The xth per-

centile is dened as a cut-off point such that x% of the sample has

a value equal to or less than the cut-off point. For example, the

35th percentile splits the data up into two groups containing,

respectively, 35 and 65% of the data.

Quartiles are used most commonly, i.e. lower (25th percentile),

middle (50th percentile or median), and upper (75th percentile).

They split the data into four equal groups. The interquartile range

(IQR) is often quoted when referring to interval data that is not

normally distributed. If the 25th percentile value (lower quartile)

of a dataset is 10 and the 75th percentile value (upper quartile) is

40, the IQR may be expressed as either 1040 or simply as

30. Percentiles and quartiles may be estimated from a cumulative

frequency curve.

A useful graphical representation of the distribution of interval

data is the box and whisker plot. For example, the box and whisker

plot in Figure 1 has been produced from the marks obtained by a

cohort of candidates taking an examination. The upper and lower

limits of the box (hinges) represent the upper and lower quartiles,

respectively. The horizontal line inside the box is the median and

the whiskers represent extreme values, in this case the 10th and

90th percentiles. Any further outliers are represented by asterisks.

The normal distribution

The most important and useful distribution of data in statistical

analysis is the normal or Gaussian distribution (Fig. 2). It is also

often referred to as a parametric distribution because two key par-

ameters which fully describe its shape can be dened (the mean

and SD). A normal distribution is characterized by a unimodal,

symmetrical, bell-shaped curve when interval data are represented

by a histogram or line graph.

Much biological data such as height and mean arterial blood

pressure in healthy adults are normally distributed. In clinical

trials, sample data drawn from such a population will also

follow a normal distribution provided the sample size is reason-

ably large (e.g. .100 in each group). However, it is not actu-

ally necessary for sample data to follow a normal distribution

in order to subject the data to parametric statistical analysis

(which is often the case with the smaller sample sizes described

in clinical studies). Rather, it is necessary for the sample data

to be compatible with having been drawn from a population,

which is normally distributed. Thus, although visual inspection

of the frequency curve is useful and should always be under-

taken, with smaller sample sizes it may not be obvious that

the sample data is compatible with normally distributed popu-

lation data. Data can be subjected to formal statistical analysis

for evidence of normality using a variety of tests (e.g. Shapiro

Wilkes test, DAgostinoPearson omnibus test). However, these

tests should be used with caution for very small samples as

they may then give false positive results. If in doubt as to

whether data is normally distributed or not, it is safer to use

non-parametric inferential statistical analysis which is not based

on any assumptions about the shape of the frequency distribution

curve.

The normal distribution shown in Figure 2 has a mean of 20

and a SD of 4. The mean positions the frequency curve on the

x-axis. The spread (width) of the curve around the mean is deter-

mined by its SD. The shape of any normal distribution frequency

curve is entirely described by these two parameters. The centre of

the distribution occurs at the zenith and all three measures of

Fig. 2 The normal distribution [mean (SD) 20 (4)].

Fig. 1 Box and whisker plot of the examination marks. The data do not

follow a normal distribution.

Statistics: Central tendency and spread of data

128 Continuing Education in Anaesthesia, Critical Care & Pain j Volume 7 Number 4 2007

b

y

g

u

e

s

t

o

n

J

u

l

y

3

0

,

2

0

1

4

h

t

t

p

:

/

/

c

e

a

c

c

p

.

o

x

f

o

r

d

j

o

u

r

n

a

l

s

.

o

r

g

/

D

o

w

n

l

o

a

d

e

d

f

r

o

m

central tendency (mode, median, and mean) are equal and

described by the zenith.

For a population, the SD is calculated by summing the squares

of all the individual differences of each datum value from the

mean, then by calculating the mean of this value, and nally by

calculating the square root. The process of squaring the differences

followed by taking the square root results in all of the differences

being converted into positive values. Otherwise, the SD would

always be zero.

SD

P

in

i1

x

i

m

2

n

s

;

where m population mean and n population size.

As clinical research rarely deals with populations but with

(random) samples drawn from the population, the SD of a sample

is:

SD

P

in

i1

x

i

x

2

n 1

s

;

where x sample mean and n sample size.

It is expressed in the same units as those of the mean from

which it is derived.

For parametric (normally distributed, symmetrical) data, the

mean and SD are the appropriate measures of central tendency and

variability of the data. For non-parametric data, the median is the

appropriate central tendency measure and the IQR is the appropri-

ate measure of the variability of the data.

In a normal distribution, approximately two-thirds of the data

(68%) lie within+1 SD of the mean, approximately 95% of the

data lie within+2 SD of the mean (sometimes quoted more accu-

rately as +1.96 SD) and approximately 99.7% of the data lie

within +3 SD of the mean.

Several other sample statistics related to the sample SD are often

used in statistical analysis, e.g. sample variance is dened as SD

2

,

standard error of the mean (SEM) SD/

n

p

The standard normal distribution

The standard normal distribution is dened as having a mean 0

and SD 1. Any normal distribution may be converted into the

standard normal distribution according to the formula: z (x

i

2

m)/s, where x

i

is a datum value from the original distribution, m is

the mean of the original normal distribution, and s is the SD of

original normal distribution. The standard normal distribution is

therefore sometimes referred to as the z-distribution. A z-value

indicates the number of SDs, a datum value is above or below the

mean.

The standard normal distribution may be useful in comparing

different normal distributions. For example, suppose we wish to

consider two candidates for a job by comparing their performance

in a qualifying examination set by different examination boards.

They might both have scored 60% but it would not be surprising if

the mean and SD of the marks obtained in each of the examinations

was different. The raw marks obtained by candidates may be con-

verted into z-values, equivalent to the number of SDs that the

scores are from the mean of zero, according to the equation:

z-value (x

i

2 m)/s, where x

i

a candidates raw score, m

mean mark of the population of all the candidates in each particu-

lar examination, and s SD of the population of all the marks

obtained in each particular examination.

If candidate 1 scored 60% in an examination with a mean

mark of 70% and SD of 10%, the corresponding z-value mark is

21. His performance was 1 SD below the mean for that examin-

ation, i.e. his mark was at the level of the 16th percentile [50 2

(68/2)] for his cohort. If candidate 2 scored 60% in an examination

with a mean mark of 40% and SD 10%, the corresponding z-value

mark is 2. His performance was 2 SD above the mean for that

examination, i.e. his mark was at the level of the 97.5 percentile

[50 (95/2)] for his cohort.

Even with the same pass mark, candidate 2 is seen to have per-

formed better. In this context, the z-values are equivalent to the

candidates standard normal scores in their examinations. Strictly

speaking, candidate 2 performed better in relation to his cohort,

not to an absolute standard. If the two cohorts were of widely dif-

fering general ability then our conclusion that candidate 2 was

more able based on his better standard normal score might be

invalid.

Standard deviation vs SEM

When the results of statistical analyses are reported, the SD and

SEM are sometimes used inappropriately. For example, authors

may quote the range of their sample data as mean +SEM rather

than mean+SD. The temptation to do this follows from the fact

that by denition, the SEM decreases as sample size increases

(SEM SD/

n

p

). When statistics comparing two or more groups

are quoted in this way, any differences between them appear more

signicant than when the SD is quoted. The SD should always be

used when describing the variability of the actual sample data. The

SEM is used specically to describe the precision of the sample

mean, i.e. how far is the sample mean from the population mean.

The 95% condence interval (see future article) for the population

mean sample mean +2 SEM.

Skewness and kurtosis

Two terms may be dened with reference to the shape of a fre-

quency curve, kurtosis and skewness. Kurtosis describes the peak-

edness of the curve whereas skewnesss describes the symmetry of

the curve. A positive kurtosis indicates a frequency distribution

with a sharper peak and a longer tail than a normal distribution; a

negative kurtosis indicates a wide, attened distribution. The stan-

dard normal distribution has a kurtosis of zero. If a frequency

curve has a longer upper tail, the data is positively skewed; if the

data has a longer lower tail, it is negatively skewed.

Statistics: Central tendency and spread of data

Continuing Education in Anaesthesia, Critical Care & Pain j Volume 7 Number 4 2007 129

b

y

g

u

e

s

t

o

n

J

u

l

y

3

0

,

2

0

1

4

h

t

t

p

:

/

/

c

e

a

c

c

p

.

o

x

f

o

r

d

j

o

u

r

n

a

l

s

.

o

r

g

/

D

o

w

n

l

o

a

d

e

d

f

r

o

m

The distribution shown in Figure 3 is positively skewed. It is

evident from the frequency curve that the mean has been shifted to

the right by the skewed data; although the median has also shifted,

it has been inuenced less by the outliers on the upper tail and is

the appropriate measure of central tendency for skewed distri-

butions. Biological data not infrequently follows such a distri-

bution, examples being adult weight, white cell count of healthy

individuals, and serum triglyceride concentrations. Negatively

skewed data is relatively uncommon.

When faced with a sample that comprises non-normally

distributed (skewed) data, there are two choices: to accept the dis-

tribution as it is and use non-parametric inferential statistical

analysis or to attempt to transform the data into a normal distri-

bution. Common methods of transforming skewed data into a

normal distribution are logarithmic, square root, and reciprocal

transformations.

Acknowledgements

The authors are grateful to Professor Rose Baker, Department

of Statistics, Salford University for her valuable contribution in

providing helpful comments and advice on this manuscript.

Bibliography

1. McCluskey A, Lalkhen AG. Statistics I: Data and correlations. CEACCP

2007; 7: 9599

2. Bland M. An Introduction to Medical Statistics, 3rd Edn. Oxford: Oxford

University Press, 2000

3. Altman DG. Practical Statistics for Medical Research. London: Chapman &

Hall/CRC, 1991

4. Rumsey D. Statistics for Dummies. New Jersey: Wiley Publishing Inc.,

2003

5. Swinscow TDV. Statistics at Square One. http://www.bmj.com/statsbk

(accessed 14 June 2007)

6. Lane DM. Hyperstat Online Statistics Textbook. http://davidmlane.com/

hyperstat/ (accessed 14 June 2007)

7. SurfStat Australia. http://www.anu.edu.au/nceph/surfstat/surfstat-home/

surfstat.html (accessed 14 June 2007)

8. Elwood M. Critical Appraisal of Epidemiological Studies and Clinical

Trials, 2nd Edn. Oxford: Oxford University press, 1998

Please see multiple choice questions 1417

Fig. 3 Positively skewed data.

Statistics: Central tendency and spread of data

130 Continuing Education in Anaesthesia, Critical Care & Pain j Volume 7 Number 4 2007

b

y

g

u

e

s

t

o

n

J

u

l

y

3

0

,

2

0

1

4

h

t

t

p

:

/

/

c

e

a

c

c

p

.

o

x

f

o

r

d

j

o

u

r

n

a

l

s

.

o

r

g

/

D

o

w

n

l

o

a

d

e

d

f

r

o

m

- EC303 – Assignment 3Uploaded bytracer14
- Basic mathematicsUploaded byRanjan Singh Garhia
- Measures of Central Tendency and VariabilityUploaded bySherry Lou Pacuri Consimino
- Quiz 1 SolutionUploaded byrahul84803
- ECON1203 Hw Solution week04Uploaded byBad Boy
- ma12lt3Uploaded byfrances leana capellan
- dpm-sigmaUploaded byfrich1662
- Pandas Summarized Visually in 8Uploaded byqwerty
- Practical No.2Uploaded bymahsan12
- Math 3Uploaded byborsio10
- probabilityfound00galtuoftUploaded byuncleretro
- Fredda- Lesson PlanUploaded byLhinever Gilhang
- Sample Size CalculatorUploaded byPadmakar29
- Data Handling 2Uploaded bySuperheavy Rockshow Go't
- 1st Sem Feb 2013Uploaded byMichael Solis
- Frank h. Allen j. Chem. Soc Perkin Trans. II 1987Uploaded byYanethPacheco
- diagnostic testUploaded byMarilyn Murcia Pomoy
- 36_1Uploaded byMahyar Zobeidi
- show20you20know201Uploaded byapi-246652180
- (2) Stat-324 (1)Uploaded byFajobi Abeeb
- Stats ReviewUploaded byitybitytitys
- activity 1 adamskiaUploaded byapi-333861301
- Mathematical Investigation and Modelling (1)Uploaded byMaria Jocosa
- Mean, Median, Mode... MATH InvestigationUploaded byMaria Jocosa
- ISDS Sample Test SolutoinsUploaded byJoseph
- steams 2017-14 ieom borgota a demonstration of the central limit theorem using java programUploaded byapi-448507449
- Smart StatisticUploaded byAzzati Manap
- Explore NyaUploaded bypene asoy
- Normal DistributionsUploaded byZaid Ibrahim
- 2003 P2Uploaded bysewcin

- Biodiversity.pdfUploaded bystald
- 1985 COSGROVE, D. Perspective and the Evolution of the Landscape Idea.pdfUploaded byPietro de Queiroz
- Lec 6 Vector TopologyUploaded bystald
- Factsreports 282 Vol 3 Ecosystemic Forest Management Approach to Ensure Forest Sustainability and Socio Economic Development of Forest Dependent Communities Evidence From Southeast CameroonUploaded bystald
- KennedyUploaded bystald
- How to Select Appropriate Statistical TestUploaded bydrquan
- HCV Briefing Note High ResUploaded bystald
- Euroforester Graduate Survey_Gosia and Vilis_2008Uploaded bystald
- Learn Swedish Words - Core 100 List.pdfUploaded bystald
- Otto von Bismarck.pdfUploaded bystald
- SFM in GermanyUploaded bystald
- Stats for DummiesUploaded byAzuanee Azuanee
- bulletin29_3Uploaded bystald
- InTech-A Data Mining Amp Knowledge Discovery Process ModelUploaded bystald
- DocumentUploaded bystald
- Salvage Foster06Uploaded bystald
- bulletin30_3Uploaded bystald
- 07 KaplanUploaded bystald
- Grillmayer_Lc_Model.pdfUploaded bystald
- Phenomenon.pdfUploaded bystald
- bulletin30_2Uploaded bystald
- bulletin30_4Uploaded bystald
- Anatol Rapoport - Wikipedia, The Free EncyclopediaUploaded bystald
- bulletin31_1Uploaded bystald
- bulletin30_1Uploaded bystald
- bulletin29_2Uploaded bystald
- bulletin29_1Uploaded bystald
- bulletin28_1Uploaded bystald
- Bulletin27 2 SupplementUploaded bystald

- Kaysons EducationUploaded byvarunbansalji
- paul sabatini cvUploaded byapi-253258406
- teaching philosophy updatedUploaded byapi-310906503
- From(Anshul Chandra (02.Anshul@Gmail.com))_ID(1013)_Affairmative ActionUploaded byRahull Dandotiya
- CrtUploaded byyankumi07
- hildah curriculum map final docxUploaded byapi-302059687
- detailed lesson plan in ictUploaded byMeydem Eelied
- cooperative discipline paperUploaded byapi-239308420
- Szelest - Psycholinguistic and Sociolinguistic PerspectivesUploaded byAnonymous T5PT3U
- resume_tyea.docUploaded byTanner Yea
- ISSUES IN ICT ESSAYUploaded bySyafinas Aslan
- Chapter 1Uploaded byLee Apple Guinitaran Lumpay
- "Information and Communications Technology (ICT) Facilities Utilization for Quality Instructional Service Delivery among University Lecturers in Nigeria" by B A Akuegwu, E P Ntukidem, P J Ntukidem, G. JajaUploaded byKolapo
- Annikin Tarot SpreadsUploaded byCalliaste
- Information on Available ScholarshipUploaded byJay T
- Proposal Format.pdfUploaded byDeepu P.Thomas
- More Than 100 Million Women Are Missing by Amartya SenUploaded byspoonfork789
- Jaros ICFAI 2007 Meyer and Allen1Uploaded byspinrad06
- An Analysis of Memory Retrieval and Performances of Physiotherapy Exercises in Non-specific Low Back Pain Patients. Disha Jacob, Varoon c Jaiswal Srji Vol 3 Issue 2 Year 2014Uploaded byDr. Krishna N. Sharma
- Eop Assignment q(Job E-portfolio)Uploaded byAzman Syafri
- Product and Process WritingUploaded byStella María Caramuti Picca
- A Study of the Relationship Between Organizational Performance and Executive CompensationUploaded bydiabloshater
- Safety Management Systems in Aviation MaintenanceUploaded byjose lourenco
- DocumentUploaded byfernandes soares
- Journal of the American Academy of Dermatology Volume 71 Issue 6 2014 [Doi 10.1016%2Fj.jaad.2014.06.015] Bronsnick, Tara; Murzaku, Era Caterina; Rao, Babar K. -- Diet in DermatologyUploaded byxjorgwx
- 5 Sense Lesson Plan2Uploaded bybashayer
- ecosystems inquiry project rubricUploaded byapi-268491266
- Heidegger Aff v2 - DDI 2014 TWUploaded byareedades
- History of GTMUploaded byWanie Piee
- Complete Report Draft 200612Uploaded byMAliAkbar