
Gait & Posture 29 (2009) 360–369


Review

The reliability of three-dimensional kinematic gait measurements: A systematic review
Jennifer L. McGinley a,d,*, Richard Baker a,b, Rory Wolfe a,c, Meg E. Morris a,d
a Centre for Clinical Research Excellence in Clinical Gait Analysis and Gait Rehabilitation, Murdoch Childrens Research Institute, Royal Children's Hospital, Melbourne, Australia
b Hugh Williamson Gait Analysis Service, Royal Children's Hospital, Melbourne, Australia
c Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Australia
d School of Physiotherapy, The University of Melbourne, Melbourne, Australia

ARTICLE INFO

Article history:
Received 5 March 2008
Received in revised form 5 September 2008
Accepted 5 September 2008

ABSTRACT

Background/Aim: Three-dimensional kinematic measures of gait are routinely used in clinical gait analysis and provide a key outcome measure for gait research and clinical practice. This systematic review identifies and evaluates current evidence for the inter-session and inter-assessor reliability of three-dimensional kinematic gait analysis (3DGA) data.
Method: A targeted search strategy identified reports that fulfilled the search criteria. Full-text reports were tabulated and their quality evaluated using a customised critical appraisal tool.
Results: Fifteen full manuscripts and eight abstracts were included. Studies addressed both within-assessor and between-assessor reliability, with most examining healthy adults. Four full-text reports evaluated reliability in people with gait pathologies. The highest reliability indices occurred in the hip and knee in the sagittal plane, with lowest errors in pelvic rotation and obliquity and hip abduction. Lowest reliability and highest error frequently occurred in the hip and knee transverse plane. Methodological quality varied, with key limitations in sample descriptions and strategies for statistical analysis. Reported reliability indices and error magnitudes varied across gait variables and studies. Most studies providing estimates of data error reported values (S.D. or S.E.) of less than 5°, with the exception of hip and knee rotation.
Conclusion: This review provides evidence that clinically acceptable errors are possible in gait analysis. Variability between studies, however, suggests that they are not always achieved.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Gait; Gait analysis; Reliability; Reproducibility; Measurement error

Contents
1. Introduction
2. Method
2.1. Study identification and selection
2.2. Data extraction and quality appraisal
3. Results
3.1. Sample selection, composition and description
3.2. Study procedures
3.3. Statistical analysis
3.4. Reliability findings: overview
4. Discussion
4.1. Methodological considerations: participant and assessor samples
4.2. Methodological considerations: study design and procedures
4.3. Methodological considerations: statistical analysis
5. Considerations and recommendations for future research
Acknowledgements
References

* Corresponding author at: Gait CCRE, Murdoch Childrens Research Institute, Hugh Williamson Gait Laboratory, Royal Children's Hospital, Flemington Rd, Parkville, Victoria 3052, Australia. Tel.: +61 3 9345 5354; fax: +61 3 9345 5447.
E-mail address: jennifer.mcginley@mcri.edu.au (J.L. McGinley).
0966-6362/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.gaitpost.2008.09.003

1. Introduction

Three-dimensional kinematic gait measurements are used widely in clinical gait analysis services and clinical research. Despite the increasing number of gait laboratories, there is limited cohesive information regarding the reliability of kinematic gait measurements. Two recent reports [1,2] highlighting between-laboratory differences in 3DGA measures have raised concerns from the wider orthopaedic community [3,4]. This paper presents a systematic review and qualitative appraisal of the evidence describing the reliability of three-dimensional kinematic gait data (3DGA).
The reliability and validity of gait measurements should be known in order for them to be used appropriately [5]. As repeated gait measurements typically show some differences, these can be assumed to contain a proportion of error. This review addresses reliability, which is the extent to which gait measurements are consistent or free from variation. The term 'error' in this paper is used within the context of reliability and refers to the variation found across repeated measures. Knowledge and understanding of typical measurement variation is helpful to guide the use and interpretation of data.
Clinical gait analysis typically seeks to discriminate between normal and abnormal gait and to assess change in walking over time [6]. Repeated gait measurements can be used to evaluate the response to therapeutic interventions such as surgery, physiotherapy, medications and orthotics. Variability between 'before' and 'after' measurements may be due to treatment effects or measurement variation, or a combination of both. Knowledge of the error magnitude can enable clinical teams to minimise the risk of over-interpreting small differences as meaningful [7], and to have greater confidence that the treatment effect exceeds the measurement error. Additionally, the use of measurements with low reliability in clinical research may lead to underestimation of, or failure to detect, significant effects, with too much 'noise' (error) drowning out real effects [8].
The reliability or consistency of 3DGA can be examined in various ways. Typically multiple walking trials are collected within a single session. Variability between these trials can be regarded as intrinsic variation, and reflects the inherent variation within unimpaired individuals or those with pathology [7]. These intrinsic variations cannot be reduced, yet provide a baseline indication of variation independent of other error sources. Other measurement variation arises from extrinsic factors such as procedural errors [7]. Reliability of data obtained from different testing sessions conducted by the same assessor (inter-session or within-assessor) and by different assessors (inter-assessor) is susceptible to these extrinsic errors. Inconsistent marker placement is generally regarded as a key factor, although other factors such as inconsistent anthropometric measurements, variation in walking speed, data processing or measurement equipment errors may also contribute to data variation [9]. Reliability across sessions is of immediate relevance to clinical gait analysis practice, as observations are routinely and regularly repeated to measure patient performance over time, and different assessors may conduct tests for an individual patient. The aim of this review was to identify and critically evaluate the evidence describing the reliability of lower body kinematic gait data across repeated sessions.

2. Method
2.1. Study identification and selection

The search strategy for this review began with retrieval of published reports indexed on health or biomechanics related electronic databases: MEDLINE (1970 to July 2007), EMBASE (1980 to July 2007), CINAHL (1982 to July 2007), RECAL Bibliographic Database (pre-1990 to July 2007) and Inspec (1970 to July 2007). The search was limited to literature reporting studies of human subjects with abstracts written in English. The search terms were customised to each database and included the following keywords: gait, gait disorders, gait analysis, observer variation, reproducibility of results, and reliability. Bibliographies of identified papers and relevant conference proceedings were hand searched.
The review was conducted to be of primary relevance to gait laboratories collecting typical multi-joint lower body gait kinematic data. The titles and abstracts identified by the initial search strategy were screened by the first named author (JM) to identify potentially eligible reports and retrieve full-text reports. When the title or abstract did not clearly indicate whether an article should be included, the complete article was obtained and reviewed. Full-text reports were then evaluated by two authors (JM and RB) against the following inclusion criteria: (1) reports of the inter-session or inter-assessor reliability of three-dimensional kinematic gait or running measures of human participants; (2) including at least three joints of the lower body (pelvis, hips, knees, ankles); (3) reporting numerical findings from repeated kinematic data capture from more than one measurement occasion (with markers replaced on each occasion); (4) full papers or abstracts (not later published as full papers); (5) published with an English abstract.
2.2. Data extraction and quality appraisal
Reports were retained as either full-text reports or published abstracts. A standardised data extraction and appraisal form was constructed to identify and detail key features of each study. Two reviewers (JM and RB) initially independently piloted the form with a small subset of representative studies to confirm the content and to assess the reliability. The extracted study details focused on participant characteristics and recruitment, study procedures and biomechanical models, and the statistical analysis techniques.
The quality of study design and conduct are key elements in evaluating scientific evidence, with contemporary systematic reviews providing study quality appraisals in addition to quantitative reviews. Although a large body of literature exists to provide guidelines for the systematic evaluation of research methodology [10,11], the majority are focussed primarily upon studies of healthcare interventions, in particular randomized controlled trials. As no standardised or established guidelines were located for reviews of reliability, a customised quality appraisal form was developed. The appraisal component was developed to integrate relevant examples of methodological quality criteria from other systematic reviews of reliability [12–14], and gait classification [15]. Relevant quality themes and principles were also adapted from quality criteria proposed for the measurement properties of health status questionnaires [16], and the QUADAS tool used to appraise studies of diagnostic accuracy [17]. Additionally, an initial expert panel was formed to consider and define the data extraction and appraisal criteria for the study. Quality appraisal indicators were developed into a standardised form to ensure a structured approach to evaluation of key quality elements and to ensure equal appraisal of all papers. Appraisal items were not scored as the validity of such scoring systems is currently unproven [18]. The appraisal criteria included themes related to external validity such as sampling methods and description, standardisation and description of procedures, and selection of statistical analysis techniques. Appraisal criteria were not applied to the abstract-only reports because their brevity limited the provision of methodological detail.
The data extraction and appraisal form was used independently by two reviewers (JM and RB) to extract key details from each report and to evaluate the quality of each full-text paper. Any rating disagreements on quality criteria were checked against the original article to ascertain the correct scoring according to a pre-defined procedure, in accordance with established and recommended protocols [15,18].

3. Results
The electronic searches and hand-search of references and selected conference proceedings yielded a total of 510 articles. Following the application of the inclusion/exclusion criteria, 23 studies were identified for inclusion in the systematic review: 15 full papers and 8 abstracts.
Details of the 23 identified studies are provided in Table 1. Within-assessor reliability was reported in 15 of the studies, and between-assessor reliability in 10. Four of the studies were described as test–retest, as either the number of assessors was not stated, or it was uncertain whether the same assessors applied the markers in repeated sessions [19–22]. Gait participant sample sizes ranged from 1 to 50 (median of 10), with approximately 70% of studies examining healthy subjects. Three studies included groups of both healthy and disabled participants [20,22,23]. Thirteen studies included adults, eight included children and five did not report the age of participants. The number of assessors ranged from 1 to 24, and included physiotherapists and technicians. Commercially available biomechanical models were most frequently used in the studies. Measurement sessions commonly numbered two or three and occurred over time intervals ranging from 2 h to 20 weeks.

3.1. Sample selection, composition and description

The quality indicators related to sampling methods and description varied widely across the 15 full-text reports. Key criteria are reported in Table 2. The sampling method for recruitment of gait participants was frequently not reported. In the studies with unimpaired people, it is likely that convenience sampling was employed. In studies with clinical participants, the sampling method was also predominantly convenience samples from local known clinical populations [2,21,23]. Yavuzer et al. [24] sought consecutive patients with stroke who fulfilled the study criteria.
The study inclusion/exclusion criteria for gait participants were stated in around half of the full-text reports and varied greatly in detail. Of the studies including 'healthy', 'unimpaired' or 'normal' subjects, seven specified inclusion/exclusion criteria such as the absence of previous musculo-skeletal, neurological or other conditions that may affect gait. A single study chose to recruit only male healthy participants to minimise any potential influence of gender variation on the findings [25]. Another study selected participants older than 16 in order to minimise potential variation due to adolescent growth and variable walking velocity [9]. Of the three full-text studies including participants with cerebral palsy (CP), common inclusion criteria included age, type of CP, ability to walk independently, and adequate cognition to cooperate with gait analysis. Required gait ability varied from the ability to walk without an orthosis [2] to the ability to walk continuously for 15 min without walking aids or orthoses [23]. Noonan et al. [2] offered study participation to a wide range of subjects, including subjects from mild to severe disability, both pre- and post-operatively, with and without bracing and with varied distribution of spasticity. This sample seems likely to be representative of a typical gait laboratory population, although it is uncertain whether the patients with prior surgery had stabilised prior to study inclusion. Children with prior therapeutic intervention (therapy/casts/surgery) were excluded in the study by Steinwender et al. [23].
The quality of the descriptions of the gait participants also varied markedly across the full-text reports. Gait participants were fully described in terms of age, gender, health status and anthropometric characteristics in only seven studies. Relevant pathology-specific detail was provided in the descriptions of the CP and stroke participants, although only one of the three studies including subjects with CP reported the Gross Motor Function Classification System (GMFCS) [26] to detail gait ability. Adequate clinical participant descriptors are particularly important as they reflect the heterogeneity of the sample, and allow insight into the generalisability of the findings to other populations. The number of gait participants varied widely across full-text studies from 1 [27] to 40 [23,28], with 10 reports including more than 10 participants. Justification for the sample size of gait participants was not provided in any study.
Neither the sampling method used to recruit assessors nor the related inclusion and exclusion criteria were reported in any of the full-text reports. Descriptions of the assessors were generally poor, with only two studies reporting the desired complete details including the number of assessors, professional background and experience or training [29,30]. Physiotherapists were the most frequently reported professional group, with six of the reports also describing their assessors as either experienced or highly trained [9,24,28–31]. The number of assessors was frequently small, between one and five.

3.2. Study procedures

The majority of full-text reports appeared to use standardised measurement protocols on repeated occasions. Of the full-text reports, twelve studies reported capturing data at self-selected or 'normal' speed, with one study selecting a fixed speed (of running) [32], and one study not reporting speed [27]. Metronome paced or beep test controlled speed protocols were also reported in abstracts [33,37]. Associated spatio-temporal (ST) gait measures or measures of between-session ST variation were provided in 11 of the 15 reports. Data capture systems were generally adequately described, with most full-text reports providing adequate overall descriptions of the biomechanical models used, or providing appropriate reference to available descriptions. Desirable model-specific details regarding within-model options such as Knee Alignment Device (KAD) utilisation, post-testing adjustment of thigh rotation angles and specification of hip joint centre location techniques were inconsistently provided.
The duration between measurement sessions varied and ranged from 2 h [24] up to 20 weeks [2], with several studies failing to state their time interval (e.g. [7,34]). Within-study standardisation of the measurement interval did not occur uniformly, with some studies reporting varied time intervals or wide within-study ranges of 6–20 weeks [2].

3.3. Statistical analysis

The coefficient of multiple correlation (CMC) or coefficient of multiple determination (CMD) was used in eight of the 23 studies. These techniques examine consistency across the entire gait cycle and are expressed as an index of agreement between 0 and 1. Intra-class correlations (ICCs) were reported in six studies. Various forms of ICCs are available for different study designs and different methods of estimation exist with corresponding differences in the underlying assumptions and generalisability [35].
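To make the point about different ICC forms concrete, the following is a minimal sketch in Python (using NumPy) of two widely used single-measure forms: an absolute-agreement ICC, often labelled ICC(2,1), and a consistency ICC, often labelled ICC(3,1). It is not taken from any of the reviewed studies. The function names and the data are hypothetical and chosen only for illustration; the data assume one kinematic value per participant per session.

```python
import numpy as np


def anova_mean_squares(scores):
    """Mean squares from a two-way ANOVA without replication.

    scores: rows are participants, columns are sessions (or assessors).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_error = np.sum((x - grand_mean) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_error


def icc_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k, ms_rows, ms_cols, ms_error = anova_mean_squares(scores)
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def icc_consistency(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measure."""
    n, k, ms_rows, ms_cols, ms_error = anova_mean_squares(scores)
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical peak knee flexion (degrees): 5 participants x 2 sessions,
# with session 2 reading a constant 3 degrees higher than session 1.
peak_knee_flexion = np.array([[55, 58], [60, 63], [62, 65], [58, 61], [65, 68]])

print(round(icc_consistency(peak_knee_flexion), 3))
print(round(icc_agreement(peak_knee_flexion), 3))
```

With these illustrative numbers the consistency form ignores the constant 3° offset between sessions, whereas the absolute-agreement form is penalised by it. This is one reason why ICC values from studies using different forms are not directly comparable.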
Absolute measures of measurement variation in degrees were also included in numerous reports, including standard deviations (S.D.), standard error (S.E.), range, mean absolute difference and Bland and Altman limits of agreement. The majority of studies used techniques to examine the reliability of the entire kinematic curve, with others choosing to examine selected key kinematic peaks, amplitudes or events [9,24,32,34,36,37]. Many authors also presented within-session reliability data. Schwartz et al. [7] and Murphy et al. [38] reported sources of variance related to within-session (inter-trial), inter-session (within-assessor) and inter-therapist (between-assessor) factors.
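As an illustration of two of the absolute measures named above, the sketch below computes the mean absolute difference and the Bland and Altman 95% limits of agreement (bias ± 1.96 × S.D. of the paired differences) for a single kinematic key point measured in two sessions. The function names and the values are hypothetical; the example assumes one value per participant per session and is not drawn from any reviewed study.

```python
import numpy as np


def mean_absolute_difference(session_1, session_2):
    """Average unsigned difference between paired measurements (degrees)."""
    a = np.asarray(session_1, dtype=float)
    b = np.asarray(session_2, dtype=float)
    return np.mean(np.abs(b - a))


def bland_altman_limits(session_1, session_2):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    a = np.asarray(session_1, dtype=float)
    b = np.asarray(session_2, dtype=float)
    diff = b - a
    bias = diff.mean()
    sd_diff = diff.std(ddof=1)  # sample S.D. of the paired differences
    return bias, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff


# Hypothetical peak knee flexion (degrees) for five participants
session_1 = [55, 60, 62, 58, 65]
session_2 = [57, 59, 65, 58, 69]

mad = mean_absolute_difference(session_1, session_2)
bias, lower, upper = bland_altman_limits(session_1, session_2)
print(f"mean absolute difference = {mad:.1f} deg")
print(f"bias = {bias:.1f} deg, limits of agreement = {lower:.1f} to {upper:.1f} deg")
```

Both quantities are expressed in degrees, so they can be compared directly with the size of a change that a clinical team would consider meaningful.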

Table 1
Characteristics of the identified studies of the reliability of 3DGA data.
Study

Biomechanical model

Participant characteristics
(n, age (years), type, gender)

Assessor characteristics
(n, profession)

Type of reliability, session number and interval

Statistical analysis

Besier et al. [31]

n = 10, Age: NS, Able-bodied, 6


M;4 F

n = 5, Discipline: NS

Inter-session (W-Ass), 2  sessions,


Inter-assessor, Interval: 4 h

Analysis across GC curve: CMD, Average


systematic error

Charlton et al. [27]

Custom models; Anatomical


landmark model, Functional
joint model
VCM, OLGA

n = 1, Age: NS, Healthy

n = 3, Discipline: PT

Cowman et al. [37] (A)

Bilateral CODA mpx30

n = 2, Age: 9, 21 years, Normal

n = 2, Discipline: PT

Eve et al. [40] (A)

PiG

n = 10, Age: 31.4, Healthy

Analysis across GC curve: S.D. of averaged joint


angle
Kinematic key points selected for analysis:
Error indices (% of 95% condence ranges)
Analysis across GC curve: S.E.

Ferber et al. [32]

Unilateral model MOVE3D

Gok et al. [25]

VCM

Gorton et al. [19] (A)


Gorton et al. [1] (A)

VCM
Vicon and Motion Analysis
Corporation software
Vicon and Motion Analysis
Corporation software
21 marker model similar to
Kadaba model with ANALYZE
software.
Conventional gait model

n = 20, Age: 21.4 (mean),


Healthy, 7 M; 13 F
n = 11, Age: 32 (mean), Healthy,
11 M
n = 50, Age: 5–16, Normal
n = 1, Age: NS

n = 1, Discipline:
therapist
n = 1, Discipline: NS

Inter-session (W-Ass), 3  sessions,


Inter-assessor, Interval: NS
Inter-session (W-Ass), 3  sessions,
Inter-assessor
Inter-session (W-Ass), 2  sessions,
Interval: Within a week
Inter-session (W-Ass), 2  sessions,
Interval: 1 week
Inter-session (W-Ass), 2  sessions,
Interval: 3 days
Inter-session (test-retest), Interval: >1 week
Inter-assessor, Interval: within a 3 month
period
Inter-assessor, Interval: within 1 month

Growney et al. [34]

Kadaba et al. [28]


Leardini et al. [29]#

Maynard et al. [36]

Custom model: Anatomically


based protocol
Cleveland clinic marker set, EvA
and Orthotrak (Motion Analysis
Software)
CODA mpx30 model

Miller et al. [20] (A)

Modied Helen Hayes

Monaghan et al. [9]

Unilateral CODA

Murphy et al. [38] (A)


Noonan et al. [2]

Conventional biomechanical
model
VCM, CCM/Orthotrak

Quigley et al. [22] (A)

NS

Steinwender et al. [23]

Conventional gait model

Schwartz et al. [7]

VCM

Tsushima et al. [30]

VCM

Yavuzer et al. [24]

VCM

Mackey et al. [21]

n = 5, Age: NS, Normal, 3 F; 2 M

n = 40, Age: 18–40, Normal

n = 1, Age: NS

Linear mixed model, S.D. and range

Inter-session (W-Ass), 3  sessions,


Interval: NS; 3 separate days

Analysis across GC curve: CMC, S.D.

n = 1, Discipline = NS

Inter-session (W-Ass), 3  sessions >1 week

n = 1 Age: 7, Healthy, F

n = 5, Discipline: PT

Inter-assessor, Interval: NS

Analysis across GC curve: CMC with mean


subtracted
Analysis across GC curve: Averaged S.D.

n = 10, CP; Age: 6 M; 9 ± 4, 4 F; 12 ± 3

n = NS, Discipline = NS

Inter-session (test-retest), 2  sessions,


Interval: 1 week

Analysis across GC curve: CMC (S.D.), Mean


absolute difference (S.D.)

For inter-session: n = 10, Age:


39.2 (mean) 5 M 5 F, For
inter-assessor: n = 19, Age: 34.4
(mean) 4 M, 15 F
n = 10, 5 = CP, 5 = non-disabled,
Age 5–16
n = 10, Age: 28.5 (mean), 7 F, 3
M, Healthy
n = 3, Age: 50 (mean, S.D. = 8),
Stroke
n = 11, Age: 5–17, CP, 6 M; 5 F

For inter-session: NS,


For inter-assessor:
n = 3,
Discipline = NS
n = NS, Discipline: NS

Inter-session (W-Ass), 3  sessions,


Inter-assessor, Interval: within day, & 1 week

Kinematic key points selected for analysis:


Bland and Altman LOA, ICC

Inter-session (test-retest), 5  sessions

Analysis across GC curve: ICC

Inter-session (W-Ass), 2  sessions,


Interval: 1 week
Inter-session (W-Ass), 2  sessions,
Interval: NS
Inter-assessor, Interval: 6–20 weeks

Kinematic key points selected for analysis,


Bland and Altman LOA, ICC
Analysis across GC curve: Multi-level,
random-effects linear regression model
Analysis across GC curve, Discordance index,
Absolute variability
Kinematic key points selected for analysis, CV
(S.D.)

n = 10, CP (n = 5), Typically


developing (n = 5), Age: 9.6
(mean)
n = 40, Healthy (n = 20), CP
(n = 20), Age: 7–15
n = 2, Age: 40, 36, Healthy
n = 6, Age: 34.8 (mean),
Unimpaired, 3 M; 3 F
n = 20, Age: 54.2 (mean), 7 F; 13
M, Stroke

n = 1, Discipline: NS
n = 3, Discipline:
Clinician
n = 4 laboratories,
Discipline: NS
n = NS, Discipline: NS

n = 1, Discipline: NS
n = 4, Discipline: PT
n = 2, Discipline: PT
n = 1, Discipline:
Technician

Inter-session (test-retest), 5  sessions;


Interval: Each session >2 days apart
Inter-session (W-Ass), 3  sessions,
Interval: 3 days within a week
Inter-session (W-Ass), 3  sessions,
Inter-assessor, Interval: NS
Inter-session (W-Ass), 2  sessions,
Inter-assessor, Interval: within 2 weeks
Inter-session (W-Ass), 2  sessions,
Interval: 2 h


Gorton et al. [33] (A)

n = 1 or 2, Discipline:
Physician & technician
n = 2, Discipline: NS
n = 24 (in 12 labs),
Discipline: clinicians
n = 24 (in 12 labs),
Discipline: clinicians
n = 1, Discipline: NS

Kinematic key points (stance phase) selected


for analysis: ICC, Mean (S.E.M.)
Kinematic key points selected for analysis, ICC,
Wilcoxon signed ranks test
Analysis across GC curve: CMC
Linear mixed model, S.D. and range

Analysis across GC curve, CMC (S.D.)


Analysis across GC curve, Variance components
estimation (S.D.)
Analysis across GC curve, CMC (S.D.)
Analysis across GC curve and at selected
kinematic key points, CV%, CMC, ICC


(A), Abstract only; F, female; M, male; VCM, Vicon Clinical Manager; GC, gait cycle; S.D., standard deviation; SEM, standard error of measurement; W-Ass, Within-assessor; ANOVA, analysis of variance; CODA, Cartesian Optoelectric Dynamic Anthropometer; OLGA, optimised lower-limb gait analysis; LOA, limits of agreement; CCM, Cleveland Clinic Model; NS, not stated; PiG, Plug-in-Gait; PT, Physiotherapist; CMC, coefficient of multiple correlation; CMD, coefficient of multiple determination; CV%, coefficient of variation; ICC, intra-class correlation; CI, confidence interval; # data refers to Leardini et al. [29] Study 2 (inter-examiner).

Table 2
Methodological quality of the reviewed full-text studies.
Study | Gait participants: sampling method | Gait participants: inclusion and exclusion criteria | Gait participants: description | Assessor participant description | Protocol standardisation and description | Model description | Data description | Statistical analysis
Besier et al. [31] | Not stated | Not stated | Partial | Partial | Adequate | Adequate | Adequate | Adequate
Charlton et al. [27] | Not stated | Not stated | Inadequate | Partial | Limited | Adequate | Limited | Adequate
Ferber et al. [32] | Not stated | Not stated | Adequate | Inadequate | Adequate | Limited | Limited | Adequate
Gok et al. [25] | Convenience | Stated | Partial | Partial | Adequate | Adequate | Limited | Limited
Growney et al. [34] | Not stated | Not stated | Partial | Inadequate | Limited | Adequate | Adequate | Adequate
Kadaba et al. [28] | Not stated | Limited | Partial | Partial | Limited | Adequate | Adequate | Limited
Leardini et al. [29] | Not stated | Not stated | Adequate | Adequate | Limited | Adequate | Adequate | Adequate
Mackey et al. [21] | Convenience | Stated | Adequate | Inadequate | Limited | Adequate | Adequate | Adequate
Maynard et al. [36] | Not stated | Stated | Partial | Inadequate | Adequate | Adequate | Adequate | Adequate
Monaghan et al. [9] | Not stated | Stated | Adequate | Partial | Adequate | Adequate | Adequate | Adequate
Noonan et al. [2] | Convenience | Stated | Partial | Partial | Limited | Adequate | Adequate | Adequate
Schwartz et al. [7] | Not stated | Not stated | Adequate | Partial | Limited | Adequate | Adequate | Adequate
Steinwender et al. [23] | Convenience | Stated | Partial | Inadequate | Adequate | Limited | Adequate | Limited
Tsushima et al. [30] | Not stated | Stated | Adequate | Adequate | Adequate | Adequate | Adequate | Limited
Yavuzer et al. [24] | Case consecutive | Stated | Adequate | Partial | Limited | Adequate | Adequate | Limited

3.4. Reliability findings: overview

The diversity in the reported studies precludes a simple synthesis of results. Meta-analysis of the results was not considered to be appropriate given the diversity among a fairly small number of studies, the varied participant ages and pathology, the marked variability in the quality, methods and selected statistical analysis and the heterogeneity of results. Under these circumstances, the review comprised a qualitative analysis of the research available, a 'best evidence synthesis' [39].
Limited comparisons are possible across the seven studies reporting within-assessor reliability using the CMC or CMD (see Table 3). Reliability varied widely across the studies and gait variables. Excluding pelvic tilt, very high values were typically reported for the sagittal plane data, with the transverse plane generally showing the lowest reliability (median < .72). The lowest obtained reliability indices (<.6) were reported for pelvic tilt [23,28,30], knee varus [23], and hip, knee and foot (transverse plane) [23,28,31,34].
Evaluation of the reliability indices (either CMC or ICC) across all studies confirms that sagittal plane reliability was typically higher than .8, excluding pelvic tilt. For the coronal plane, most studies reported reliability indices of >.7. The majority of studies reported indices <.7 for the transverse plane (excluding the pelvis).
Results from gait studies reported as either S.D. or S.E. provide the magnitude of error across different gait variables and are reported in Fig. 1. The diversity of study types, participants and analyses limits between-study comparisons; however, grouping data in this manner is useful to look broadly at patterns. In general, sagittal plane errors were typically <4°, and coronal plane errors around 2°. Highest errors were seen in hip and knee rotation, and the lowest errors commonly at the pelvis in the transverse and coronal plane, and hip abduction. The pattern of these findings broadly concurs with other reports in terms of range [1,33] or absolute variability [2,29], with reported values of hip rotation ranging from 16° [29] to 34° [33], in contrast to lower estimates of pelvic obliquity of less than 6° [1,2,29,33].
Reports of the distribution and relative error sizes across gait
variables did not always coincide with the findings of studies using
CMCs to assess reliability. Both statistical methods suggested that
hip and knee rotation measures showed most variation, with
higher errors reported and generally low CMC values. For some gait
variables, however, the error magnitudes did not reflect the
reported CMC indices. For example, pelvic rotation was frequently
reported with relatively low error (<2°), yet showed only moderate
reliability with CMC values ranging from .67 to .89 (median of .72).
Similarly, knee flexion showed uniformly high CMC values, yet
showed relatively larger error magnitudes, with some studies
showing errors in excess of 4°.
Of the six studies reporting both within-assessor and between-assessor error, three found single assessors to be more repeatable
than multiple assessors [7,27,30], one found similar repeatability
[31] and a single study reported between-assessor reliability to be
better than within-assessor [36]. Of the three reports comparing

Table 3
Summary of studies reporting within-assessor reliability of 3DGA data as coefficient of multiple correlation (CMC).

Variables (rows)
Sagittal: pelvic tilt, hip flexion, knee flexion, ankle dorsiflexion
Coronal: pelvic obliquity, hip abduction, knee varus/valgus
Transverse: pelvic rotation, hip rotation, knee rotation, foot progression

Studies (columns) and participants
Besier et al. [31] (AL)a: healthy adults
Besier et al. [31] (FUN)a: healthy adults
Gorton et al. [19]: healthy children
Growney et al. [34]: healthy
Kadaba et al. [28]: healthy adults
Steinwender et al. [23]: healthy children
Steinwender et al. [23]: children with CP
Tsushima et al. [30]: healthy adult
Yavuzer et al. [24]: adults with stroke

.97
.96
.92

.93
.80

.62
.83

.98
.96
.93

.79
.99
.99
.96

.91

.64
.96
.99
.98
.85
.90
.74
.88
.74
.54
.55

.24
.98
.99
.93
.89
.89
.61
.72
.41
.49
.58

.32
.96
.96
.87
.75
.85
.49
.67
.59
.34
.37

.56
.96
.96
.83
.73
.76
.58
.71
.57
.41
.49

.38
.99
.99
.98
.98
.97
.79
.89
.82
.81
.82

.95
.89
.85
.85

.92
.82

.63
.87

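Median: pelvic tilt .56, hip flexion .96, knee flexion .96, ankle dorsiflexion .93, pelvic obliquity .85, hip abduction .89, knee varus/valgus .74, pelvic rotation .72, hip rotation .62, knee rotation .54, foot progression .55

a Besier et al. CMCs derived from coefficient of multiple determination data, L side.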


Fig. 1. Summary of gait studies reporting 3DGA reliability as S.D. or S.E.


Study details: Besier [31]; average systematic error between and within-assessors, Charlton [27]a*; SD PIG inter-assessor, Charlton [27]b*; SD OLGA inter-assessor, Eve [40];
SE within-assessor, Gorton [1]; SD inter-assessor, Gorton [33]; SD inter-assessor, Growney [34]; SD within-assessor (right side), Leardini [29]; SD inter-assessor, Maynard
[36]; SD diff. inter-assessor (averaged across events), Monaghan [9]; SD diff. within-assessor (averaged across events), Murphy [38]*; SD within and inter-assessor, Schwartz
[7]; SD inter-assessor. * data estimated from Figure provided.

the repeatability of healthy children and those with CP, no clear
findings emerged. The only full-text report described the repeatability of children with CP as lower than healthy children, although
the provided data show broadly comparable CMC values with
higher values obtained for the CP group pelvic tilt and foot rotation
[23]. In the abstract reports from Quigley et al. and Miller et al.
[20,22], the normal children appeared to be slightly less consistent
than those with CP, although varying across different gait
variables.
4. Discussion
The diversity of study participants, methods, biomechanical
modelling techniques, statistical analyses and results precludes a
simple conclusion about the reliability of 3DGA. Data from studies
reporting reliability indices do however suggest that the majority
of studies reported moderate to good reliability for sagittal and
coronal plane variables, with the exception of pelvic tilt and knee
varus/valgus in some reports. Likewise, estimates of error (S.D. or
S.E.; Fig. 1) suggest most studies reported error of less than 5° for all
gait variables, excluding hip and knee rotation.
Whether 3DGA data is reliable enough remains a question that
can be answered only in the context of proposed use, with the
degree of acceptable measurement variation relating directly to
the intended application. It is clearly beyond the scope of this
review to specify acceptable limits of reliability for 3DGA data. We
do, however, believe that in most common clinical situations an
error of 2° or less is highly likely to be widely considered
acceptable, as such errors are probably too small to require explicit
consideration during data interpretation. Errors of between 2° and
5° are also likely to be regarded as reasonable but may require
consideration in data interpretation. We suggest that errors in
excess of 5° should raise concern and may be large enough to
mislead clinical interpretation. Data from the studies reporting
error reveals that the majority of studies and gait variables show
errors that fall between 2° and 5°. Hip rotation clearly shows the
highest error, although it is noteworthy that some studies report
lower error of <5° for this variable, suggesting that lower error is
currently achievable [7,27,38,40,41]. This compares well with
clinical measurements of similar variables. For example, both
Fosang et al. [42] and McDowell et al. [43] report variability of
between 5° and 10° in clinical assessment of sagittal plane range of
movement of the major joints of the lower extremity.
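
The tentative error bands suggested above can be written down as a simple lookup. The Python sketch below is illustrative only: the function name, the band labels and the example error values are hypothetical, and the 2° and 5° cut-offs are the ones proposed in the text rather than validated limits.

```python
def classify_error(error_deg: float) -> str:
    """Map an absolute measurement error (degrees) to the bands suggested in the text."""
    if error_deg < 0:
        raise ValueError("error magnitude must be non-negative")
    if error_deg <= 2.0:
        return "acceptable"   # probably too small to need explicit consideration
    if error_deg <= 5.0:
        return "reasonable"   # may require consideration in data interpretation
    return "concern"          # may be large enough to mislead interpretation


# Hypothetical error values (degrees) for three gait variables.
example_errors = {"pelvic obliquity": 1.5, "knee flexion": 4.2, "hip rotation": 8.0}
bands = {name: classify_error(err) for name, err in example_errors.items()}
print(bands)
```

Such a lookup is only as meaningful as the intended application allows; the bands would need to be reset for any use in which smaller or larger errors matter.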
4.1. Methodological considerations: participant and assessor samples
The widespread use of 3DGA as part of clinical services in
clinical populations warrants careful consideration of best-quality
study methodology. Appropriate sample composition and inclusion/exclusion criteria should ensure that the range of characteristics of interest in a clinical population is most likely to be present
in a sample, and that the findings can be generalised. Of the four
full-text reports including clinical participants [2,21,23,24], three
chose convenience samples. Such samples may be susceptible to
sampling bias, such as selective inclusion of the most cooperative
or compliant participants. Ideally, subjects participating in a study
of a measurement should consist of individuals who would be
likely to undergo the test in clinical practice, and reflect a
continuum of severity from mild to severe [44]. If the target
population is intended to be typical clinical gait analysis service
patients then recruitment strategies could consider use of a
prospective cohort design with consecutive clinical subjects, such
as the case consecutive sampling described by Yavuzer et al. [24].
Such designs are recognized as the best method in studies of
diagnostic tests to ensure a representative sample and avoid
selection bias [45]. Alternate strategies could include stratified
random sampling of typical gait disorder populations.
The prevalence of healthy participants in the majority of the
studies is noteworthy and contrasts with the widespread clinical
and research application of 3DGA to evaluate gait disorders. Gait
analysis services typically include those with gait conditions such
as CP [46], spinal cord injury [47,48], spina bifida and acquired
brain injury [49]. Furthermore, research studies have used 3DGA to
characterise gait disorders and examine intervention efficacy in
diseases such as Parkinson's disease [50], myelomeningocele [51],
and CP [52]. Generalisability of the error associated with repeated
measures of healthy adults to children or those with gait pathology
should be viewed with some caution. Adult gait data is generally
found to be less variable than children's [53], and younger children
also more variable than older children [19]. Children with CP were
more variable for some kinematic gait variables than healthy
children [23]. Measures of gait data reliability are intrinsically
related to the variability within the studied group [54], with
measurements widely considered to be population-specific [55].
Whether estimates of error can be reasonably generalised across
clinical populations should be carefully considered, in the context
of the characteristics of the specific pathology, and the associated
impairment and gait dysfunction characteristics. Furthermore,
although different gait disorders may be associated with variable
levels of intrinsic gait repeatability, it is not clear whether the
nature of the gait disorder has any direct effect on procedural
sources of error such as marker placement. It is likely that such
errors may be related to patient-specific factors such as cognition,
compliance and cooperation which may or may not be related to
the gait disorder.
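
The dependence of reliability indices on the variability of the studied group [54] can be illustrated with a simplified variance ratio. The sketch below assumes the simplest ICC-style form, between-subject variance divided by between-subject plus error variance; this form, the function name and the numbers are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def reliability_index(sd_between: float, sd_error: float) -> float:
    """Simplified ICC-style ratio: between-subject variance over total variance."""
    var_between = sd_between ** 2
    var_error = sd_error ** 2
    return var_between / (var_between + var_error)


# The same hypothetical measurement error (3 degrees) in two hypothetical groups.
heterogeneous = reliability_index(sd_between=10.0, sd_error=3.0)  # subjects differ widely
homogeneous = reliability_index(sd_between=3.0, sd_error=3.0)     # subjects are very similar
print(round(heterogeneous, 2), round(homogeneous, 2))
```

Under these assumptions an identical error in degrees yields a much lower index in the homogeneous group, which is one reason why indices from healthy adults may not transfer directly to clinical populations.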
The potential influence of the assessor characteristics on the
reliability of 3DGA data received very limited focus within the
studies in this review, with generally poor detailing of assessor
recruitment and descriptions. Kinematic 3D gait measurement
using landmark-specific models requires specialised staff skills,
including accurate and consistent placement of markers, and
expert knowledge of the underlying biomechanical model.
Training of clinical staff in standardised protocols is widely
considered to be important [1,29]. The consistency of the measures
may therefore be influenced by assessor experience, expertise,
professional background and additional training [56], with
experience of the clinical team potentially contributing to random
error in gait data [57]. Inclusion criteria or sampling methods for
assessors were not reported in any study, and it seems probable
that assessors were convenience samples of staff working within
the authors' laboratories. Whether the samples were influenced by
any biasing factors, such as recruitment of only the most
experienced or best assessors is uncertain. If experience or
discipline-specific training is a determinant of 3DGA measurement
reliability, then it is uncertain whether the results of best
assessors can be applied to other inexperienced assessors, or those
from different professional backgrounds. Similarly, if the findings
are from novice assessors, then the error sizes reported may be
larger than those typically achieved by experienced assessors with
greater expertise.
4.2. Methodological considerations: study design and procedures
Although the majority of studies described the use of
standardised protocols, wide variation was apparent in the
duration between measurement sessions. Justification of the time
interval duration is recognized as a desirable attribute of study
quality [16], but was absent in the majority of reports. Selection of
an optimal interval in repeated 3DGA measures requires consideration of both practical and theoretical issues. In principle,
intervals should be far enough apart to minimise fatigue or memory bias
effects, but short enough to avoid genuine change in the
measurements [16,55]. Artificially short intervals within a day
are often most feasible to achieve, yet may leave visible signs of
marker placement on skin to unblind a repeat assessment or
subsequent assessor, or increase the possibility that assessors may
remember aspects of anthropometric measures or landmark
identification. Fatigue may also cause true variations in the gait
patterns of clinical subjects when measured repeatedly within a
day by multiple assessors. In contrast, longer time periods of
months increase the possibility that real change has occurred
within the measurement interval, potentially introducing disease
progression bias [17]. In clinical populations such as CP,
deterioration in gait has been documented over periods of 12
years [58,59], and may potentially occur over shorter periods.
Further, the exact level of intrinsic stability of able-bodied human
gait patterns over hours, days, weeks, months or years has not been
well detailed.
Blinding of assessors to prior measurements is typical practice
within repeatability studies of measurement tools other than
3DGA (e.g. [60]). Although the potential for assessor bias is less
apparent with instrumented measures, it remains a possible factor
in some studies which good research design may minimise. It is
particularly relevant to within-laboratory or within-assessor
studies using biomechanical models that rely on landmark-specific
marker placement and anthropometric measures. When measures
are repeated over short duration intervals, assessors may recall
anthropometric measures and/or bony landmark identification.
Bias may also be introduced in data processing, by unblinded
selective trial inclusion or post hoc data adjustment. Three authors
reported efforts aiming to minimise assessor bias. Tsushima et al.
[30] ensured that any traces of marker placement were absent
prior to repeated marker placement, and both Maynard et al. [36]
and Noonan et al. [2] reported that assessors/study sites were blind
to previous measurements. We suggest that future studies reflect
upon potential sources of bias within study design and when
relevant consider blinding assessors to previous measurements.
Provision of concurrent ST data is a potentially useful
additional indicator of the true level of between-session gait
stability. Kinematic gait patterns are known to vary with changes
in walking speed [61]. Significant changes in speed or step size
across sessions are therefore more likely to be associated with
true change in kinematic variability, rather than error related to
inconsistent marker placement. Inspection of the reported ST data
shows marked across-study differences in ST variation. Healthy
adults varied little in mean walking speed (0.03 m/s; 3%
Coefficient of Variation (CV)) across four measurement sessions
within 2 weeks [30]. Higher variation was evident in the study of
children with CP by Noonan et al. [2], which included four visits to
separate laboratories over 6–20 weeks, reporting a mean absolute
variability of 0.3 m/s, with a maximum absolute variability of
0.6 m/s. These wide variations suggest that the resultant discordance index may include kinematic changes due to differences
in walking speed in some individuals across measurement
sessions.
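
As a worked illustration of the between-session speed statistics discussed above, the sketch below computes a coefficient of variation and an absolute range for one participant. The session speeds are hypothetical values chosen for the example, not data from the reviewed studies, and the CV is assumed here to be the sample S.D. divided by the mean.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample SD divided by the mean, expressed as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical mean walking speeds (m/s) for one person over four sessions.
session_speeds = [1.20, 1.25, 1.18, 1.22]
cv_percent = coefficient_of_variation(session_speeds)
speed_range = max(session_speeds) - min(session_speeds)
print(f"CV = {cv_percent:.1f}%, range = {speed_range:.2f} m/s")
```

Reporting such values alongside kinematic error would help a reader judge whether between-session differences may partly reflect a change in walking speed.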
The data selected for the evaluation of reliability may
potentially influence between-session data variation and differed
markedly across studies. Measurement sessions ranged from two
to five (Table 1) and trial numbers varied from a single or typical
or representative trial [2,36], up to 10 trials (e.g. [1,9,33]). Some
evidence suggests that the number of analysed trials may influence
the reliability of gait measurements. Monaghan et al. [9] examined
the inter-session reliability of two, four, six, eight and 10 trials,
finding that reliability improved with higher trial numbers,
subsequently advocating that 10 trials be used in analysis.
Similarly, in a study of inter-session reliability of the kinematics
of able-bodied running, Diss [62] found higher reliability indices
from inclusion of five trials, in contrast to the lower values
obtained from a single trial. Wide variations in methodology
prevent a detailed examination of the influence of trial number
within this review. It is notable however, that the two studies
including only single trials reported generally lower values of
reliability [36] and larger data variability [2].
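
One plausible reason why more trials improve reliability is that averaging reduces trial-to-trial noise but not session-level error such as marker placement. The sketch below assumes two independent error sources whose variances add; the function name and the S.D. values are hypothetical and the model is a simplification, not the analysis used by Monaghan et al. [9] or Diss [62].

```python
import math


def sd_of_session_mean(sd_session: float, sd_trial: float, n_trials: int) -> float:
    """SD of a session average when trial noise is averaged over n_trials."""
    return math.sqrt(sd_session ** 2 + sd_trial ** 2 / n_trials)


# Hypothetical SDs in degrees: 2 for between-session error, 3 for trial-to-trial variation.
for n in (1, 2, 4, 6, 8, 10):
    print(n, round(sd_of_session_mean(2.0, 3.0, n), 2))
```

Under these assumptions the benefit of extra trials diminishes quickly, and the error cannot fall below the between-session component however many trials are averaged.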
The majority of studies either stated or are presumed to have
captured data in barefoot conditions, inferred from the description
of skin-mounted foot markers. No study examined gait with an
orthosis. It is common for children with CP or adults after stroke to
wear lower limb Ankle-Foot Orthoses (AFOs). Typical clinical gait
analysis for these people includes measures of gait in both barefoot
and AFO conditions, requiring marker replacement for the AFO
condition. The data from both gait conditions are commonly used
for clinical interpretation and evaluation of AFO prescription and
efficacy. Further studies are needed to examine the reliability of
3DGA data from gait conditions including orthoses.
Comparisons of the repeatability of alternate biomechanical
models are likely to influence and guide the development and
adoption of more reliable models. Adequate model description is
therefore necessary to allow ready identification of the models
used. Two studies evaluated the repeatability of alternate models
with concurrent data capture. Charlton et al. [27] compared the
within- and between-assessor repeatability of a new model using
optimisation techniques (OLGA) with a conventional gait model
(VCM), nding lower error with OLGA. In contrast, Besier et al. [31]
found few differences in either within- or between-assessor
repeatability of a conventional anatomical landmark model
compared to a newer model with functional (motion) calibration.
No conclusions can be made from this review as to whether
particular models are more repeatable than others, due to the
diverse methodology used, the varied statistical analysis and
variable study quality.
4.3. Methodological considerations: statistical analysis
A key question in the reliability of 3DGA data is whether the
measures are reliable enough for clinical decision-making.
Although indices such as the CMC and ICC were commonly
reported, it is now well-recognized that, in isolation, correlation
indices do not tell us whether the measures are reliable enough,
with even high values potentially hiding measurement errors
judged to be of clinical importance [63]. Furthermore, expressing
data variability as a coefficient results in units that are difficult to
interpret clinically [29]. To be most useful, variability should be
expressed in a manner that can be directly related to the
measurement itself, in the same measurement units (e.g. degrees)
[64]. This is a significant limitation to much of the existing
literature, with only around half of the papers reporting error in
absolute terms. Interpretation of reliability indices according to
reference ranges of arbitrary acceptable or unacceptable values
also occurred in some of the studies. This is now generally regarded
as unreasonable with preference that the adequacy of reliability
outcomes should be reported in the context of the intended
research or clinical utilisation.
The prevalence of reports using the CMC warrants particular
attention, as the calculation method of the CMC is markedly
influenced by the joint range of motion (ROM). As noted by
previous authors [23,34], joints with large ROM typically record
high CMCs, and conversely, joints with low ROM typically show
poorer reliability. Furthermore, lower limb joints vary greatly in
ROM across patient groups and individual patients, and subjects
with gait pathology may show either increased (e.g. pelvic tilt) or
reduced (e.g. knee flexion) ROM. Inspection of reported data
confirms this pattern, with generally low and variable CMC values
achieved for pelvic tilt (for example see [23,28,30]) and only
relatively high values of >.85 reported for sagittal knee motion
(Table 3). The marked variation in joint range across the lower limb
seriously limits the utility of this measure, and caution is
advocated in interpreting such results. We recommend that this
technique should not be used in isolation in future reliability
studies.
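
The sensitivity of the CMC to joint range can be shown with synthetic curves. The sketch below uses a commonly cited waveform form of the CMC (within-frame variance relative to total variance about the grand mean); the exact formula used by individual reviewed studies may differ, and the sinusoidal curves, amplitudes and offsets are hypothetical.

```python
import math


def cmc(waveforms):
    """Coefficient of multiple correlation for repeated, equal-length waveforms."""
    n = len(waveforms)       # number of repeated sessions or trials
    t = len(waveforms[0])    # time points per gait cycle
    grand_mean = sum(sum(w) for w in waveforms) / (n * t)
    frame_means = [sum(w[k] for w in waveforms) / n for k in range(t)]
    within = sum((w[k] - frame_means[k]) ** 2 for w in waveforms for k in range(t))
    total = sum((w[k] - grand_mean) ** 2 for w in waveforms for k in range(t))
    ratio = (within / (t * (n - 1))) / (total / (n * t - 1))
    # The CMC is undefined (not a real number) when the ratio reaches or exceeds 1.
    return math.sqrt(1.0 - ratio) if ratio < 1.0 else float("nan")


def sessions(amplitude_deg, offsets_deg, points=100):
    """Hypothetical sinusoidal joint-angle curves that differ only by a constant offset."""
    return [[amplitude_deg * math.sin(2 * math.pi * k / points) + off
             for k in range(points)]
            for off in offsets_deg]


offsets = [-2.0, 0.0, 2.0]                    # identical between-session error in both cases
cmc_large_rom = cmc(sessions(30.0, offsets))  # about 60 degrees of range
cmc_small_rom = cmc(sessions(2.0, offsets))   # about 4 degrees of range
print(round(cmc_large_rom, 3), round(cmc_small_rom, 3))
```

With the same 2° offsets between sessions, the large-range curve returns a CMC close to 1 while the small-range curve returns a much lower value, mirroring the contrast between knee flexion and pelvic tilt described above.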
Table 4
Factors to consider when planning or reporting a 3DGA gait reliability study.

Methods
Participants (gait): Eligibility criteria. Recruitment strategy.
Participants (assessors): Eligibility criteria. Recruitment strategy.
Protocol and model: Description of setting, measurement protocol, data capture systems and biomechanical models (in sufficient detail to allow study to be repeated).
Study design: Single or multiple assessors and/or labs. Number and timing of sessions and trials within session. Standardisation of assessment intervals. Variables to be investigated.
Steps to reduce bias: Has blinding of assessors occurred if appropriate?
Sample size: How has sample size been determined?
Statistical methods: Description of statistical measurements. Do these provide outcomes with the same units as the measured variables to ensure clinical applicability of results?

Results
Participants (gait): Description of participant characteristics.
Participants (assessors): Description of participant characteristics with specific emphasis on professional background and experience.
Data: Report of basic temporal data parameters along with more complex gait data. Consider reporting estimates of variance of various sources: i.e. inter-trial, within-assessor, between-assessor etc.

It is recommended that future studies reporting reliability of
3DGA data include absolute measures of measurement error such
as the S.D., S.E.M. or alternate forms. Consideration should also be
given to the investigation and development of minimum levels of
detectable change (MDC), or minimal clinically important differences (MCID) [65]. Further evidence may also be sought for the
responsiveness of 3DGA measures. Whether the error magnitudes
are sufficiently low will be relative to the magnitude of expected
intervention effect size and specific population context. Further
studies are necessary in typical clinical populations to provide high
quality evidence indicating whether 3DGA measures are sufficiently reliable to detect clinically important change.
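
A minimal sketch of the absolute measures recommended above is given below. It assumes a two-session test-retest design with no systematic bias, in which the S.E.M. is the S.D. of the paired differences divided by the square root of 2 and the MDC at the 95% level is 1.96 × √2 × S.E.M.; the function names and the knee flexion values are hypothetical.

```python
import math
import statistics


def sem_from_test_retest(session_1, session_2):
    """Standard error of measurement from paired test-retest values (same units as the data)."""
    differences = [b - a for a, b in zip(session_1, session_2)]
    return statistics.stdev(differences) / math.sqrt(2)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical peak knee flexion (degrees) for five participants on two days.
day_1 = [40.0, 35.0, 42.0, 38.0, 45.0]
day_2 = [42.0, 34.0, 45.0, 37.0, 47.0]
sem = sem_from_test_retest(day_1, day_2)
print(f"SEM = {sem:.2f} deg, MDC95 = {mdc95(sem):.2f} deg")
```

Both outputs are in degrees, so they can be compared directly with the size of change expected from an intervention.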
5. Considerations and recommendations for future research
A number of limitations should be considered when interpreting the findings of this review. All papers were retained for
inclusion regardless of study quality, in order to provide a
comprehensive overview of available data. Statistical synthesis
of the data was not performed. The findings of this review are
limited to the published papers identified by the search strategies.
Potential publication bias was not assessed and may have resulted
in an over-estimation of reliability. Study quality was only
reviewed by the criterion tool developed for the study purpose.
Future studies of the reliability of 3DGA require careful
consideration of optimal design to enhance the generalisability
of the findings. If the intention is to apply the reliability estimates
to clinical populations, then careful attention is necessary to
recruit and describe samples which are representative of the
clinical populations of interest. Assessor recruitment and characterization warrants comparable attention. Protocols should
carefully consider what standardised measurement interval is
most appropriate and minimise predictable sources of assessor
bias. Appropriate statistical strategies should include reliability
estimates in units of degrees to enhance interpretation. Future
studies should also consider evaluation of the reliability of kinetics
and consider study designs that allow evaluation of the responsiveness of 3DGA. Table 4 proposes a list of factors that should be
considered when designing or reporting a study of the reliability of
3DGA.

As an alternative to research with clinical participants, small
studies using low numbers of healthy participants may also be
appropriate, to more easily enable between-laboratory comparisons of specific techniques or biomechanical models. Further
refinement and adoption of a standard test protocol using
methods such as those outlined by Schwartz et al. [7] may be
useful. Such a protocol could specify an agreed number of trials and
sessions, incorporate methods to minimise assessor bias, and
adopt a specified time interval such as 1 week. This may provide a
useful and more feasible approach to investigating model or
technique-specific questions, prior to definitive studies in clinical
populations when necessary.
This review concludes that although most errors in gait analysis
are probably acceptable, they are generally not small enough to be
ignored during clinical data interpretation. A goal of any clinical
measurement technology must be to provide measurements that
are free from any measurement error that might affect interpretation. There is thus still a need for modifying measurement
techniques to reduce levels of error. Many current techniques rely
heavily on the skill of assessors in accurately placing markers, and
inaccurate marker placement is almost certainly the principal
source of error. New techniques are now emerging based on
functional calibration techniques which are, in principle, less
dependent on the accuracy of marker placement (for example, see
[66,67]). It is hoped that these may further reduce measurement
error in clinical gait analysis. The definition of what measurement
error is acceptable is, of course, dependent on the particular clinical
application.
This review provides evidence that clinically acceptable errors
are possible in gait analysis. Variability between studies, however,
suggests that they are not always achieved and that particular care
is required to achieve acceptable results.
Acknowledgements
This project was funded by a National Health and Medical
Research Council Grant (ID 264597) to the Centre for Clinical
Research Excellence in Gait Analysis and Gait Rehabilitation,
Murdoch Childrens Research Institute, Melbourne, Australia.
Conflict of interest
Author RB has received research support funding from VICON.
The other authors state there were no conflicts of interest.
References
[1] Gorton G, Hebert D, Goode B. Assessment of the kinematic variability between
12 Shriners motion analysis laboratories. Gait & Posture 2001;13:247.
[2] Noonan K, Halliday S, Browne R, O'Brien S, Kayes K, Feinberg J. Inter-observer
variability of gait analysis in patients with cerebral palsy. Journal of Pediatric
Orthopaedics 2003;23:279–87.
[3] Wright JG. Pro: interobserver variability of gait analysis. Journal of Pediatric
Orthopaedics 2003;23:288–9.
[4] Gage JR. Con: interobserver variability of gait analysis. Journal of Pediatric
Orthopaedics 2003;23:290–1.
[5] Rothstein JM, Echternach JL. Primer on measurement: an introductory guide to
measurement issues. Alexandria, VA: American Physical Therapy Association
(APTA); 1993.
[6] Baker R. Gait analysis methods in rehabilitation. Journal of NeuroEngineering
and Rehabilitation 2006;3:4.
[7] Schwartz MH, Trost JP, Wervey RA. Measurement and management of errors in
quantitative gait data. Gait & Posture 2004;20:196–203.
[8] Kallen M. Understanding reliability when using measurement instruments in
the VA population. METRIC Newsletter (Measurement Excellence and Training
Resources Information Center); 2005 [Fall].
[9] Monaghan K, Delahunt E, Caulfield B. Increasing the number of gait trial
recordings maximises intra-rater reliability of the CODA motion analysis
system. Gait & Posture 2007;25:303–15.
[10] National Health and Medical Research Council. How to review the evidence:
systematic identification and review of the scientific literature. Canberra,
Australia: Biotext; 1999.
[11] Mulrow C, Cook DJ, Davidoff F. Systematic reviews: critical links to the great
chain of evidence. Annals of Internal Medicine 1997;126:389–91.
[12] Hestbaek L, LeBoeuf-Yde C. Are chiropractic tests for the lumbo-pelvic spine
reliable and valid? A systematic critical literature review. Journal of Manipulative and Physiological Therapeutics 2000;23:258–75.
[13] Jordan K. Assessment of published reliability studies for cervical spine range-of-motion measurement tools. Journal of Manipulative and Physiological
Therapeutics 2000;23:180–95.
[14] van der Wurff P, Hagmeijer RHM, Meyne W. Clinical tests of the sacroiliac
joint. A systematic review. Part 1: reliability. Manual Therapy 2000;5:30–6.
[15] Dobson F, Morris ME, Baker R, Graham HK. Gait classification in children with
cerebral palsy: a systematic review. Gait & Posture 2007;25:140–52.
[16] Terwee C, Bot S, de Boer M, van der Windt D, Knol D, Dekker J, et al. Quality
criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology 2007;60:34–42.
[17] Whiting P, Rutjes A, Dinnes J, Reitsma J, Bossuyt P, Kleijnen J. Development and
validation of methods for assessing the quality of diagnostic accuracy studies.
Health Technology Assessment 2004;8.
[18] Higgins J, Green S, editors. Cochrane handbook for systematic reviews of
interventions 4.2.6 [updated September 2006] The Cochrane Library, vol. issue
4. Chichester, UK: John Wiley & Sons, Ltd.; 2006.
[19] Gorton G, Stevens C, Masso P, Vannah W. Repeatability of the walking patterns
of normal children. Gait & Posture 1997;5:155.
[20] Miller F, Castagno P, Richards J, Lennon N, Quigley E, Njiler T. Reliability of
kinematics during clinical gait analysis: a comparison between normal and
children with cerebral palsy. Gait & Posture 1996;4:169–70.
[21] Mackey AH, Walt SE, Lobb GA, Stott NS. Reliability of upper and lower limb
three-dimensional kinematics in children with hemiplegia. Gait & Posture
2005;22:1–9.
[22] Quigley E, Miller F, Castagno P, Richards J, Lennon N. Variability of gait
measurements for typically developing children and children with cerebral
palsy. Gait & Posture 1999;10.
[23] Steinwender G, Saraph V, Scheiber S, Zwick EB, Uitz C, Hackl K. Intrasubject
repeatability of gait analysis data in normal and spastic children. Clinical
Biomechanics 2000;15:134–9.
[24] Yavuzer G, Oken O, Elhan A, Stam HJ. Repeatability of lower limb three-dimensional kinematics in patients with stroke. Gait & Posture 2008;27:31–5.
[25] Gok H, Ergin S, Yavuzer G. Reliability of gait measurement in normal subjects.
Journal Rheumatic Medicine and Rehabilitation 2002;13:76–80.
[26] Palisano R, Rosenbaum P, Walter S, Russel LD, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children
with cerebral palsy. Developmental Medicine and Child Neurology 1997;39:214–23.
[27] Charlton IW, Tate P, Smyth P, Roren L. Repeatability of an optimised lower
body model. Gait & Posture 2004;20:213–21.
[28] Kadaba MP, Ramakrishnan HK, Wootten ME, Gainey J, Gorton G, Cochran GV.
Repeatability of kinematic, kinetic, and electromyographic data in normal
adult gait. Journal of Orthopaedic Research 1989;7:849–60.
[29] Leardini A, Sawacha Z, Paolini G, Ingrosso S, Nativo R, Benedetti MG. A new
anatomically based protocol for gait analysis in children. Gait & Posture 2007
Oct;26:560–71.
[30] Tsushima H, Morris ME, McGinley J. Test-retest reliability and inter-tester
reliability of kinematic data from a three-dimensional gait analysis system.
Journal of the Japanese Physical Therapy Association 2003;6:9–17.
[31] Besier TF, Sturnieks DL, Alderson JA, Lloyd DG. Repeatability of gait data using
a functional hip joint centre and a mean helical knee axis. Journal of Biomechanics 2003;36:1159–68.
[32] Ferber R, McClay Davis I, Williams D, Laughton C. A comparison of within- and
between-day reliability of discrete 3D lower extremity variables in runners.
Journal of Orthopaedic Research 2002;20:1139–45.
[33] Gorton G, Hebert D, Goode B. Assessment of the kinematic variability between
twelve Shriners motion analysis laboratories Part 2: short-term follow up. Gait
& Posture 2002;16:S65–66.
[34] Growney E, Meglan D, Johnson M, Cahalan T, An K-N. Repeated measures of
adult normal walking using a video tracking system. Gait & Posture 1997;6:147–62.
[35] Shrout PE, Fleiss JL. Intra-class correlations: uses in assessing rater reliability.
Psychological Bulletin 1979;86:420–8.
[36] Maynard V, Bakheit AMO, Oldham J, Freeman J. Intra-rater and inter-rater
reliability of gait measurements with CODA mpx30 motion analysis system.
Gait & Posture 2003;17:59–67.
[37] Cowman J, Jenkinson A, O'Connell P, O'Brien T. A model for establishing
reliability and quantifying error associated with routine gait analysis. Gait
& Posture 1998;8:79.
[38] Murphy A, McGinley J, Tirosh O. Reliability of kinematic gait measurements in
adult hemiplegic stroke. In: Proceedings of the 12th annual gait and clinical
movement analysis society; 2007.
[39] Deville WL, Buntinx F, Bouter LM, Montori VM, de Vet HCW, van der Windt
DAWM, et al. Conducting systematic reviews of diagnostic studies: didactic
guidelines. BMC Medical Research Methodology 2002.
[40] Eve L, McNee A, Shortland A. Extrinsic and intrinsic variation in kinematic data
from the gait of healthy adult subjects. Gait & Posture 2006;24:S56–7.
[41] Schwartz MH, Viehweger E, Stout J, Novacheck TF, Gage JR. Comprehensive
treatment of ambulatory children with cerebral palsy: an outcome assessment. Journal of Pediatric Orthopaedics 2004;24:45–53.
[42] Fosang AL, Galea MP, McCoy AT, Reddihough DS, Story I. Measures of muscle
and joint performance in the lower limb of children with cerebral palsy.
Developmental Medicine & Child Neurology 2003;45:664–70.
[43] McDowell B, Hewitt V, Nurse A, Weston T, Baker R. The variability of goniometric measurements in ambulatory children with spastic cerebral palsy. Gait
& Posture 2000;12:114–21.
[44] Lijmer JG, Willem B, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP,
et al. Empirical evidence of design-related bias in studies of diagnostic tests.
Journal of the American Medical Association 1999;282:1061–6.
[45] Fritz J, Wainner R. Examining diagnostic tests: an evidence-based perspective.
Physical Therapy 2001;81:1546–64.
[46] Gage JR, Koop SE. Clinical gait analysis: application to management of cerebral
palsy. In: Allard P, Stokes IAF, Blanchi J-P, editors. Three-dimensional analysis
of human movement. Champaign, IL: Human Kinetics; 1995. p. 349–62.
[47] Patrick J. Case for gait analysis as part of management of incomplete spinal
cord injury. Spinal Cord 2003;41:497582.
[48] Smith PA, Hassani S, Reiners K, Vogel LC, Harris GF. Gait analysis in children
and adolescents with spinal cord injuries. Journal of Spinal Cord Medicine
2004;27:S44–9.
[49] Perry J. The use of gait analysis for surgical recommendations in traumatic
brain injury. Journal of Head Trauma Rehabilitation 1999;14:116–35.
[50] Morris M, Iansek R, McGinley J, Matyas T, Huxham F. 3-Dimensional gait
biomechanics in Parkinson's disease: evidence for a centrally mediated amplitude regulation disorder. Movement Disorders 2005;20:40–50.
[51] Gutierrez E, Bartonek A, Haglund-Akerling Y, Saraste H. Kinetics of compensatory gait in persons with myelomeningocele. Gait & Posture 2005;21:12–23.
[52] Gage JR, DeLuca PA, Renshaw TS. Gait analysis: principles and applications with
emphasis on its use in cerebral palsy. Instructional Course Lectures
1996;45:491–507.
[53] Stolze H, Kuhtz-Buschbeck J, Mondwurf C, Johnk K, Friege L. Retest reliability
of spatiotemporal gait parameters in children and adults. Gait & Posture 1998;
7:125–30.
[54] Bruton A, Conway JH, Holgate ST. Reliability: what is it, and how is it
measured? Physiotherapy 2000;86:94–9.
[55] Portney LG, Watkins MP. Foundations of clinical research. Applications to
practice. New Jersey: Prentice Hall Health; 2000.
[56] de Vet HCW, Terwee CB, Bouter LM. Current challenges in clinimetrics. Journal
of Clinical Epidemiology 2003;56:1137–41.
[57] Davis R, Davids J, Gorton G, Aiona M, Scarborough N, Oeffinger D, Tylkowski
CAB. A minimum standardized gait analysis protocol: development and
implementation by the Shriners Motion Analysis Laboratory Network (SMALnet). In: Harris GF, Smith PA (Eds.). Pediatric gait: a new millennium in clinical
care and motion analysis technology. IEEE; 2000.
[58] Bell KJ, Ounpuu S, DeLuca PA, Romness MJ. Natural progression of gait in children
with cerebral palsy. Journal of Pediatric Orthopaedics 2002;22:677–82.
[59] Gough M, Eve LC, Robinson RO, Shortland AP. Short-term outcome of multilevel surgical intervention in spastic diplegic cerebral palsy compared with the
natural history. Developmental Medicine & Child Neurology 2004;46:91–7.
[60] Watkins M, Riddle D, Lamb R, Personius W. Reliability of goniometric measurements and visual estimates of knee range of motion obtained in a clinical
setting. Physical Therapy 1991;71:90–6.
[61] van der Linden ML, Kerr AM, Hazlewood ME, Hillman SJM, Robb JE. Kinematic
and kinetic gait characteristics of normal children walking at a range of
clinically relevant speeds. Journal of Pediatric Orthopaedics 2002;22:800–6.
[62] Diss CE. The reliability of kinetic and kinematic variables used to analyse
normal running gait. Gait & Posture 2001;14:98–103.
[63] Luiz RR, Szklo M. More than one statistical strategy to assess agreement of
quantitative measurements may usefully be reported (commentary). Journal
of Clinical Epidemiology 2005;58:215–6.
[64] Keating J, Matyas T. Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy 1998;44:5–10.
[65] Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and
measures used in physical therapy. Physical Therapy 2006;86:735–43.
[66] Schwartz MH, Rozumalski A. A new method for estimating joint parameters
from motion data. Journal of Biomechanics 2005;38:107–16.
[67] Reinbolt JA, Schutte JF, Fregly BJ, Koh BI, Haftka R, George A, Mitchell K.
Determination of patient-specific multi-joint kinematic models through two-level optimization. Journal of Biomechanics 2005;38:621–6.
