
J Occup Rehabil

DOI 10.1007/s10926-017-9734-x

REVIEW

Updating the Evidence on Functional Capacity Evaluation Methods: A Systematic Review
Stijn De Baets1 · Patrick Calders2 · Noortje Schalley1 · Katrien Vermeulen3 · Sofie Vertriest3 · Lien Van Peteghem3 · Marieke Coussens1 · Fransiska Malfait4 · Guy Vanderstraeten2,3 · Geert Van Hove5 · Dominique Van de Velde1

© Springer Science+Business Media, LLC 2017

* Stijn De Baets
Stijn.debaets@ugent.be

1 Occupational Therapy Program, Department of Rehabilitation Sciences and Physiotherapy, Faculty of Medicine and Health Sciences, Ghent University, De Pintelaan 185, 9000 Ghent, Belgium
2 Department of Rehabilitation Sciences and Physiotherapy, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
3 Department of Physical and Rehabilitation Medicine, Ghent University Hospital, Ghent, Belgium
4 Centre for Medical Genetics, Ghent University Hospital, Ghent, Belgium
5 Department of Special Needs Education, Faculty of Psychology and Educational Sciences, Ghent University, Ghent, Belgium

Abstract  Objectives To synthesize the evidence on the psychometric properties of functional capacity evaluation (FCE) methods. Methods A systematic literature search was conducted in nine databases. The resulting articles were screened based on predefined inclusion and exclusion criteria. Two reviewers independently performed this screening. Included studies were appraised based on their methodological quality. Results The search resulted in 20 eligible studies about nine different FCE methods. The Baltimore Therapeutic Equipment work simulator showed moderate predictive validity. The Ergo-Kit (EK) showed moderate variability and high inter- and intra-rater reliability. Low discriminative abilities and high convergent validity were found for the EK. Concurrent validity of the EK and the ERGOS Work Simulator was low to moderate. Moderate to high test–retest, inter- and intra-rater reliability was found for the Isernhagen Work Systems (IWS) FCE. The predictive validity of the IWS was low. The physical work performance evaluation (PWPE) showed moderate test–retest reliability and moderate to high inter-rater reliability. Low internal and external responsiveness were found for the PWPE, while its predictive validity was high. The predictive validity of the short-form FCE was also high, but several of its other psychometric properties need further examination. Low discriminative and convergent validity were found for the work disability functional assessment battery. The WorkHab showed moderate to high test–retest, inter- and intra-rater reliability. Conclusion Well-known FCE methods have been rigorously studied, but some of the research indicates weaknesses in their reliability and validity. Future research should address how these weaknesses can be overcome.

Keywords  FCE · Functional capacity evaluation · Return to work · Assessment instruments · Psychometric

Introduction

A synthesis of research conducted by the Organization for Economic Co-operation and Development (OECD) showed that almost all OECD countries experienced significant social and economic impacts from the high number of workers permanently leaving the labor market due to health problems or disability. Additionally, research shows that people with reduced work capacity or with more work absence are less likely to remain employed [1, 2]. The budget for disability benefits makes up a significant proportion of public expenditure across OECD countries, averaging 1.2% of the gross domestic product (GDP) [1]. In the Netherlands, Norway and Sweden the proportion of GDP is even higher, at 3.5% [1]. Furthermore, the employment rates of people with a disability are on average 40% lower than the overall level of employment [1]. These low employment rates are accompanied by high social costs [1, 3, 4] due
to unemployment benefits, lower incomes and a much higher poverty risk [1].

Changes in the labor market due to the global financial crisis, such as increased unemployment rates, might lead to a higher number of people dependent on sickness and disability benefits [1]. Meanwhile, the incidence of occupational injuries leading to long-term absenteeism, and potentially to job loss, is still rising in high-income countries [5, 6]. There is a growing awareness among healthcare professionals and policymakers of the return-to-work theme. Even so, the number of people depending on sickness and disability benefits is still increasing. In addition, the prevalence of occupational disabilities and related costs continues to increase. One can therefore conclude that current return-to-work approaches are not always adequate [5].

Furthermore, “except for a few countries, the share of spending on vocational rehabilitation and employment programs is less than 8%, and in most cases less than 4%, of total disability-related spending” [1]. The discrepancy between the costs associated with unemployment and those related to incapacity and disability is of concern. There is a need for increased investment in vocational rehabilitation in an effort to increase employment rates, including the employment of people with disabilities. Determining work capacity requires appropriate measurement tools. Functional capacity evaluation (FCE) instruments could play a key role in decreasing incapacity-related spending by determining work capacity and matching it with appropriate employment, either by matching the person's abilities to job requirements or by identifying necessary modifications to the work environment or workload [7]. “An FCE is an evaluation of capacity of activities that is used to make recommendations for participation in work while considering the person’s body functions and structures, environmental factors, personal factors and health status” [8]. The purpose of FCE is considered to be the evaluation of a person’s ability to participate in work [8] by matching the person's capacities with functional job requirements [9]. The underlying assumption is that better performance in the FCE is associated with faster return to work and a lower risk of re-injury or pain exacerbation [9]; however, it is important to note that many other factors not measured by FCE, such as personal causation, also influence return-to-work outcomes [10].

FCEs are used most commonly with individuals who have work disabilities [6] in a clinical setting: to develop a customized rehabilitation program, to adjust the rehabilitation program, to measure changes in physical abilities, and to determine functional work abilities and match these with employment prior to return to work [6, 7, 11]. When the FCE shows that the functional abilities of the patient are insufficient to meet the physical job demands, a vocational rehabilitation program to improve the ability to return to work can be developed. FCEs have also become part of medico-legal assessments to determine whether claimants should receive disability benefits, based on their assessed functional abilities [7]. In conclusion, many disciplines from various organizations use FCEs as part of their clinical practice, including making recommendations that have implications for people with work-related disabilities, employers, insurers, other health professionals and stakeholders. It is therefore important that FCE users know whether FCE methods provide reliable and valid information and, subsequently, which method is preferable.

There is a growing interest in the psychometric properties of FCE because “a FCE should give reliable and valid measurements” [12]. Numerous studies have been conducted to validate FCE methods and to demonstrate their reliability in varying client groups, and these studies have also been reviewed over the years. The most recent comprehensive review on FCE, dating from 2006 [7], states that there is a clear need for an updated global review of the existing evidence, especially since new, promising FCE methods have been developed in recent years and known methods have been refined and more thoroughly researched [7]. Another important reason for updating the comprehensive evidence on FCE validity and reliability is the continuing use of FCEs in important decision-making processes regarding occupational rehabilitation, insurance and disability benefits. Although the main scope of this review is to give an overview of current FCE methods and their psychometric qualities, we should also highlight the importance of bio-psycho-social reasoning regarding returning to work with a disability. In practice, all components of evidence-based practice should be used. Scientific evidence must be used in therapy, but we must not forget that client-centeredness will always be the central concept in rehabilitation, and therefore in the return-to-work approach. We also need to take into account the psychological and social factors concerning labor. Only when all these factors are included in the decision-making can one make a well-informed and deliberate decision and give accurate advice to clients, employers and society.

Materials and Methods

For this systematic review, a literature search was conducted to identify relevant studies in the following electronic databases: Web of Science, Trip Database, Journal Storage, PubMed, Embase, PEDro and OTSeeker. Before the actual literature search, scoping searches were performed to determine which search terms would provide the highest number of relevant results. These were used as key terms (Table 1) and were combined into search phrases using


Table 1  Used keywords and their synonyms

Category | Keywords and synonyms
Assessment | Functional capacity evaluation(s); FCE; Functional assessment
Study objective | Psychometric properties; Psychometrics; Reliability; Reliable; Repeatable; Validity; Valid
Study topic | Vocational rehabilitation; Return to work; Vocational participation; Work; Job

Boolean operators. The following search string was used: (Functional Capacity Evaluation OR FCE) AND (Psychometrics OR Psychometric properties OR Validity OR Reliability) AND (Return to Work OR Vocational rehabilitation OR Work OR Job OR Participation in work). Fig. 1 gives an overview of the whole search process. Filters for publication date and type of study were used when available.

Fig. 1  PRISMA flow diagram

These hits were screened by applying the inclusion and exclusion criteria. Articles were included when the type of study was an RCT, CCT, meta-analysis or systematic review. A second criterion concerned the participants of the selected studies: articles were included when they studied healthy subjects or subjects with an occupational disability resulting from a disease or disability. Further important criteria were the study of psychometric properties (validity and reliability), FCE methods that assess the global physical/functional capacity of the subject, a publication date after May 2004, and publication in English, French or Dutch. Articles were excluded when they only measured certain specific functional/physical capacities, studied FCE methods with a specific target population, or were not relevant to the research question. In the first phase of the screening process, the inclusion criteria were applied to the title and abstract. Studies of questionable relevance in the first phase were included and screened more thoroughly in the following phase. In the second phase, the selected studies were screened based on their full texts. In
this phase of the screening process, the references of included systematic reviews were also screened for relevant studies through citation searching. To ensure objectivity in the screening process, 2 reviewers (2 junior researchers under supervision of the research group) independently completed the same screening process. Percentage agreement and Cohen’s kappa were calculated.

The methodological quality of the included studies was assessed using the three-level quality appraisal scale developed by Gouttebarge et al. [6]. The items “internal consistency” and “responsiveness” were added, based on the criteria proposed by Terwee et al. [32]. Relevant data were extracted into a standardized data-extraction form. The core findings of each study were expressed as measures of validity and/or reliability. All the above-mentioned aspects were independently assessed by the two first authors under supervision of the research team.

Results

A total of 1381 hits were retrieved from the nine databases. Of these, 200 were duplicates. After the remaining 1181 references were screened by applying the inclusion and exclusion criteria to their titles and abstracts, 1142 were excluded in phase 1. Citation searching of the reference lists of the included systematic reviews, applying the same inclusion and exclusion criteria, yielded 1 additional eligible original paper; other citations were excluded based on title or abstract. In phase 2, the inclusion and exclusion criteria were applied to the 40 eligible full texts, resulting in the exclusion of 20 full texts. This left 20 original studies to be included in this review. Agreement between the two reviewers on the inclusion of studies based on title and abstract was high (98.17% agreement; κ = 0.96, SE = 0.03, 95% CI 0.910–1.00) and excellent after deliberation/discussion (100% agreement; κ = 1.00). Disagreements on the inclusion or exclusion of studies were resolved by discussion between the two reviewers and the research team.

Studies on 9 FCE methods were reviewed (Tables 2 and 3). The literature search on the Baltimore Therapeutic Equipment work simulator (BTE) yielded no studies on reliability; two studies were found on the predictive validity of the BTE [13, 14]. For the Ergo-Kit functional capacity evaluation (EK), 1 article gave an overview of intra- and inter-rater reliability [15] and 1 reported on inter-rater reliability and agreement [16]. One study described the discriminant/divergent and convergent validity [17], and another study researched the concurrent validity of the EK and the ERGOS Work Simulator [18]. No studies were found on the reliability of the ERGOS work simulator (EWS); one study was found on the concurrent validity of the EWS in comparison with the Ergo-Kit FCE [18]. No studies were found on the reliability or validity of the Blankenship FCE; 1 was found on its sensitivity and specificity [19]. Three studies were found on the reliability of the Isernhagen work systems (IWS): one on test–retest reliability and reproducibility [20], one on inter-rater reliability [21] and 1 on both inter-rater and intra-rater reliability [22]. Two studies examined its predictive validity [8, 23]. Two studies on the reliability of the ErgoScience physical work performance evaluation (PWPE) were found: 1 on test–retest reliability/reproducibility [24] and 1 on inter-rater reliability [25]. One study on the predictive validity of the PWPE was found [26] and 1 on the internal and external responsiveness of the assessment [27]. There were no studies on the reliability of the Short-Form Functional Capacity Evaluation, and only 1 article on its predictive validity [28]. No studies were retrieved on the reliability of the Work Disability Functional Assessment Battery (WD-FAB); 1 study was retrieved on the discriminant/divergent and convergent validity of the assessment [29]. Two studies were found on the reliability of the WorkHab, 1 researching test–retest reliability [30] and one researching intra- and inter-rater reliability [31]. The WorkHab’s internal consistency was also researched [30].

The overall methodological quality of the reviewed studies (Table 3) was moderate to high. Sixteen studies showed high methodological quality [11, 29–34, 37, 39, 42–45], 3 studies showed moderate quality [19, 20, 24], and only 1 study was rated as low quality [22]. Nearly all studies reported whether a full FCE method or subtests were studied and clearly stated the study objective. Four studies did not report data on health or work status, while the number of subjects, their gender distribution and their mean age were always mentioned. Overall, methodological quality scores were lowered by questionable study designs or the use of inadequate statistics [32]. Although the overall quality of the reviewed studies was considered moderate to high on the three-level quality appraisal scale of Gouttebarge et al. [6], some important aspects remained unclear, because this system does not consider all factors that determine the methodological quality of the reviewed studies. For instance, none of the studies reported that study subjects were randomized. Also, some studies investigated only a limited number of psychometric properties (see Table 4), which limits the ability to compare the quality of the included studies.

Most studies did not report the subject selection method or used convenience sampling. Almost all studies reported their inclusion and exclusion criteria. Of the studies in which allocation was relevant, only three performed random allocation. Subjects were never reported to be blinded. In most cases, rater blinding was not performed or not mentioned. Many studies were faced with missing data.
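The search string given in the Materials and Methods is a conjunction of three OR-groups of synonyms. A minimal Python sketch can regenerate it from the keyword groups, which makes it easy to check that no synonym is dropped when the string is adapted. The names `build_query`, `ASSESSMENT`, `OBJECTIVE` and `TOPIC` are illustrative only and are not part of any database interface. The sketch deliberately ignores database-specific syntax such as phrase quoting and field tags, which differs between, for example, PubMed and Embase.

```python
from typing import Sequence

# Keyword groups as they appear in the search string of this review.
# Terms inside a group are synonyms (combined with OR); the three groups
# (assessment, study objective, study topic) are combined with AND.
ASSESSMENT = ["Functional Capacity Evaluation", "FCE"]
OBJECTIVE = ["Psychometrics", "Psychometric properties", "Validity", "Reliability"]
TOPIC = ["Return to Work", "Vocational rehabilitation", "Work", "Job",
         "Participation in work"]


def build_query(*groups: Sequence[str]) -> str:
    """OR the terms within each group, parenthesize it, then AND the groups."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(ASSESSMENT, OBJECTIVE, TOPIC)
print(query)
```

Keeping the groups as data rather than as a hand-typed string means that a synonym added to one group is carried into every generated query.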
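The reviewers' agreement is reported both as a percentage and as Cohen's kappa. Kappa corrects the observed agreement p_o for the agreement p_e that would be expected by chance if the two reviewers decided independently: κ = (p_o − p_e) / (1 − p_e). This correction is why κ (0.96) is lower than the raw agreement (98.17%). The following is a minimal Python sketch, assuming one binary include/exclude decision per record for each reviewer. The cell counts are hypothetical, because the review reports only the summary statistics and not the underlying 2×2 table.

```python
from typing import Tuple


def cohens_kappa(both_include: int, only_a_include: int,
                 only_b_include: int, both_exclude: int) -> Tuple[float, float]:
    """Return (observed agreement p_o, Cohen's kappa) for two reviewers
    making a binary include/exclude decision, from the four 2x2 cells."""
    n = both_include + only_a_include + only_b_include + both_exclude
    p_o = (both_include + both_exclude) / n
    # Marginal inclusion rate of each reviewer.
    p_a = (both_include + only_a_include) / n
    p_b = (both_include + only_b_include) / n
    # Agreement expected by chance if the reviewers decided independently.
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical counts for illustration only (not the screening data of this review).
p_o, kappa = cohens_kappa(both_include=30, only_a_include=5,
                          only_b_include=5, both_exclude=60)
print(f"agreement = {p_o:.2%}, kappa = {kappa:.2f}")  # agreement = 90.00%, kappa = 0.78
```

The function assumes p_e < 1. If both reviewers make the same decision for every record, the chance-corrected statistic is undefined.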
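The screening counts at the start of the Results form a simple flow that can be checked with a few lines of arithmetic. This Python sketch uses only the reported numbers. It assumes that the one paper found through citation searching is counted among the 40 full texts, which is consistent with the reported totals (1181 − 1142 + 1 = 40).

```python
# Screening counts as reported in the Results of this review.
identified = 1381               # hits retrieved from the databases
duplicates = 200
excluded_title_abstract = 1142  # phase 1 exclusions
added_by_citation_search = 1    # assumed to be counted among the full texts
excluded_full_text = 20         # phase 2 exclusions

screened = identified - duplicates
full_texts = screened - excluded_title_abstract + added_by_citation_search
included = full_texts - excluded_full_text

print(screened, full_texts, included)  # 1181 40 20
```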


Table 2  Studies sorted by FCE method

FCE method | Full FCE studied | Subtest(s) studied
Baltimore Therapeutic Equipment (BTE) work simulator | Cheng and Cheng [13]; Cheng and Cheng [14] | –
Blankenship WorkEval functional capacity evaluation | Brubaker et al. [19] | –
Ergo-Kit (EK) functional capacity evaluation | – | Gouttebarge et al. [15]; Gouttebarge et al. [16]; Gouttebarge et al. [17]; Rustenburg et al. [18]
ERGOS work simulator | – | Rustenburg et al. [18]
Isernhagen work systems FCE | Reneman et al. [20]; Gross and Battié [9] | Reneman et al. [21]; Trippolini et al. [22]; Gross and Battié [33]
Physical work performance evaluation (PWPE) | Brassard et al. [24]; Durand et al. [25]; Durand et al. [27]; Lechner et al. [26] | –
Short-form functional capacity evaluation | Branton et al. [28] | –
Work disability functional assessment battery (WD-FAB) | – | Meterko et al. [29]
WorkHab functional capacity evaluation | – | James et al. [30]; James et al. [31]

Some studies are mentioned more than once, because they study multiple FCE methods

Table 3  Results of the methodological quality appraisal

Author(s) and year of publication | FCE method | Objective | Population | Procedure | Statistics | Methodological quality

Studies on the Baltimore Therapeutic Equipment (BTE) work simulator
Cheng and Cheng [13] | + | + | + | + | + | 5 = High
Cheng and Cheng [14] | + | + | + | + | + | 5 = High
Studies on the Blankenship WorkEval functional capacity evaluation
Brubaker et al. [19] | + | − | + | + | + | 3 = Moderate
Studies on the Ergo-Kit (EK) functional capacity evaluation
Gouttebarge et al. [15] | + | + | + | + and ± | + | 4–5 = High
Gouttebarge et al. [16] | + | + | + | ± | + | 4 = High
Gouttebarge et al. [17] | + | + | + | + and + | + | 5 = High
Studies on the ERGOS work simulator (EWS)
Rustenburg et al. [18] | + | + | + | + | + | 5 = High
Studies on the Isernhagen work systems (IWS) functional capacity evaluation
Gross and Battié [9] | + | + | + | + | + | 5 = High
Gross and Battié [33] | + | + | + | + | + | 5 = High
Reneman et al. [20] | + | + | ± | ± | + | 3 = Moderate
Reneman et al. [21] | + | + | ± | + | + | 4 = High
Trippolini et al. [22] | − | + | ± | −* and + | + | 1–2* = Low
Studies on the physical work performance evaluation (PWPE)
Brassard et al. [24] | + | + | + | − | + | 3 = Moderate
Durand et al. [25] | + | + | + | + | + | 5 = High
Durand et al. [27] | + | + | + | + | ± | 4 = High
Lechner et al. [26] | + | + | + | + | + | 5 = High
Studies on the short-form functional capacity evaluation
Branton et al. [28] | + | + | + | + | + | 5 = High
Studies on the work disability functional assessment battery (WD-FAB)
Meterko et al. [29] | + | + | ± | + | + | 4 = High
Studies on the WorkHab functional capacity evaluation
James et al. [30] | + | + | + | + and + | + | 5 = High
James et al. [31] | + | + | − | + and + | + | 4 = High

*The test–retest time interval was 10 months long. However, video recordings of subjects were used to assess intra- and inter-rater reliability, preventing learning effects and carry-over effects in the subjects
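The Results state that sixteen studies were of high, three of moderate and one of low methodological quality. These counts can be checked directly against Table 3. This minimal Python sketch encodes only the qualitative level from the last column of Table 3, keyed by reference number. The two ranged scores (Gouttebarge et al. [15], 4–5 = High; Trippolini et al. [22], 1–2* = Low) each fall within a single level, so no choice between the two scores is needed.

```python
from collections import Counter

# Qualitative quality level per study, transcribed from Table 3
# (reference number -> level), grouped by FCE method.
RATINGS = {
    13: "High", 14: "High",                                         # BTE
    19: "Moderate",                                                 # Blankenship
    15: "High", 16: "High", 17: "High",                             # Ergo-Kit
    18: "High",                                                     # ERGOS
    9: "High", 33: "High", 20: "Moderate", 21: "High", 22: "Low",  # IWS
    24: "Moderate", 25: "High", 27: "High", 26: "High",            # PWPE
    28: "High",                                                     # short-form FCE
    29: "High",                                                     # WD-FAB
    30: "High", 31: "High",                                         # WorkHab
}

counts = Counter(RATINGS.values())
print(len(RATINGS), counts["High"], counts["Moderate"], counts["Low"])  # 20 16 3 1

# Reference numbers of the studies rated High in Table 3.
high_refs = sorted(ref for ref, level in RATINGS.items() if level == "High")
print(high_refs)
```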
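The Results report diagnostic properties as percentages, for example a sensitivity of 80.0% and a specificity of 84.2% for the Blankenship FCE [19]. Sensitivity is the proportion of reference-positive cases that the test flags, TP / (TP + FN). Specificity is the proportion of reference-negative cases that the test clears, TN / (TN + FP). In this minimal Python sketch, the 2×2 counts are hypothetical and are not the data of Brubaker et al. [19]. What counts as a "positive" FCE result, and which reference classification it is compared against, depends on the design of the original study and is not modelled here.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 2x2 counts for illustration only.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")  # sensitivity = 80.0%, specificity = 90.0%
```

Both quantities are conditioned on the reference classification, so they do not depend on how common the target condition is in the sample. Predictive values, which answer how likely a positive FCE result is to be correct, do depend on it.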

All studies reported the necessary outcomes, but in some studies it was unclear whether additional results were reported. Studies that needed two equivalent groups of subjects mostly did not mention whether these groups were comparable at baseline; in most of the studies that did report this aspect, the groups were equivalent for important characteristics. Lastly, it was sometimes unclear whether studies received funding from involved parties. Four studies clearly reported that there was no financial influence, while most others did not mention financial support.

The job-specific BTE has moderate predictive validity for return to work and employment status [13, 14]. The Blankenship FCE has a sensitivity of 80.0% and a specificity of 84.2% [19], which indicates good diagnostic abilities. The overall agreement for outcomes on the EK lifting tests ranged from 8.6 to 3.4, which might be interpreted as moderate and indicates moderate variability between outcomes on the lifting tests [16]. Agreement between raters for the EK lifting tests varied strongly, from low to high, but was mostly high [15, 16]. The same applies to agreement within raters [15]. Low discriminative abilities were found for the EK lifting tests (discriminant/divergent validity). Little or no association was found between the EK lifting tests and the Von Korff Questionnaire, indicating good convergent validity [17]. The EWS shows low to moderate concurrent validity in comparison with the EK [18]. Within the IWS, varying test–retest reliability and reproducibility were found for the material-handling component, with moderate to high agreement between outcomes. Overall, this IWS component provided relatively stable outcomes with limited variation [20]. Agreement between raters was moderate for the lifting tests of the IWS [21], and moderate to high agreement between raters was found for the physical and behavioral scales used in the IWS [22]. Agreement


Table 4  Studies sorted by topic (type of studied psychometric property)

Psychometric property | Studies
Reliability
Reproducibility/test–retest reliability: agreement | Gouttebarge et al. [16]; James et al. [30]
Reproducibility/test–retest reliability: reliability | Brassard et al. [24]; Reneman et al. [20]
Inter-rater reliability | Durand et al. [25]; Gouttebarge et al. [15]; Gouttebarge et al. [16]; James et al. [31]; Reneman et al. [21]; Trippolini et al. [22]
Intra-rater reliability | Gouttebarge et al. [15]; James et al. [31]; Trippolini et al. [22]
Validity
Criterion-related validity: concurrent validity | Rustenburg et al. [18]
Criterion-related validity: predictive validity | Branton et al. [28]; Cheng and Cheng [13]; Cheng and Cheng [14]; Gross and Battié [9]; Gross and Battié [33]; Lechner et al. [26]
Construct validity: discriminant/divergent validity | Gouttebarge et al. [17]; Meterko et al. [29]
Construct validity: convergent validity | Gouttebarge et al. [17]; Meterko et al. [29]
Other
Diagnostic properties: sensitivity | Brubaker et al. [19]
Diagnostic properties: specificity | Brubaker et al. [19]
Responsiveness: internal responsiveness | Durand et al. [27]
Responsiveness: external responsiveness | Durand et al. [27]
Internal consistency | James et al. [30]


within raters was moderate to high for the physical and behavioral scales [22]. Studies show that performance in the IWS (number of failed tasks and weight lifted) has no or low predictive value for recovery outcomes such as timely or sustained return to work and future pain, based on days to benefit suspension, days to claim closure and recurrence [9, 33]. Varying test–retest reliability and reproducibility were found for the sections of the PWPE, with moderate to high agreement between outcomes for the sections and moderate agreement between outcomes of the overall PWPE [24]. In other words, the PWPE provides relatively stable outcomes with limited variation. Agreement between raters was moderate to high for the PWPE sections and high for the overall PWPE score [25]. Change within subjects was detected by two PWPE sections (the dynamic strength section and the position tolerance section), but not by the mobility sections or by the overall PWPE (internal responsiveness) [27]. The PWPE was not able to detect change in the subjects’ outcomes on reference measures (concurrent and empirical data) of health status (external responsiveness) [27]. A study by Lechner et al. [26] shows that performance in the PWPE has high predictive value for return to work. The performance items in the short-form FCE have predictive value for recovery outcomes such as timely and sustained return to work [28]. Good discriminative abilities were found for the WD-FAB physical functioning and behavioral health scales for differentiating between physical and mental disability (discriminant and divergent validity) [29]. Low to moderate association was found between the WD-FAB physical functioning scale and the PROMIS (convergent validity) [29]. Low to high associations were found between the WD-FAB behavioral health scale and the BASIS [29]. Moderate to high agreement between outcomes was found for the three subtests of the WorkHab manual handling component, and high agreement between outcomes for the overall manual handling score (test–retest reliability and reproducibility) [30]. Overall, the WorkHab manual handling score provided stable outcomes with little variation. Agreement between raters was moderate to high for the outcomes of the subtests of the WorkHab manual handling component and moderate for the overall manual handling component outcome (inter-rater reliability) [31]. Agreement within raters was moderate to high for the subtests and high for the overall manual handling component (intra-rater reliability) [31]. James et al. found high internal consistency for the manual handling component and for the individual tests of the manual handling component [30].

Discussion

Overall, the psychometric properties of the studied FCE methods vary somewhat between, but also within, methods. The predictive validity, for example, is low in some methods but high in others. An important question, however, remains whether FCEs should be used to predict, establish or diagnose work abilities. In the latter case, sensitivity and specificity become more important. Inter-rater reliability seems to be good in most FCE methods, as does intra-rater reliability. Test–retest outcomes are also promising in most cases. This suggests that the reliability of current FCE methods is generally well researched, although validity has been studied less frequently. Overall, the FCE methods included in this review show variable results; however, most outcomes show moderate to good validity.

An important side note to the study results is that, “according to several authors, when a rater is involved in scoring the evaluation, intra-rater reliability is equivalent to test–retest reliability because the accuracy of the FCE is influenced by the skill of the rater” [34]. In the present study, however, the term test–retest reliability is used, since this is the term chosen in the original papers. Furthermore, concurrent validity should be determined by studying the relation between the studied assessment and its gold standard [6]. For FCE, however, no gold standard is available [6]. Therefore, the use of the term “concurrent validity” in the study by Rustenburg et al. [18] seems inadequate; it could be better to speak in terms of ‘comparison’ or ‘correlation’ [6]. Another aspect to take into consideration is the study sample in which the psychometric properties of the FCE methods are researched. The outcomes of the reviewed studies should not be generalized to broader populations, because they are specific to the study sample at hand: the validity or reliability of FCE methods in certain pathologies might not be the same in the general population. Finally, and most importantly, this review only included evidence published since May 2004. For a comprehensive understanding, the outcomes of the present study should be integrated with the pre-existing evidence found in the systematic reviews by Innes and Straker [35, 36], Gouttebarge et al. [6], Wind et al. [37] and Innes [7].

Since one of the main aims of this review was to enable comparison of FCE methods in order to support objectively informed decisions, it is also important to look beyond their psychometric properties (effectiveness). The efficiency of FCE methods is another essential aspect to take into account. According to Hart et al., safety should be established before validity and reliability are considered. Once validity and reliability are demonstrated, practicality and utility should be taken into account, in that order [38]. Costs, time spent on the assessment, user-friendliness and acceptability are important factors that might sometimes play a bigger role in the choice of FCE method than psychometric properties. Many FCE methods have promising psychometric properties but are sometimes inefficient in both time and cost [39]. Short-form FCEs or FCEs in the
form of a structured interview provide a potential answer to many of these problems. Nonetheless, the psychometric properties of these FCE methods need more scientific substantiation. The acceptability of FCE was studied by asking an expert panel about its usefulness [40]. The results showed that two-thirds of the experts found FCE useful because it confirms personal judgements and provides objective information. Reasons for not finding FCE useful, however, were that it did not seem objective and did not provide any new information. Job-specific FCE might solve the problem of information that is neither new nor specific. The client-centeredness of this kind of FCE is an important asset, as it should reduce redundant information and provide directly applicable input for the vocational rehabilitation process. Furthermore, only 20% of the experts judged FCE to be a useful prognostic instrument. Most of them argued that FCE is an evaluative measure and should not be used for predictive purposes. Studies on the predictive validity of several FCE methods should replace these opinions with evidence.

RTW is often seen as a binary event: either the person returns to work (in the same job or an adapted job) or not. Predicting whether the person will be able to return to work involves assessing their skills against the demands of the job. Since RTW has become a central issue at both the political and the clinical level, improving the prediction of RTW could have huge potential, not only to inform vocational rehabilitation workers during the rehabilitation process but also to inform policy makers in their RTW policy. Future research should therefore not only focus on reviewing and researching predictors of RTW but also examine the incremental predictive validity beyond the described predictors. In the Cochrane review of Mahmud et al., no evidence was found for or against the effectiveness of performance-based measures in preventing re-injury after return to work [41]. Kuijer et al. performed a systematic review on the predictive validity of performance-based measures to predict

Some researchers state that the literature search in systematic reviews should be exhaustive, so that all possibly relevant data are obtained [36]. However, given a limited time frame and limited resources, an exhaustive approach could not be guaranteed for this review. It is therefore possible that relevant studies were missed, for example studies that were not available in the searched databases, studies that used different keywords, or studies published in languages other than English, Dutch or French. An important limitation of this study is that the names of known FCE methods were not used as search terms. This may have resulted in some relevant studies being missed, which should be taken into consideration when interpreting the results. Lastly, publication bias could have had a confounding effect on the literature search, as studies with negative outcomes were possibly not published. Nevertheless, the literature search was performed as thoroughly as possible by searching all available databases, using broad search terms and synonyms, and applying the inclusion and exclusion criteria. Another limitation of the study is that studies focusing on single tasks within a full FCE, such as a ‘lifting assessment’, were excluded. This choice could have diminished the power of the study results. Further research is necessary to also evaluate the psychometric properties of the separate tasks of the FCE.

It is important to interpret the quality appraisal scores in this review correctly. The scoring system used does not consider all important factors that can determine the quality of the reviewed studies. Although a checklist might provide more useful information on these factors, a scoring system was used to facilitate the interpretation of the study results. Another reason for using the three-level quality appraisal scale by Gouttebarge et al. [6] was that most validated checklists, such as the COSMIN checklist, did not seem to be suitable for some of the reviewed studies. Other checklists
work participation in patients with MSD and concluded that were mostly designed for appraising experimental studies,
the predictive strength is in general modest [42]. They con- but not for studies on psychometric properties.
cluded that this needs not be surprising as work participation This study has also some additional limitations. Firstly,
can be seen as a multidimensional construct according to we decided to use only the search terms in English, Dutch
the ICF [43]. Personal and environmental factors influence and French. Expanding this with other languages could have
whether or not the patients returns to work; the results of an given a broader view on the FCE methods. Secondly, we
FCE can be integrated in the ICF model (domain of activi- only used Web of science, Trip Database, Journal Storage,
ties). The definition of an FCE also implies that the capacity Pubmed, Embase, PEDro and OTSeeker as search engines
to execute activities may be determined by a range of factors and did not include Scopus and Psychinfo. Our methodol-
or determinants [8, 44]. Using the ICF the complex nature ogy was primarily based on a previous systematic review by
of work participation can be captured. A combination of Gouttebarge et al. [6]. They also did not include ‘Scopus’
performance and non-performance based measures assessing and ‘Psychinfo’. In their study, no articles from Psychinfo
different constructs of work participation can also be used to were included. Therefore, we decided not to include this
improve the ability to predict work participation. Pas et al. search engine. We are however not sure whether additional
found that both RTW experts as clients found FCE useful information could have been added to this review. Thirdly,
for the advice on RTW and that clients felt they were taking many FCE methods have been used for a long time in
seriously by performing an FCE [45]. rehabilitation clinics and were evaluated internally. These

13
J Occup Rehabil

evaluations are likely to exist only as white papers, to which we unfortunately did not have access. In future research, a scoping review that also includes other literature, for instance qualitative research on the use of FCE, could further increase our knowledge of the length, costs and processing time of FCE.

As an inclusion criterion, we focused solely on RCTs and CCTs. This approach was chosen specifically because we aimed to include only studies with a high level of evidence. It has the advantage that the search strategy was clearly delineated. However, a less strict method that also included studies with less power could have broadened the scope of this review.

This systematic review has provided a more extensive and updated representation of the psychometric qualities of several FCE methods. More ground has been covered on the better-known FCE methods, while new methods with different approaches are on the rise and are also gaining scientific support. We also highlight that quality evaluations of FCE methods were sometimes performed by only one research group, which creates a potential risk of bias. Newer approaches, such as the short-form FCE, need further examination of several psychometric properties. The psychometrics of most of the well-known methods have been thoroughly researched, but some of this research indicates weaknesses in their reliability and validity. Future research should address how these weaknesses can be overcome, while also taking into account the practicality and utility of the FCE.

Acknowledgements  The research team would like to thank Professor Ev Innes, Professor Haije Wind, Professor Michiel Reneman, Mr Dirk Vandamme and Ms Linda Gabriël for their feedback.

Compliance with ethical standards

Conflict of interest  Stijn De Baets, Patrick Calders, Noortje Schalley, Katrien Vermeulen, Sofie Vertriest, Lien Van Peteghem, Marieke Coussens, Fransiska Malfait, Guy Vanderstraeten, Geert Van Hove, and Dominique Van de Velde declare that they have no conflict of interest.

References

1. OECD. Sickness, disability and work: breaking the barriers. Paris: OECD Publishing.
2. Andrén D. Work, sickness, earnings, and early exits from the labor market. An empirical analysis using Swedish longitudinal data. Göteborg: Göteborg University; 2001.
3. Hakim C. The social consequences of high unemployment. J Soc Policy. 1982;11(4):433–467.
4. Dooley D, Fielding J, Levi L. Health and unemployment. Annu Rev Public Health. 1996;17(1):449–465.
5. Takala J, Hamalainen P, Saarela KL, Yun LY, Manickam K, Jin TW, et al. Global estimates of the burden of injury and illness at work in 2012. J Occup Environ Hyg. 2014;11(5):326–337.
6. Gouttebarge V, Wind H, Kuijer PP, Frings-Dresen MH. Reliability and validity of functional capacity evaluation methods: a systematic review with reference to Blankenship system, Ergos work simulator, Ergo-Kit and Isernhagen work system. Int Arch Occup Environ Health. 2004;77(8):527–537.
7. Innes E. Reliability and validity of functional capacity evaluations: an update. Int J Disabil Manag. 2012;1(1):135–148.
8. Soer R, van der Schans CP, Groothoff JW, Geertzen JH, Reneman MF. Towards consensus in operational definitions in functional capacity evaluation: a Delphi survey. J Occup Rehabil. 2008;18(4):389–400.
9. Gross DP, Battié MC. Functional capacity evaluation performance does not predict sustained return to work in claimants with chronic back pain. J Occup Rehabil. 2005;15(3):285–294.
10. Haglund L, Karlsson G, Kielhofner G, Lai JS. Validity of the Swedish version of the worker role interview. Scand J Occup Ther. 1997;4(1–4):23–29.
11. Chen J. Functional capacity evaluation & disability. Iowa Orthop J. 2007;27(1):121–127.
12. King PM, Tuckwell N, Barrett TE. A critical review of functional capacity evaluations. Phys Ther. 1998;78(8):852–866.
13. Cheng AS, Cheng SW. The predictive validity of job-specific functional capacity evaluation on the employment status of patients with nonspecific low back pain. J Occup Environ Med. 2010;52(7):719–724.
14. Cheng AS, Cheng SW. Use of job-specific functional capacity evaluation to predict the return to work of patients with a distal radius fracture. Am J Occup Ther. 2011;65(4):445–452.
15. Gouttebarge V, Wind H, Kuijer PP, Sluiter JK, Frings-Dresen MH. Intra- and interrater reliability of the Ergo-Kit functional capacity evaluation method in adults without musculoskeletal complaints. Arch Phys Med Rehabil. 2005;86(12):2354–2360.
16. Gouttebarge V, Wind H, Kuijer PP, Sluiter JK, Frings-Dresen MH. Reliability and agreement of 5 Ergo-Kit functional capacity evaluation lifting tests in subjects with low back pain. Arch Phys Med Rehabil. 2006;87(10):1365–1370.
17. Gouttebarge V, Wind H, Kuijer PP, Sluiter JK, Frings-Dresen MH. Construct validity of functional capacity evaluation lifting tests in construction workers on sick leave as a result of musculoskeletal disorders. Arch Phys Med Rehabil. 2009;90(2):302–308.
18. Rustenburg G, Kuijer PP, Frings-Dresen MH. The concurrent validity of the ERGOS Work Simulator and the Ergo-Kit with respect to maximum lifting capacity. J Occup Rehabil. 2004;14(2):107–118.
19. Brubaker PN, Fearon FJ, Smith SM, McKibben RJ, Alday J, Andrews SS, et al. Sensitivity and specificity of the Blankenship FCE system's indicators of submaximal effort. J Orthop Sports Phys Ther. 2007;37(4):161–168.
20. Reneman MF, Brouwer S, Meinema A, Dijkstra PU, Geertzen JH, Groothoff JW. Test-retest reliability of the Isernhagen work systems functional capacity evaluation in healthy adults. J Occup Rehabil. 2004;14(4):295–305.
21. Reneman MF, Fokkens AS, Dijkstra PU, Geertzen JH, Groothoff JW. Testing lifting capacity: validity of determining effort level by means of observation. Spine. 2005;30(2):E40–E46.
22. Trippolini MA, Dijkstra PU, Cote P, Scholz-Odermatt SM, Geertzen JH, Reneman MF. Can functional capacity tests predict future work capacity in patients with whiplash-associated disorders? Arch Phys Med Rehabil. 2014;95(12):2357–2366.
23. Nunnally JC. Psychometric theory. New York: McGraw-Hill; 1978.
24. Brassard B, Durand MJ, Loisel P, Lemaire J. Étude de fidélité test-retest de L'Évaluation des Capacités Physiques reliées au Travail. Can J Occup Ther. 2006;73(4):206–214.
25. Durand MJ, Loisel P, Poitras S, Mercier R, Stock SR, Lemaire J. The interrater reliability of a functional capacity evaluation: the physical work performance evaluation. J Occup Rehabil. 2004;14(2):119–129.
26. Lechner DE, Page JJ, Sheffield G. Predictive validity of a functional capacity evaluation: the physical work performance evaluation. Work. 2008;31(1):21–25.
27. Durand MJ, Brassard B, Hong QN, Lemaire J, Loisel P. Responsiveness of the physical work performance evaluation, a functional capacity evaluation, in patients with low back pain. J Occup Rehabil. 2008;18(1):58–67.
28. Branton EN, Arnold KM, Appelt SR, Hodges MM, Battié MC, Gross DP. A short-form functional capacity evaluation predicts time to recovery but not sustained return-to-work. J Occup Rehabil. 2010;20(3):387–393.
29. Meterko M, Marfeo EE, McDonough CM, Jette AM, Ni P, Bogusz K, et al. Work disability functional assessment battery: feasibility and psychometric properties. Arch Phys Med Rehabil. 2015;96(6):1028–1035.
30. James C, Mackenzie L, Capra M. Test-retest reliability of the manual handling component of the WorkHab functional capacity evaluation in healthy adults. Disabil Rehabil. 2010;32(22):1863–1869.
31. James C, Mackenzie L, Capra M. Inter- and intra-rater reliability of the manual handling component of the WorkHab functional capacity evaluation. Disabil Rehabil. 2011;33(19–20):1797–1804.
32. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
33. Gross DP, Battié MC. Does functional capacity evaluation predict recovery in workers' compensation claimants with upper extremity disorders? Occup Environ Med. 2006;63(6):404–410.
34. Gibson LA, Dang M, Strong J, Khan A. Test-retest reliability of the GAPP functional capacity evaluation in healthy adults. Can J Occup Ther. 2010;77(1):38–47.
35. Innes E, Straker L. Validity of work-related assessments. Work. 1999;13(2):125–152.
36. Innes E, Straker L. Reliability of work-related assessments. Work. 1999;13(2):107–124.
37. Wind H, Gouttebarge V, Kuijer PFM, Frings-Dresen MHW. Assessment of functional capacity of the musculoskeletal system in the context of work, daily living, and sport: a systematic review. J Occup Rehabil. 2005;15(2):253–272.
38. Hart DL, Isernhagen SJ, Matheson LN. Guidelines for functional capacity evaluation of people with medical conditions. J Orthop Sports Phys Ther. 1993;18(6):682–686.
39. Gross DP, Battié MC, Asante AK. Evaluation of a short-form functional capacity evaluation: less may be best. J Occup Rehabil. 2007;17(3):422–435.
40. Wind H, Gouttebarge V, Kuijer PPFM, Sluiter JK, Frings-Dresen MH. Het nut van Functionele Capaciteit Evaluatie: de visie van experts. TBV – Tijdschrift voor Bedrijfs- en Verzekeringsgeneeskunde. 2005;13(10):359–366.
41. Mahmud N, Schonstein E, Schaafsma F, Lehtola MM, Fassier JB, Verbeek JH, Reneman MF. Functional capacity evaluations for the prevention of occupational re-injuries in injured workers. Cochrane Database Syst Rev. 2010;(7):CD007290. doi:10.1002/14651858.CD007290.pub2.
42. Kuijer PP, Gouttebarge V, Brouwer S, Reneman MF, Frings-Dresen MH. Are performance-based measures predictive of work participation in patients with musculoskeletal disorders? A systematic review. Int Arch Occup Environ Health. 2012;85(2):109–123.
43. WHO. International classification of functioning, disability, and health: ICF. Version 1.0. Geneva: World Health Organization; 2001.
44. Escorpizo R, Finger ME, Reneman MF. Integration and application of the International Classification of Functioning, Disability and Health (ICF) in return to work. In: Schultz IZ, Gatchel RJ, editors. Handbook of return to work. Boston: Springer; 2016. p. 99–118.
45. Pas LW, Kuijer PPFM, Wind H, Sluiter JK, Groothoff JW, Brouwer S, et al. Clients' and RTW experts' view on the utility of FCE for the assessment of physical work ability, prognosis for work participation and advice on return to work. Int Arch Occup Environ Health. 2014;87(3):331–338.
