
International Journal of Industrial Ergonomics 71 (2019) 111–116


Intra-rater and inter-rater reliability of the rapid entire body assessment (REBA) tool
Adam H. Schwartz∗, Thomas J. Albin, Susan G. Gerberich
University of Minnesota, United States

ARTICLE INFO

Keywords:
Intra-rater reliability
Inter-rater reliability
REBA
Assessment

ABSTRACT

Background: Ergonomics researchers and practitioners use many techniques to assess risk. The Rapid Entire Body Assessment (REBA) is a common tool used to facilitate the measurement and evaluation of the risks associated with working postures as a part of ergonomic workload. However, little work has been reported regarding the reliability of REBA reporting.
Objective: This study assesses the reliability of this commonly used tool for research and practice.
Methods: The study was conducted as part of the larger Safe Workload Ergonomic Exposure Project (SWEEP), a University of Minnesota research initiative for custodians. For this effort, a secondary data analysis was conducted on data collected during a study of custodians' exposures to risks of musculoskeletal disorders. Eight observers used the REBA tool to sequentially evaluate tasks performed two times in succession by the same individual.
Results: This study reports high intra-rater reliability (ICC = 0.925) for REBA raw scores and moderate inter-rater reliability (IRR) (Fleiss kappa = 0.54) for a categorical scoring of REBA.
Conclusion: A moderate amount of IRR was found, and a standardized training and calibration protocol is proposed as a potential means to improve intra- and inter-rater reliability.

1. Introduction

Ergonomists use many tools to measure risk of injury. The Rapid Entire Body Assessment (REBA) is a practitioner's field tool (McAtamney and Hignett, 2000; Coyle, 2005; Motamedzade et al., 2011) that is designed to facilitate the measurement and evaluation of risks associated with working postures as a part of ergonomic workload. It is "sensitive to musculoskeletal risks in a variety of tasks, divides the body into segments to be coded individually, regarding movement planes, and provides a scoring system for muscle activity caused by static, dynamic, rapidly changing or unstable postures" (McAtamney and Hignett, 2000). The REBA tool is also commonly utilized in research settings (Janowitz et al., 2006; Jones and Kumar, 2010; Kee and Karwowski, 2007; Pascual and Naqvi, 2008; Nawi et al., 2013; Lee et al., 2008). It is a popular assessment tool, well represented in the technical literature (Dempsey et al., 2005); for example, a general Google Scholar search for "REBA Ergonomics" found 2700 results.

David (2005) concluded that assessments that use observations (such as the REBA) "provide the levels of costs, capacity, versatility, generality and exactness best matched to the needs of occupational safety and health practitioners." These practitioners have limited financial and temporal resources and as such need to perform triage for intervention.

However, little work has been reported regarding the validity or reliability of REBA, which is a concern given the lack of knowledge about the consistency of this tool. The Safe Workload Ergonomic Exposure Project (described below) used REBA as an assessment tool during a study of custodians' exposure to risk factors associated with musculoskeletal disorders. There were multiple instances of two successive observations of a task by the same observer; in addition, multiple observers evaluated multiple tasks performed by a single individual. These data afforded an assessment of the intra-rater and inter-rater reliability of REBA.

1.1. Safe Workload Ergonomic Exposure Project (SWEEP) study

The SWEEP study is a mixed prospective and retrospective cohort study of unionized janitors in the Twin Cities, Minnesota. It was initiated by the University of Minnesota, Division of Environmental Health Sciences, and examined mental workload, physical workload, sleep, stress, job satisfaction, fitness, and occupational injury burden on janitors. Reflecting the population, the study was conducted to accommodate English, Spanish, and Somali languages.


∗ Corresponding author.
E-mail address: schw1562@umn.edu (A.H. Schwartz).

https://doi.org/10.1016/j.ergon.2019.02.010
Received 27 March 2018; Received in revised form 4 January 2019; Accepted 19 February 2019
Available online 18 March 2019
0169-8141/ © 2019 Elsevier B.V. All rights reserved.

1.2. Current data regarding reliability and validity of REBA

1.2.1. Studies of inter-method reliability

Several studies (Jones and Kumar, 2010; Kee and Karwowski, 2007; Ansari and Sheikh, 2014; Manavakun, 2017; Gentzler and Stader, 2010) have compared the results of REBA analyses with results from application of other risk assessment tools. These studies might be described as reports of inter-method reliability, where inter-method reliability can be thought of as an instance of parallel-forms reliability; that is, two different assessment tools intended to measure the same concept, such as the risk of musculoskeletal injury, should give consistent results. This could also be considered convergent validity, which is a form of construct validity.

These inter-method reliability studies of REBA have compared its risk, or action, categories to those of other tools such as the Rapid Upper Limb Assessment (RULA) and the Ovako Working Posture Analysis System (OWAS).

Ansari and Sheikh (2014) assessed risk levels in the same sample of workers in India with REBA and RULA. RULA classified 40% of the workers as at a high-risk level, compared to REBA, which placed 53% of the workers evaluated at the same high-risk level.

Manavakun (2017) observed tree cutting in Thailand, using both REBA and OWAS, and found that "postural load [as measured] by REBA was generally higher than by OWAS … 22.6% of 248 postures were classified at the action category 3 or 4 by OWAS, about 72.6% of the postures were classified into action level 3 or 4 by REBA … OWAS underestimated posture-related risk compared to REBA."

Kee and Karwowski (2007) used OWAS, REBA and RULA to evaluate tasks in various industries (iron, steel, electronics, chemical, medical, and automotive) and found that the "inter-method reliability for postural load category between OWAS and RULA was … 29.2%, and the reliability between RULA and REBA was 48.2%." They inferred that, "compared to RULA, OWAS and REBA generally underestimated postural loads for the analyzed postures" (Kee and Karwowski, 2007).

Jones and Kumar (2010) evaluated sawmill workers using five different assessment tools: RULA, REBA, the American Conference of Governmental Industrial Hygienists Threshold Limit Value for Mono-Task Hand Work (ACGIH-TLV), the Strain Index (SI), and the Concise Exposure Index (OCRA). They scored the different tools in two ways: first for agreement on risk category (at risk vs. not at risk), and second for perfect agreement between risk levels (e.g., both tool A and tool B rate the job as low, high or very high risk). Excepting the ACGIH-TLV, there was a high level of agreement between the evaluation tools for at-risk versus not-at-risk classifications and a more modest level of perfect agreement of classifications.

Gentzler and Stader (2010) found that for firefighters lifting hoses "above the shoulder for drainage, the NIOSH lifting equation showed a danger of lifting the hose from the ground to chest height and especially from chest height to above the shoulders, and the REBA determined that there was a very high level of risk for injury" for the same action.

While these studies suggest that REBA is reliable in the sense that it measures similar things, as do other tools intended to measure musculoskeletal risk, they do not describe the inter- or intra-rater reliability of REBA. Knowledge of intra- and inter-rater reliability is a necessary prerequisite for establishing the validity of the tool.

1.2.2. Intra- and inter-rater reliability of REBA

At present, there are limited data available regarding the intra- and inter-rater reliability of the REBA tool. Hignett and McAtamney (McAtamney and Hignett, 2000) reported that inter-observer reliability of REBA scoring ranged between 62 and 85 percent, but it is not clear if the agreement referred to raw or categorical scores. Lamarão et al. (2014) also reported modest percent agreement for two observers in a Portuguese-language version of REBA. Cohen (1960) has criticized the use of percent agreement on the basis that it does not account for chance agreement between raters.

Janowitz et al. (2006) reported a similar level of inter-rater reliability for a modified version of REBA when classifying risks to the back: 0.54 for the upper back and 0.66 for the lower back. However, the modification of the REBA tool reported by Janowitz et al. (2006), and the restricted scope limiting scoring to the back, limit the ability to generalize these results to the standard form of the REBA tool. Similarly, although Lamarão et al. (2014) also report modest agreement for intra- and inter-rater reliability of the REBA tool, the version they studied was a Portuguese-language version, somewhat limiting the ability to generalize to an English-language version of REBA.

1.2.3. Raw scoring versus categorical scoring of REBA

REBA assigns raw scores to risk analyses of tasks; those scores are continuous and range from a minimum of 0–11 and upward. In turn, those raw scores are interpreted by converting them to one of five action, or risk, categories: Negligible Risk, Low Risk, Medium Risk, High Risk, and Very High Risk. Thus, the action expected of the evaluator depends on the action category, rather than on a specific raw score. As an example, raw scores of 8, 9 and 10 fall into the High-Risk action category, and the recommended action is "Investigate and Implement Change".
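To make the raw-to-categorical conversion concrete, the following minimal sketch maps a REBA raw score onto the five action categories. The 8–10 → High Risk band is stated above; the remaining cut points follow the published REBA action levels (McAtamney and Hignett, 2000). The function name and structure are our own illustration, not part of the SWEEP protocol.

```python
# Illustrative sketch: mapping a REBA raw score to its action (risk) category.
# The 8-10 -> "High Risk" band is given in the text; the other cut points
# follow the published REBA action levels (McAtamney and Hignett, 2000).

def reba_category(raw_score: int) -> str:
    """Return the REBA action category for a raw score."""
    if raw_score <= 1:
        return "Negligible Risk"   # action level 0: none necessary
    if raw_score <= 3:
        return "Low Risk"          # action level 1: change may be needed
    if raw_score <= 7:
        return "Medium Risk"       # action level 2: investigate further, change soon
    if raw_score <= 10:
        return "High Risk"         # action level 3: investigate and implement change
    return "Very High Risk"        # action level 4: implement change

assert reba_category(9) == "High Risk"  # matches the example in the text
```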
1.2.4. Predictive validity of REBA

The purpose of REBA and other similar tools is to enable practitioners to identify jobs that are at risk for development of musculoskeletal disorders; that is, they must first effectively and efficiently discriminate at-risk jobs from jobs not at risk, before attempting to discriminate levels of risk. An ideal assessment tool would have high sensitivity and high specificity – that is, it would minimize false negatives and false positives, as illustrated in the sketch below.

However, before the predictive validity of REBA risk scores can be examined, it is necessary to establish that the measurements made by different observers are consistent and reliable. Reliability of a measure is a prerequisite to establishing validity (Moskal and Leydens, 2000; Cook and Beckman, 2006).

Two types of reliability are of interest: intra-rater and inter-rater reliability. Without some knowledge of the consistency of ratings within individual raters and between different raters, it is not possible to assess the predictive validity of REBA.
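As a minimal sketch of the sensitivity/specificity point, the computation below uses the standard definitions; the counts are invented purely for illustration.

```python
# Sketch of sensitivity and specificity for a screening tool, using the
# standard definitions. The counts below are invented for illustration.
true_positives = 40   # at-risk jobs correctly flagged by the tool
false_negatives = 10  # at-risk jobs the tool missed
true_negatives = 35   # not-at-risk jobs correctly cleared
false_positives = 15  # not-at-risk jobs incorrectly flagged

sensitivity = true_positives / (true_positives + false_negatives)  # 0.80
specificity = true_negatives / (true_negatives + false_positives)  # 0.70
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```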
2. Methods

2.1. Intra-rater reliability

In collecting the data for a field study of the musculoskeletal risks associated with custodial tasks (described below), several observers used the REBA tool to sequentially evaluate a task performed two times in succession by the same individual. Comparing the successive ratings provided an opportunity to assess the intra-rater reliability of REBA risk assessments (Table 1). The participants in this project were all experienced custodians. There were 30 people, 15 men and 15 women.

Table 1
Ergonomic impact (REBA score) by task.

Task             Mean    Standard Deviation   Median   Minimum   Maximum
Toilet cleaning  10.40   2.11                 10       7         13
Dusting           8.92   2.41                  9       4         13
Large trash      10.49   1.09                 11       4         13
Small trash      10.68   1.44                 11       4         13
Mopping           9.13   1.86                 10       2         12
Mirror cleaning   9.27   1.68                  9       5         13
Sink cleaning     8.96   2.36                  9       4         13
Vacuuming         9.65   2.08                 10       1         13


Balanced (10 in each age group) tertile cut points of the sample were made, with the groups being ages 21–39, 40–56, and 60–71. The average height for women was 1.59 m and their average weight was 73 kg. On average, men were 1.68 m tall and weighed 80.5 kg.

The consistency of the repeated measurements (intra-rater reliability) was assessed using an Intra-Class Correlation (ICC), as suggested in previous work on intra-rater reliability (Bennell et al., 1998; Berg et al., 1995). ICCs are "measures of the relative similarity of quantities which share the same observational units …", with areas of application such as "reliability studies (e.g., products from the same machine, measurements of characteristics for the same person)" and "… persons contacted by the same interviewer" (Koch, 2004).

Lamarão et al. (2014) note that there are two approaches to using tools such as REBA to evaluate injury risk: observation in situ and observation of video. In this instance, we used live, in situ evaluations, as this is common practice amongst practitioners. While there is some possibility of variation in the manner in which a task is performed between successive trials, each instance was performed by the same, experienced individual in exactly the same environment.

2.2. Inter-rater reliability

During the study of the potential musculoskeletal risks associated with custodial tasks, several observers concurrently evaluated tasks performed by the same individual. These data provided an opportunity to assess the inter-rater reliability of REBA risk assessments.

Gwet (2014) describes inter-rater reliability as two or more raters "classifying subjects or objects into predefined classes or categories", such that "the extent to which these two categorizations coincide represents what is often referred to as inter-rater reliability." As defined in this paper, inter-rater reliability refers to the consistency of measurements of postures made by multiple observers of the same work tasks performed by the same individual. The consistency of these measurements is described using the Technical Error of Measurement (TEM). The TEM is used to calculate a reliability coefficient (R).
2.3. Technical Error of Measurement

Lewis (1999) noted that when two individuals measure the same thing, the values obtained will not always be the same, thus producing the Technical Error of Measurement (TEM). The TEM is commonly used to assess intra-rater and inter-rater reliability of measurements made by anthropometrists (Lewis, 1999; Ulijaszek and Kerr, 1999; WHO Multicentre Growth Reference Study Group, de Onis, 2006; Geeta et al., 2009). The TEM may be thought of as the standard deviation of replicate measures (Marks et al., 1985).

This paper utilizes TEM to assess the inter-rater reliability of measurements of postures made concurrently by multiple individuals while observing the same individual performing a custodial task.

A formula to calculate the inter-rater TEM for multiple observers is given by de Onis as

$$\mathrm{TEM}_{inter} = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{K_i - 1} \left[ \sum_{j=1}^{K_i} Y_{ij}^{2} - \frac{\left( \sum_{j=1}^{K_i} Y_{ij} \right)^{2}}{K_i} \right] \right)^{1/2}$$

where $Y_{ij}$ is one of the measures made by observer $j$ for task $i$, $K_i$ is the number of observers that measured task $i$, and $N$ is the number of tasks observed (WHO Multicentre Growth Reference Study Group, de Onis, 2006). It is important to note that the formula can accommodate differing numbers of observers for each of the tasks observed.

de Onis (WHO Multicentre Growth Reference Study Group, de Onis, 2006) describes the Coefficient of Reliability, R, as an estimate of "the proportion … of the total measurement variance that is not due to measurement error. A reliability coefficient of 0.8 means that 80% of the total variability is true variation, while the remaining proportion (20%) is attributable to measurement error …". de Onis gives the formula for calculating R as

$$R = 1 - \frac{(\mathrm{TEM}_{inter})^{2}}{SD^{2}}$$

where $\mathrm{TEM}_{inter}$ is the inter-rater TEM and $SD^{2}$ is an estimate of the variance of all measurements.
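As a minimal illustration of the two formulas above, the sketch below computes $\mathrm{TEM}_{inter}$ and R from per-task lists of observer scores. The function names and the toy data are our own, not part of the study's analysis code.

```python
# Sketch of the inter-rater TEM and reliability coefficient R defined above,
# assuming scores are grouped as one list of observer ratings per task.
# Function names and example data are illustrative only.
import math

def tem_inter(tasks: list[list[float]]) -> float:
    """Inter-rater Technical Error of Measurement for N tasks,
    each rated by a (possibly different) number of observers K_i."""
    n = len(tasks)
    total = 0.0
    for scores in tasks:
        k = len(scores)                      # K_i observers for this task
        sum_sq = sum(y * y for y in scores)  # sum of Y_ij^2
        sq_sum = sum(scores) ** 2            # (sum of Y_ij)^2
        total += (sum_sq - sq_sum / k) / (k - 1)
    return math.sqrt(total / n)

def reliability_coefficient(tasks: list[list[float]]) -> float:
    """R = 1 - TEM_inter^2 / SD^2, with SD^2 the variance of all scores."""
    all_scores = [y for scores in tasks for y in scores]
    mean = sum(all_scores) / len(all_scores)
    sd2 = sum((y - mean) ** 2 for y in all_scores) / (len(all_scores) - 1)
    return 1 - tem_inter(tasks) ** 2 / sd2

# Toy example: three tasks rated by differing numbers of observers.
ratings = [[10, 11, 9], [8, 7, 8, 9], [12, 11]]
print(tem_inter(ratings), reliability_coefficient(ratings))
```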
surface with a cloth, scrub the bowl with a long-handled brush, and
where Yij is one of the measures made by observer j for task i, Ki is the refill the toilet paper dispenser.
number of observers that measured task i, and N is the number of tasks Mirrors: Janitors were asked to clean a mirror that was approxi-
observed (WHO Multicentre Growth Reference Study Groupde Onis, mately 0.61 m × 0.91 m. They were given a cloth and a spray bottle
2006). It is important to note that the formula can accommodate dif- that weighed approximately 2.3 kg.
fering numbers of observers for each of the tasks observed. Sinks: Janitors were asked to clean a white porcelain sink that was
Onis (WHO Multicentre Growth Reference Study Groupde Onis, 0.76 m wide, 0.61 m deep, and 0.35 m tall. They were given a cloth and
2006) describes the Coefficient of Reliability, R, as an estimate of “the a spray bottle that weighed 2.2 kg.
proportion … of the total measurement variance that is not due to Each observer recorded his or her postural assessments on a mod-
measurement error. A reliability coefficient of 0.8 means that 80% of ified REBA scoresheet. The scoresheets were modified to avoid details
the total variability is true variation, while the remaining proportion unnecessary to the data gathering stage, namely the table calculations


Fig. 1. Modified version of REBA scoresheet. With permission from Hignett, S., McAtamney, L. (2000) Rapid Entire Body Assessment (REBA). Applied Ergonomics 31,
201-205.

2.7. Inter-rater reliability for raw scores

The Technical Error of Measurement and reliability coefficients were calculated for the REBA raw scores for each of the eight jobs using the formulas suggested by de Onis et al. (WHO Multicentre Growth Reference Study Group, de Onis, 2006).

2.8. Categorical scoring

Each raw score was converted to a risk assessment category, based on the raw score ranges described for REBA: Negligible Risk, Low Risk, Medium Risk, High Risk, and Very High Risk. The risk categories were then assigned a number from 1 to 5, with 1 corresponding to Negligible Risk and 5 to Very High Risk. A Fleiss kappa was calculated for six observers who concurrently assessed four tasks using categorical scores.
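For readers unfamiliar with the statistic, the sketch below computes a Fleiss kappa for a small rater-by-task matrix of the 1–5 category codes described above. The `statsmodels` helpers are a standard implementation of the statistic; the example data are invented for illustration.

```python
# Sketch: Fleiss kappa for multiple raters assigning categorical risk codes
# (1 = Negligible ... 5 = Very High). Example data are invented.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows are tasks (subjects); columns are the six observers' category codes.
codes = np.array([
    [4, 4, 5, 4, 3, 4],
    [5, 5, 4, 5, 5, 4],
    [3, 4, 4, 3, 4, 4],
    [4, 5, 4, 4, 5, 5],
])

# aggregate_raters converts rater codes into a subjects-by-category count table.
table, _ = aggregate_raters(codes)
print(fleiss_kappa(table, method='fleiss'))
```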
3. Results

3.1. Intra-rater reliability

The ICC value for 189 pairs of observations made by nine observers of eight tasks was determined to be 0.925. These were raw scores, not categorical.

The high ICC value indicates that there was a high degree of consistency in both the manner in which the tasks were performed by the custodians on each of the two successive trials and in the REBA rating assigned to each trial by the observers.

3.2. Inter-rater reliability

The reliability coefficient, R, of the continuous, raw data was calculated to be 0.41. That is, about 59 percent of the total variation in the raw scores was due to inter-rater variation.

However, REBA evaluates risk categorically. Consequently, the raw scores were converted to categorical ratings based on the guidelines described on the REBA scoresheet, and the inter-rater reliability of the categorical measurements was assessed.

The Fleiss kappa for the categorical scoring was 0.54. According to Landis and Koch (Gore et al., 1996), this is considered moderate agreement.

4. Discussion

This study suggests that there is strong intra-rater reliability among individual observers when they immediately observe the same task twice. However, the results for inter-rater reliability are more complex. The moderate agreement among multiple raters regarding classification of risk categories suggests that categorical classification, rather than raw scores, should be used to classify risk. A similar methodology has been used by Motamedzade et al. (2011) and Jones et al. (Jones and Kumar, 2010).

The high level of intra-rater reliability suggests that observations made by a single observer are generally comparable. That is, the assessments made by a single practitioner should be internally comparable, with a higher score indicating higher risk. However, the moderate level of agreement for inter-rater reliability suggests that comparisons of risk ratings between multiple observers should be interpreted with caution.


A moderate level of inter-rater reliability is problematic for practitioners when prioritizing tasks for intervention. It is not uncommon for two or more observers to observe different tasks to evaluate musculoskeletal injury risk.

If the various tasks are prioritized with regard to allocation of a common pool of resources based on REBA scores, an efficient allocation of those limited resources depends on the comparability of the risk ratings among the different observers. If the inter-rater reliability is suspect, then it is much more difficult to confidently prioritize tasks as at risk for musculoskeletal disorders. Consequently, it is critical to be confident of the equivalence of ratings assigned by multiple evaluators.

A second concern with moderate inter-rater reliability relates to the study of the validity of the REBA tool as a means of predicting risk of MSDs. Validity studies necessarily involve pooling large numbers of observations performed by large numbers of evaluators. Confidence in the equivalence of those risk ratings among the multiple evaluators will be essential to demonstrate the predictive validity of REBA.

An intuitive suggestion is that training in the use of REBA might affect the accuracy and consistency of observations. As one example of how a training and accreditation system for REBA users might be structured, Gore et al. (1996) described a model developed by the International Society for the Advancement of Kinanthropometry (ISAK) for the accreditation of anthropometrists. The goal of the system was to establish an objective means of demonstrating accuracy and reliability in the measurement of anthropometric data. The Technical Error of Measurement (TEM) is used as the means of assessing reliability of measurement.

In the ISAK scheme, individuals seeking accreditation must make a series of specified anthropometric measurements. The individual is accredited as an observer only if the intra-rater reliability and the inter-rater TEMs are within specified values. A similar system might be developed for REBA users, perhaps using videos of different task scenarios for which the true posture scores, forces, etc. were known; acceptable levels for TEM of assessments using these standardized tools could then be developed. A slightly different method of using such standardized videos might be to calibrate observers' scores to the standard. Bland and Altman (1986) have suggested that bias in the difference of two scores can be "estimated by the mean difference d̄ and the standard deviation of the differences (s). If there is consistent bias, we can adjust for it by subtracting d̄ …".
The predictive validity of the REBA tool has not been established. Neither has the utility of multiple risk categories; for example, would four categories, or even two, be sufficient?

Finally, a REBA observation is a single snapshot of task performance. It is unclear how representative such a brief assessment can be, especially of jobs that are not highly routinized. For example, Buchholz et al. (1996) have suggested that time-sampling methodologies better characterize risk exposures.

4.1. Limitations

A limitation of this study is that the novice observers received only a brief introduction (less than 1 h) to the REBA assessment tool prior to using it. However, this is realistic for ergonomic assistants in real-world circumstances. That a moderate IRR was found in this project speaks to the utility of this tool. A second limitation is that the intra-rater reliability was assessed using observations that were contemporaneous, often separated by only a few minutes at most. It remains an open question whether intra-rater reliability would be consistent over longer time periods.

5. Conclusion

REBA appears to be a promising tool for field use, but further work in establishing training protocols is necessary. Studies of the intra-rater and inter-rater reliability of risk ratings made with REBA by well-trained and calibrated observers are necessary to facilitate future reliability and validity studies.

A second conclusion is that care must be taken when comparing risk category classifications made using REBA by different observers. Unless there is reference to some common standard, it may not be possible to directly compare such classifications. We would recommend that all REBA users be trained and periodically re-calibrated using video of standardized tasks.

Funding

This project was funded by the Midwest Center for Occupational Health and Safety (MCOHS), Education and Research Center, Pilot Projects Research Training Program supported by the National Institute for Occupational Safety and Health (NIOSH), Centers for Disease Control and Prevention (OH008434). The contents of this effort are solely the responsibility of the authors and do not necessarily represent the official view of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention, or other associated entities.

References

Ansari, N.A., Sheikh, M.J., 2014. Evaluation of work posture by RULA and REBA: a case study. IOSR J. Mech. Civ. Eng. 11 (4), 18–23.
Bennell, K., Talbot, R., Wajswelner, H., Techovanich, W., Kelly, D., Hall, A.J., 1998. Intra-rater and inter-rater reliability of a weight-bearing lunge measure of ankle dorsiflexion. Aust. J. Physiother. 44 (3), 175–180.
Berg, K., Wood-Dauphinee, S., Williams, J.I., 1995. The Balance Scale: reliability assessment with elderly residents and patients with an acute stroke. Scand. J. Rehabil. Med. 27 (1), 27–36.
Bland, J.M., Altman, D., 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 327 (8476), 307–310.
Buchholz, B., Paquet, V., Punnett, L., Lee, D., Moir, S., 1996. PATH: a work sampling-based approach to ergonomic job analysis for construction and other non-repetitive work. Appl. Ergon. 27 (3), 177–187.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20 (1), 37–46.
Cook, D.A., Beckman, T.J., 2006. Current concepts in validity and reliability for psychometric instruments: theory and application. Am. J. Med. 119 (2), 166.e7–166.e16.
Coyle, A., 2005. Comparison of the Rapid Entire Body Assessment and the New Zealand Manual Handling 'Hazard Control Record', for assessment of manual handling hazards in the supermarket industry. Work 24 (2), 111–116.
David, G.C., 2005. Ergonomic methods for assessing exposure to risk factors for work-related musculoskeletal disorders. Occup. Med. 55 (3), 190–199.
Dempsey, P.G., McGorry, R.W., Maynard, W.S., 2005. A survey of tools and methods used by certified professional ergonomists. Appl. Ergon. 36 (4), 489–503.
Geeta, A., Jamaiyah, H., Safiza, M.N., Khor, G.L., Kee, C.C., Ahmad, A.Z., et al., 2009. Reliability, technical error of measurements and validity of instruments for nutritional status assessment of adults in Malaysia. Singap. Med. J. 50 (10), 1013.
Gentzler, M., Stader, S., 2010. Posture stress on firefighters and emergency medical technicians (EMTs) associated with repetitive reaching, bending, lifting, and pulling tasks. Work 37 (3), 227–239.
Gore, C., Norton, K., Olds, T., Whittingham, N., Birchall, K., Clough, M., et al., 1996. Accreditation in anthropometry: an Australian model. Anthropometrica 395–411.
Gwet, K.L., 2014. Handbook of Inter-rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. Advanced Analytics, LLC.
Janowitz, I.L., Gillen, M., Ryan, G., Rempel, D., Trupin, L., Swig, L., et al., 2006. Measuring the physical demands of work in hospital settings: design and implementation of an ergonomics assessment. Appl. Ergon. 37 (5), 641–658.
Jones, T., Kumar, S., 2010. Comparison of ergonomic risk assessment output in four sawmill jobs. Int. J. Occup. Saf. Ergon. 16 (1), 105–111.
Kee, D., Karwowski, W., 2007. A comparison of three observational techniques for assessing postural loads in industry. Int. J. Occup. Saf. Ergon. 13 (1), 3–14.
Koch, G.G., 2004. Intraclass correlation coefficient. In: Encyclopedia of Statistical Sciences, vol. 6.
Lamarão, A.M., Costa, L., Comper, M.L., Padula, R.S., 2014. Translation, cross-cultural adaptation to Brazilian-Portuguese and reliability analysis of the instrument Rapid Entire Body Assessment-REBA. Braz. J. Phys. Ther. 18 (3), 211–217.
Lee, S.S., Kim, Y.H., Choi, A.R., Mun, J.H., 2008. A study on ergonomics design of wheelbarrow for melon farm on protected horticulture. J. Biosys. Eng. 33 (3), 157–166.


Lewis, S.J., 1999. Quantifying Measurement Error. Oxbow Books (for the Osteoarchaeological Research Group).
Manavakun, N. A comparison of OWAS and REBA observational techniques for assessing postural loads in tree felling and processing. [cited 2017 Nov 28]. Available from: https://www.formec.org/images/proceedings/2014/a115.pdf.
Marks, G.C., Habicht, J.P., Mueller, W.H., 1985. Reliability, dependability and precision of anthropometric measurements: the second national health and nutrition examination survey, 1975-1980. Am. J. Epidemiol. 30, 57–87.
McAtamney, L., Hignett, S., 2000. REBA: Rapid Entire Body Assessment. Appl. Ergon. 31, 201–205.
Moskal, B.M., Leydens, J.A., 2000. Scoring rubric development: validity and reliability. Practical Assess. Res. Eval. 7 (10), 71–81.
Motamedzade, M., Ashuri, M.R., Golmohammadi, R., Mahjub, H., 2011. J. Res. Health Sci. (JRHS) [Internet] 11. Univ. of Medical Sciences.
Nawi, N.S.M., Deros, B.M., Nordin, N., 2013, December. Assessment of oil palm fresh fruit bunches harvesters working postures using REBA. In: Advanced Engineering Forum, vol. 10. Trans Tech Publications Ltd, pp. 122–127.
Pascual, S.A., Naqvi, S., 2008. An investigation of ergonomics analysis tools used in industry in the identification of work-related musculoskeletal disorders. Int. J. Occup. Saf. Ergon. 14 (2), 237–245.
Ulijaszek, S.J., Kerr, D.A., 1999. Anthropometric measurement error and the assessment of nutritional status. Br. J. Nutr. 82 (3), 165–177.
WHO Multicentre Growth Reference Study Group, de Onis, M., 2006. Reliability of anthropometric measurements in the WHO Multicentre Growth Reference Study. Acta Paediatr. 95, 38–46.

