
International Journal of Industrial Ergonomics 72 (2019) 71–79


Reliability of a new risk assessment method for visual ergonomics


Camilla Zetterberg a,∗, Marina Heiden a, Per Lindberg a, Per Nylén b, Hillevi Hemphälä c

a Centre for Musculoskeletal Research, Department of Occupational Health Science and Psychology, University of Gävle, SE-801 76 Gävle, Sweden
b Swedish Work Environment Authority, SE-112 79 Stockholm, Sweden
c Division of Ergonomics and Aerosol Technology, Design Sciences, Lund University, SE-221 00 Lund, Sweden

ABSTRACT

Keywords: Eyestrain; Musculoskeletal; Lighting; Illuminance; Glare; Flicker

Introduction: The Visual Ergonomics Risk Assessment Method (VERAM) is a newly developed and validated method to assess visual ergonomics at workplaces. VERAM consists of a questionnaire and an objective evaluation.
Objective: To evaluate the reliability of VERAM by assessing the test-retest reliability of the questionnaire, and the intra- and inter-rater reliability of the objective evaluation.
Methods: Forty-eight trained evaluators used VERAM to evaluate visual ergonomics at 174 workstations. The time interval for test-retest and intra-rater evaluations was 2–3 weeks, and the time interval for inter-rater evaluations was 0–2 days. Test-retest reliability was assessed by intraclass correlation (ICC), the standard error of measurement (SEM) and the smallest detectable change (SDC). Intra- and inter-rater reliability were assessed with weighted kappa coefficients and absolute agreement. Systematic changes were analysed with repeated measures analyses of variance and the Wilcoxon signed rank test.
Results: The ICC of the questionnaire indices ranged from 0.69 to 0.87, while the SEM ranged from 7.21 to 10.19 on a scale from 1 to 100, and the SDC from 14.42 to 20.37. Intra-rater reliability of the objective evaluations ranged from 0.57 to 0.85 (kappa coefficients) and the agreement from 69 to 91%. Inter-rater reliability of the objective evaluations ranged from 0.37 to 0.72 (kappa coefficients) and the agreement from 52 to 87%.
Conclusion: VERAM is a reliable instrument for assessing risks in visual work environments. However, the reliability might increase further by improving the quality of training for evaluators. Complementary evaluations of VERAM's sensitivity to changes in the visual environment are needed.
Relevance to industry: It is advantageous to set up a work environment for maximal visual comfort to avoid negative effects on work postures and movements, and thus prevent visual and musculoskeletal symptoms. This method, VERAM, satisfies the need for a valid and reliable tool for determining risks associated with the visual work environment.

1. Introduction

Many occupations and work tasks involve near work. Even though computer-based near work tasks are common, there are many other work tasks that involve demanding near work, e.g. inspection and manufacturing of small details, and medical care such as surgery, dentistry and ophthalmology. The most frequent health problems associated with near work are visual and ocular symptoms (Gowrisankaran and Sheedy, 2015; Hashemi et al., 2017; Mowatt et al., 2018; Ranasinghe et al., 2016; Toomingas et al., 2014) and musculoskeletal symptoms in the neck and shoulders (Agrawal et al., 2017; Collins and O'Sullivan, 2015; Wærsted et al., 2010; Woods, 2005), and there is some evidence indicating an association between them (Helland et al., 2008; Hemphälä and Eklund, 2011; Lie and Watten, 1987; Treleaven and Takasaki, 2014; Wiholm et al., 2007; Zetterberg et al., 2017). Thus, it is important to set up a work environment for maximum visual comfort to prevent visual and musculoskeletal symptoms, but also to improve productivity and work performance (Juslen and Tenner, 2005; Veitch et al., 2013; Yeow and Nath Sen, 2004; Zetterberg, 2016). In Sweden, the current regulation on

Abbreviations: ICC, Intra Class Correlation; SDC, Smallest Detectable Change; SEM, Standard Error of Measurement; VERAM, Visual Ergonomics Risk Assessment
Method

Corresponding author. Centre for Musculoskeletal Research, Department of Occupational and Public Health Sciences, University of Gävle, SE-801 76 Gävle,
Sweden.
E-mail addresses: camilla.zetterberg@hig.se (C. Zetterberg), marina.heiden@hig.se (M. Heiden), per.lindberg@hig.se (P. Lindberg), per.nylen@av.se (P. Nylén),
hillevi.hemphala@design.lth.se (H. Hemphälä).

https://doi.org/10.1016/j.ergon.2019.04.002
Received 30 November 2018; Received in revised form 29 March 2019; Accepted 8 April 2019
Available online 02 May 2019
0169-8141/ © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/BY-NC-ND/4.0/).

physical work load and ergonomics stress that visual conditions should be targeted to avoid negative effects on work postures and movements (Swedish Work Environment Authority, 2012).

Many factors influence the effects of the visual environment: factors related to the physical environment (e.g. lighting, ergonomics, and workplace design), the work task (e.g. the work object, visibility, contrasts and readability), and the individual's visual ability (e.g. visual acuity, age, and visual defects) (Anshel, 2005; Blehm et al., 2005; Gowrisankaran and Sheedy, 2015; Long, 2014; Osterhaus et al., 2015). To evaluate and improve visual ergonomics at workplaces and prevent visual and musculoskeletal problems, all these factors need to be assessed. Unfortunately, available instruments, both for practice and research, are mainly questionnaires used to assess subjective symptoms and the workers' experience of the visual work environment. In particular, many questionnaires are developed to evaluate computer-related work (Conlon et al., 1999; Hayes et al., 2007; Knave et al., 1985). For example, two questionnaires measuring computer vision syndrome (CVS) were recently developed (Gonzalez-Perez et al., 2014; Segui et al., 2015), but neither considers the importance of factors such as lighting and work station design. Two checklists showing moderate to good reliability in quantifying risks associated with computer work include objective assessments concerning the monitor's placement and the risk for glare (Pereira et al., 2016; Sonne et al., 2012), but do not include other relevant visual ergonomic factors. There are also some practical checklists, e.g. the "Computer Workplace Questionnaire" (Anshel, 2005) and OSHA's checklist for computer work stations (https://www.osha.gov/SLTC/etools/computerworkstations/checklist.html), but they have not been validated.

There is a lack of existing instruments focusing on both the workers' symptoms and their subjective experience of the visual environment, as well as objective measurements of the visual environment, and there is a need for a comprehensive and reliable method that can be used in both practice and research. Therefore, a new method to assess visual ergonomics at workplaces has been developed, the Visual Ergonomics Risk Assessment Method – VERAM. VERAM consists of four parts:

1) a questionnaire for the worker with questions about e.g. eyestrain, visual symptoms, lighting conditions and musculoskeletal discomfort,
2) an objective evaluation form for the evaluator consisting of both technical measurements and objective assessments. The evaluation results in an overall risk evaluation (no risk, low risk or high risk) for each of eight workplace factors: daylight, lighting, illuminance, glare, flicker, work space, work object and work postures,
3) a section of follow-up questions based on the worker's responses, and
4) a section for recommended changes.

A detailed description of the content and the development process of VERAM, together with assessment of its validity, can be found in Heiden et al. (submitted to International Journal of Industrial Ergonomics 2018). The aim of the present study is to assess the test-retest reliability of the first part of VERAM (the questionnaire for the worker), and to evaluate the intra- and inter-rater reliability of the second part of VERAM (the objective evaluation made by an evaluator).

2. Materials and methods

The Regional Ethical Review Board in Lund, Sweden, approved the study (reference number 2015/2), all participants gave informed consent, and the study was conducted in accordance with the Declaration of Helsinki.

2.1. Trained evaluators

Forty-eight practitioners from the Occupational Health Services (OHS) in Sweden were recruited to perform the data collection. The majority were physiotherapists and/or ergonomists. The remaining practitioners were work environment engineers, occupational therapists and a low vision therapist. Each received a 7-day course with specific training in visual ergonomics and in VERAM. The OHS practitioners are hereafter referred to as trained evaluators.

2.2. Study sample

The trained evaluators recruited participants to the study mainly from their regular customers in the OHS sector. They were instructed to recruit participants with diverse characteristics to ensure high variability within the data, e.g. variability in work tasks, age, sex, and level of eye and musculoskeletal symptoms. The participants are hereafter referred to as workers.

2.3. Data collection

Data collection took place during autumn 2015 and spring 2016. The workers answered the web-based questionnaire twice. Trained evaluators performed objective evaluations, either repeatedly as a test-retest evaluation of the same workstation (intra-rater evaluation), or of the same workstation as a second evaluator (inter-rater evaluation). The workers were advised to perform similar work during both evaluations. The trained evaluator provided recommendations to the worker only after completing the second evaluation.

For the test-retest and intra-rater evaluation, the interval between the two evaluations was 2–3 weeks. It was considered long enough to prevent recall bias, for both the worker and the trained evaluator, and short enough for health-related symptoms and the visual environment to remain unchanged.

For the inter-rater evaluation, the interval between the assessments made by two trained evaluators of the same workstation was 0–2 days. The trained evaluators independently assessed each worker's workstation consecutively. The order of assessments was not randomised, but to ensure that the second evaluator was blind to the outcomes of the first evaluator, they did not communicate their findings during or after the assessment, neither to the worker nor to each other.

2.4. Statistical analyses

The statistical analyses were performed in IBM SPSS Statistics 22.0 for Windows (IBM Corp., Armonk, NY, USA). In all statistical tests, p < 0.05 was considered significant.

To enable comparisons between indices in part 1, all indices were expressed as a percentage of the maximum possible score, where a higher percentage represents a higher discomfort score or more visual symptoms.

2.4.1. Part 1, the subjective questionnaire

Test-retest reliability was assessed for the 6 indices (see appendix) and for the total score (i.e. the six indices pooled) by intraclass correlation (ICC) with 95% confidence intervals, using a two-way mixed effects model for single measures and "absolute agreement". Systematic changes were analysed with repeated measures analyses of variance (rANOVA), and the standard error of measurement (SEM = √mean square error) and the smallest detectable change (SDC = 1.96 × √2 × SEM) were calculated from the rANOVAs (Bjorklund et al., 2012; Kottner et al., 2011; Mokkink et al., 2010; Terwee et al., 2007; Weir, 2005). Items included in each of the six indices are described in Appendix, Table A1.

In previous literature, ICC values above 0.8 are recommended, and 0.7 is often seen as a minimum standard (Nunnally and Bernstein, 1994; Terwee et al., 2007; Wiitavaara and Heiden, 2018). SDC reflects the smallest real change that can be detected with an instrument. If the measurement error is high, the SDC will also be high.
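As an illustration, the ICC for absolute agreement (two-way model, single measures), together with SEM and SDC as defined above, can be computed from an n-subjects × 2-occasions score matrix. This is a minimal sketch with made-up index scores, not study data; the analyses in this paper were run in SPSS.

```python
import numpy as np

def icc_sem_sdc(scores):
    """ICC(A,1): two-way model, absolute agreement, single measures,
    plus SEM = sqrt(mean square error) and SDC = 1.96 * sqrt(2) * SEM,
    from an (n subjects x k occasions) score matrix."""
    X = np.asarray(scores, dtype=float)
    n, k = X.shape
    grand = X.mean()
    ms_rows = k * np.sum((X.mean(axis=1) - grand) ** 2) / (n - 1)   # subjects
    ms_cols = n * np.sum((X.mean(axis=0) - grand) ** 2) / (k - 1)   # occasions
    sse = np.sum((X - X.mean(axis=1, keepdims=True)
                    - X.mean(axis=0, keepdims=True) + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                                  # residual
    icc = (ms_rows - mse) / (ms_rows + (k - 1) * mse + k * (ms_cols - mse) / n)
    sem = np.sqrt(mse)
    return icc, sem, 1.96 * np.sqrt(2) * sem

# Hypothetical index scores (percent of maximum) at baseline and retest
scores = [[20, 18], [35, 33], [10, 14], [50, 46], [28, 30], [15, 15]]
icc, sem, sdc = icc_sem_sdc(scores)
```

Note that the SDC follows directly from the SEM, so an index with a large measurement error can only detect correspondingly large real changes.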


2.4.2. Part 2, the objective evaluation

Intra- and inter-rater reliability were determined with weighted kappa coefficients (κ) and intra- and inter-rater agreement (i.e., the degree to which ratings were identical) for the risk evaluation of each of the eight workplace factors (Gisev et al., 2013; Kottner et al., 2011; Mandrekar, 2011; Mokkink et al., 2010; Pereira et al., 2016). Systematic changes between baseline and re-test were analysed with the Wilcoxon signed rank test. Items included in the risk evaluation for each workplace factor are described in Appendix, Table A2.

In previous literature, κ-values exceeding 0.6 are considered to reflect substantial reliability, values above 0.4 reflect moderate reliability, and values above 0.2 reflect fair reliability (Landis and Koch, 1977).

3. Results

3.1. Study sample

For the test-retest and intra-rater evaluation, 141 workers were recruited, of which 38 were excluded due to an incorrect test interval, and one was excluded due to a discrepancy in work task between baseline and re-test. This resulted in 102 test-retest evaluations for the analysis. Computer work was performed by 90 workers, and 12 workers had other work tasks: assembly work (4), health care (4), transportation (2) and technical work (2).

For the inter-rater evaluation, 77 workers were recruited, 3 of whom were excluded due to an incorrect test interval, and one was excluded due to a discrepancy in work task between evaluations. This resulted in 73 inter-rater evaluations for the analysis. 69 workers performed computer work and 4 workers performed other work tasks: assembly work (2), health care (1), and mail sorting (1). Worker characteristics are presented in Table 1. Single missing values, e.g. when a worker or an evaluator did not complete all fields in the questionnaire or in the evaluation form, were excluded from each analysis. The numbers of valid included cases are reported in the text and in the tables.

3.2. Part 1 - the subjective questionnaire

3.2.1. Test-retest reliability

Summarised statistics are presented in Table 2. ICC values for the indices ranged from 0.69 to 0.87, being lowest for the index "visual symptoms" and highest for the index "lighting conditions". For the total score (i.e. the six indices pooled), it was 0.89. No significant systematic differences were found for the six indices or for the total score. Measurement error of the indices, expressed as SEM, ranged from 7.21 to 10.19, being lowest for "lighting conditions" and "intensity of eyestrain" and highest for "frequency of musculoskeletal discomfort". For the total score, it was 4.93. The smallest detectable change ranged from 14.42 (lighting conditions) to 20.37 (frequency of musculoskeletal discomfort). For the total score, it was 9.86.

3.3. Part 2 - the objective evaluation

3.3.1. Intra-rater reliability

The number of valid assessments (at baseline and re-test) for each workplace factor, the intra-rater agreement between baseline and re-test, and the weighted kappa coefficients are presented in Table 3. Descriptive statistics of the ratings are presented in Table 4. The agreement between baseline and re-test ranged from 69 to 91%. The lowest agreement was found for the factor "work object" and the highest for "flicker". The reliability measured with weighted kappa coefficients ranged from 0.57 (illuminance) to 0.85 (flicker). On average, there was a difference between the evaluation at baseline and the evaluation at re-test for two of the eight factors: flicker and work space (Table 4). For flicker, a slightly higher risk was seen at the re-test evaluation, while the opposite was seen for work space, with a slightly lower risk at the re-test evaluation.

Fig. 1 shows the agreement between baseline and re-test evaluations within each risk category (no risk, low risk, high risk) for the eight workplace factors. The proportion of agreement, i.e., where the evaluator rated the same risk category on both occasions, ranged between 58% (for illuminance, low risk category) and 100% (for flicker and work postures, high risk category). In general, the evaluators were more consistent in their ratings of high risk than the other risk categories. The exceptions were the risk evaluations of work space and work object, where the highest agreement between the two occasions was reached for the no risk and low risk category, respectively.

3.3.2. Inter-rater reliability

The number of valid independent assessments made by two trained evaluators (E1 and E2) for each workplace factor, the inter-rater agreement between assessments, and the weighted kappa coefficients are presented in Table 5.

Table 1
Characteristics of the workers for the test-retest and intra-rater evaluation and for the inter-rater evaluation. Values are the mean with the standard deviation in
brackets, unless otherwise indicated.
Characteristics of the workers Test-retest and intra-rater n Inter-rater n

Gender, no. of females (%) 70 (70%) 100 50 (69%) 73


Age in years 48 (10.6) 98 50 (9.5) 73
No. of participants with eyeglasses or lenses (%) 87 (85%) 102 62 (85%) 73
Frequency of eyestrain a
Smarting 0.58 (0.82) 101 0.64 (0.86) 72
Itching 0.43 (0.75) 101 0.49 (0.77) 72
Gritty feeling 0.50 (0.87) 101 0.61 (0.87) 72
Aching 0.40 (0.81) 101 0.39 (0.66) 72
Sensitivity to light 0.76 (0.98) 101 0.79 (0.96) 72
Redness 0.40 (0.71) 101 0.39 (0.82) 72
Teariness 0.50 (0.82) 101 0.56 (0.87) 72
Dryness 0.78 (0.99) 101 0.82 (1.04) 72
Eye fatigue 1.19 (0.97) 101 1.26 (1.02) 72
Frequency of musculoskeletal discomfort a
Neck 1.33 (1.09) 101 1.26 (1.04) 72
Shoulder 1.05 (1.09) 101 1.10 (1.02) 72
Upper back 0.62 (0.93) 101 0.78 (0.95) 72
Arm/hand 0.71 (1.04) 101 0.71 (1.04) 72
Frequency of stress symptoms a 1.50 (0.82) 102 1.63 (0.99) 72

Abbreviations: n = number of valid cases.


a Scale 0–3, 0 = never, 3 = almost daily.


Table 2
Mean and standard deviation at baseline and re-test, intraclass correlations (ICC), standard error of measurement (SEM), and the smallest detectable change (SDC) for the six indices and for the total score. Systematic differences between baseline and retest were analysed with repeated measures ANOVA and reported with p-values (n = 98 for all indices). The test-retest interval was 2–3 weeks.
Indexa Mean (sd) ICC (CI) SEM SDC p-value

Baseline Retest

Frequency of eyestrain 20.66 (19.79) 19.39 (18.52) 0.81 (0.74–0.87) 8.26 16.52 0.282
Intensity of eyestrain 18.43 (17.90) 18.48 (17.38) 0.83 (0.76–0.88) 7.26 14.52 0.960
Visual symptoms 21.32 (16.47) 19.53 (16.45) 0.69 (0.58–0.78) 9.08 18.15 0.170
Lighting conditions 19.66 (20.63) 18.84 (19.38) 0.87 (0.81–0.91) 7.21 14.42 0.430
Frequency of musculoskeletal discomfort 31.12 (27.48) 28.32 (26.07) 0.85 (0.79–0.90) 10.19 20.37 0.057
Intensity of musculoskeletal discomfort 18.43 (17.90) 18.48 (17.38) 0.82 (0.75–0.88) 7.88 15.76 0.161
Total score 20.42 (14.91) 21.80 (15.56) 0.89 (0.84–0.93) 4.93 9.86 0.054

Abbreviations: sd = standard deviation, ICC = intraclass correlation, CI = 95% confidence interval, SEM = standard error of measurement, SDC = smallest detectable change.
a Items included in each index are described in Appendix, Table A1.

Table 3
Intra-rater reliability. The number of valid evaluations for each workplace factor, the agreement between baseline and re-test (in %), and the weighted kappa coefficient with 95% confidence interval.
Workplace factor a No. of valid evaluations (missing) Agreement between baseline and re-test (%) Kappa coefficient (95% confidence interval)

Daylight 100 (2) 77 0.70 (0.58–0.82)
Lighting 98 (4) 71 0.59 (0.46–0.72)
Illuminance 97 (5) 70 0.57 (0.43–0.71)
Glare 96 (6) 71 0.63 (0.51–0.76)
Flicker 94 (8) 91 0.85 (0.74–0.95)
Work space 95 (7) 76 0.62 (0.48–0.77)
Work object 97 (5) 69 0.60 (0.47–0.73)
Work postures 96 (6) 82 0.74 (0.62–0.86)

a Items included in each workplace factor are described in Appendix, Table A2.

Descriptive statistics of the ratings are presented in Table 6. The inter-rater agreement ranged from 52 to 87%. The lowest agreement was found for work object and the highest for flicker. The reliability measured with weighted kappa coefficients ranged from 0.37 (work object) to 0.72 (flicker). On average, there were no significant differences in the evaluations made at the same workstation by the two evaluators (E1 and E2) for any of the eight workplace factors (Table 6). The inter-rater reliability for the factor work object was, according to the literature, fair (> 0.2), but did not reach the level of moderate reliability (> 0.4). This indicates that caution must be taken when interpreting the results in a practical setting.

Fig. 2 shows the inter-rater agreement between two trained evaluators within each risk category (no risk, low risk, high risk) for the eight workplace factors. The proportion of agreement, i.e., where both evaluators rated the same risk category, ranged between 38% (for work postures, low risk category) and 100% (for daylight, high risk category). In general, the evaluators had higher agreement in ratings of no risk than the other risk categories. The exceptions were the risk evaluations of daylight and flicker, where the highest agreement between the evaluators was reached for the high risk and low risk category, respectively.

4. Discussion

In the present study, the reliability of a new risk assessment method for visual ergonomics was assessed. Overall, the reliability of VERAM was adequate in the population studied. The test-retest reliability for part 1, i.e., the subjective questionnaire for the worker, was good for the total score and all indices except visual symptoms in its present form. Thus, the questionnaire can be used to follow workers' perceptions of visual exposure over time. Regarding the objective evaluation, the intra-rater reliability was slightly higher than the inter-rater reliability, indicating that repeated evaluations by the same evaluator are more consistent than evaluations made by different evaluators. However, for two of the eight workplace factors (flicker and work space), the estimated risk differed between the first and second evaluation by the same evaluator. We recommend that these parts of the objective evaluation are further developed to improve the reliability of VERAM.

4.1. Test-retest reliability

An instrument of good quality should have high correlations between repeated measurements (here: baseline and retest). As described in the method section, 0.7 is often seen as a minimum standard for ICC, and values above 0.8 are recommended (Nunnally and Bernstein, 1994; Terwee et al., 2007; Wiitavaara and Heiden, 2018).

Table 4
Intra-rater reliability. Median, range, mean and standard deviation from the intra-rater evaluations with a time interval of 2–3 weeks (baseline and retest).
Systematic differences between baseline and re-test were analysed with Wilcoxon signed rank test and reported with p-values.
Workplace factor Median (min – max) Mean (sd) p-value

Baseline Retest Baseline Retest

Daylight (n = 100) 1 (0–2) 1 (0–2) 0.90 (0.77) 0.87 (0.76) 0.580


Lighting (n = 98) 1 (0–2) 1 (0–2) 0.90 (0.68) 0.95 (0.71) 0.369
Illuminance (n = 97) 0.5 (0–2) 1 (0–2) 0.62 (0.70) 0.72 (0.77) 0.101
Glare (n = 96) 1 (0–2) 1 (0–2) 0.99 (0.80) 0.99 (0.78) 1.00
Flicker (n = 94) 0 (0–2) 0 (0–2) 0.36 (0.58) 0.43 (0.61) 0.034
Work space (n = 95) 1 (0–2) 1 (0–2) 0.84 (0.70) 0.69 (0.67) 0.014
Work object (n = 97) 1 (0–2) 1 (0–2) 0.91 (0.77) 0.96 (0.72) 0.384
Work postures (n = 96) 0 (0–2) 0 (0–2) 0.49 (0.68) 0.54 (0.70) 0.225

Abbreviations: sd = standard deviation.


*(min – max) refers to the range of responses from the evaluators.
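The systematic-difference tests reported in Table 4 can be reproduced in outline with SciPy's Wilcoxon signed rank test on paired ordinal ratings. The ratings below are invented for illustration, not the study data, and the paper's analyses were run in SPSS.

```python
import numpy as np
from scipy.stats import wilcoxon

# Invented baseline/retest risk ratings (0 = no risk, 1 = low risk, 2 = high risk)
baseline = np.array([1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 2, 0])
retest   = np.array([1, 1, 2, 0, 1, 1, 2, 2, 0, 1, 1, 0])

# Paired test; pairs with zero difference are dropped by the default method
stat, p = wilcoxon(baseline, retest)
```

A small p-value would indicate a systematic shift in risk ratings between the two occasions, as was observed for flicker and work space in Table 4.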


Fig. 1. Intra-rater agreement within each risk category for the eight workplace factors. The height of the bars represents the number of evaluated workstations at baseline (T1) within each risk category (no risk, low risk, high risk). The colours in the bars represent the evaluated risk at retest (T2). The numbers inside the bars show the number of equal evaluations at baseline and retest, and the intra-rater agreement (in %) within each risk category. Example for the factor daylight: the number of workstations evaluated as "no risk" at baseline was 35 (the height of the first bar: T1-no risk), the number of equal evaluations at retest was 27 (no risk = green), and, consequently, the intra-rater agreement was 77%. There were 40 "low risk" evaluations at baseline (the height of the second bar: T1-low risk), the number of equal evaluations at retest was 30 (low risk = yellow), and the intra-rater agreement was 75%. Lastly, there were 25 "high risk" evaluations at baseline (the height of the third bar: T1-high risk), the number of equal evaluations at retest was 20 (high risk = red), and the intra-rater agreement was 80%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
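The per-category agreement illustrated in Fig. 1 amounts to the following computation; the category codes and toy ratings here are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np

def per_category_agreement(t1, t2, categories=(0, 1, 2)):
    """Share of workstations given the same risk category at retest (t2),
    computed separately for each baseline (T1) category, as in Fig. 1."""
    t1, t2 = np.asarray(t1), np.asarray(t2)
    agreement = {}
    for c in categories:
        at_baseline = t1 == c
        if at_baseline.any():
            agreement[c] = float(np.mean(t2[at_baseline] == c))
        else:
            agreement[c] = float("nan")
    return agreement

# Toy data: 4 of the 5 "no risk" workstations stay "no risk" at retest
t1 = [0, 0, 0, 0, 0, 1, 1, 2]
t2 = [0, 0, 0, 0, 1, 1, 2, 2]
shares = per_category_agreement(t1, t2)  # shares[0] == 0.8
```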

Here, the intraclass correlations (ICC) were all larger than 0.8, except for visual symptoms (Table 2). As described in Heiden et al. (submitted to International Journal of Industrial Ergonomics 2018), the item 'double vision' in the visual symptoms index had limited information content and was therefore recommended to be excluded from the index. In this study sample, the ICC for the visual symptoms index increased slightly above the minimum standard (0.71 instead of 0.69) when the item 'double vision' was removed. Similarly, a marginal change in ICC was found for the revised lighting conditions index (ICC = 0.88 instead of 0.87). In comparison, VERAM shows higher test-retest reliability for the total score (ICC = 0.89) than two recently developed computer vision syndrome questionnaires (ICC = 0.80 and 0.85) (Gonzalez-Perez et al., 2014; Segui et al., 2015).

The smallest detectable change (SDC) depends on the measurement error, which in turn depends on the sample size, e.g. the larger the sample size, the smaller the measurement error and the SDC. In this study, with a sample size of 102, the highest SDC was found for the indices frequency of musculoskeletal discomfort and visual symptoms (approx. 20 on a 1–100 scale), and the lowest was found for lighting conditions and intensity of eyestrain (approx. 15 on a 1–100 scale).


Table 5
Inter-rater reliability. The number of valid evaluations for each workplace factor, the agreement between two evaluators (in %), and the weighted Kappa coefficient
with 95% confidence interval.
Workplace factor No. of valid evaluations (missing) Agreement between two evaluators (%) Kappa coefficient (95% confidence interval)

Daylight 67 (6) 63 0.44 (0.26–0.62)


Lighting 68 (5) 65 0.51 (0.34–0.68)
Illuminance 65 (8) 66 0.46 (0.26–0.65)
Glare 67 (6) 61 0.48 (0.31–0.65)
Flicker 67 (6) 87 0.72 (0.54–0.89)
Work space 53 (20) 68 0.55 (0.38–0.72)
Work object 65 (8) 52 0.37 (0.19–0.54)
Work postures 61 (12) 62 0.43 (0.24–0.63)
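For reference, a weighted kappa of the kind reported in Tables 3 and 5 can be sketched as below, using linear weights on the three ordered risk categories. This is a simplified illustration with invented ratings, not the routine used in the study, and it omits the confidence intervals reported in the tables.

```python
import numpy as np

def weighted_kappa(r1, r2, n_cat=3):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale
    with categories 0..n_cat-1 (here: no risk, low risk, high risk)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    obs = np.zeros((n_cat, n_cat))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= obs.sum()                                  # observed proportions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))  # chance expectation
    # disagreement weights grow linearly with category distance
    w = np.abs(np.arange(n_cat)[:, None] - np.arange(n_cat)[None, :]) / (n_cat - 1)
    return 1.0 - (w * obs).sum() / (w * exp).sum()

# Hypothetical ratings by two evaluators (E1, E2) at the same workstations
e1 = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
e2 = [0, 1, 2, 0, 0, 2, 1, 2, 1, 2]
kappa = weighted_kappa(e1, e2)
```

Because the weights penalise a no-risk/high-risk disagreement twice as heavily as adjacent-category disagreement, weighted kappa is better suited to ordinal risk ratings than unweighted kappa.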

Thus, to be able to detect differences in the subjective ratings with VERAM in future studies, it is important to include a relatively high number of participants.

4.2. Intra- and inter-rater reliability

The reliability of the objective evaluations was, as recommended for ordinal measures, assessed with the weighted kappa coefficient (κ) (Terwee et al., 2007). As described in the method section, κ-values exceeding 0.6 reflect substantial reliability, values above 0.4 reflect moderate reliability, and values above 0.2 reflect fair reliability (Landis and Koch, 1977). In this study sample, five factors in the objective evaluation showed substantial intra-rater reliability, while three factors had moderate reliability (lighting, illuminance and work object). Regarding inter-rater reliability, one factor (flicker) reached substantial reliability, six factors had moderate reliability, while work object showed fair reliability.

As seen above, the intra-rater reliability was higher than the inter-rater reliability. However, in the intra-rater evaluation, systematic differences between evaluations one and two were found for the factors flicker and work space (Table 4). Even though the differences were non-significant after adjusting with Bonferroni correction, caution must be taken when interpreting those factors. In addition, both Figs. 1 and 2 show a rather skewed distribution for the factor flicker, with very few "high risk" evaluations.

To our knowledge, there are no similar instruments assessing risk factors in the visual environment. However, there are instruments assessing other physical risk factors in the work environment. The Quick Exposure Check (QEC) is a widely used instrument assessing risks for work-related musculoskeletal disorders (David et al., 2008). Compared with VERAM, QEC has lower intra-rater reliability, with κ-values ranging from 0.48 to 0.53. Further, the inter-rater reliability of QEC is lower than that of VERAM for all factors but "work object", with κ-values ranging from 0.17 to 0.42. A more recently developed observation-based checklist assessing ergonomic risk factors among office workers shows slightly higher inter-rater reliability than VERAM, with κ-values ranging from 0.38 to 0.74 (Pereira et al., 2016). Notably, both QEC and the checklist report unadjusted κ-coefficients.

Even though technical measurements are not directly required for assessing all factors, they lay the foundation for all factors except work postures. During training (i.e., the 7-day course), many evaluators expressed difficulties in performing and interpreting the technical measurements. In addition, the evaluators had some difficulties in rating postures. This led to omitted information in the evaluation form concerning work postures, particularly for neck posture (see Heiden et al., submitted to International Journal of Industrial Ergonomics 2018).

Difficulties in rating postures and in performing and interpreting the technical measurements might depend on the evaluators' previous education and experience. This might be overcome by customizing the practical and theoretical parts of the course, for example by providing more training in rating postures for work environment engineers, and more training in technical measurements for ergonomists. With customized training, the evaluators might be more confident with the techniques, and hence both the intra- and inter-rater reliability might improve.

4.3. Strengths and limitations

In this study, the evaluators required a 7-day course on visual ergonomics to use VERAM in the field. After that, one risk evaluation took approximately 1 h per worker to complete, excluding the time it took for the worker to answer the subjective questionnaire (approx. 20 min). Thus, the method is comprehensive, and this can be seen as both a strength and a limitation. Many factors are important when evaluating risks related to the visual environment. So, there is a tradeoff between a method covering the important factors, and a method that is quick and easy to use. A comprehensive method, like VERAM, with e.g. technical

Table 6
Inter-rater reliability. Median, range, mean and standard deviation from the inter-rater evaluations made by two trained evaluators (E1 and E2) within two days. Systematic differences between E1 and E2 were analysed with the Wilcoxon signed rank test and reported with p-values.
Workplace factor Median (min – max)a Mean (sd) p-value

E1 E2 E1 E2

Daylight (n = 67) 0 (0–2) 0 (0–2) 0.54 (0.61) 0.66 (0.81) 0.163


Lighting (n = 68) 1 (0–2) 1 (0–2) 1.01 (0.72) 0.94 (0.690) 0.336
Illuminance (n = 65) 1 (0–2) 0 (0–2) 0.68 (0.66) 0.60 (0.70) 0.388
Glare (n = 67) 1 (0–2) 1 (0–2) 0.97 (0.70) 0.90 (0.82) 0.397
Flicker (n = 67) 0 (0–2) 0 (0–2) 0.37 (0.60) 0.45 (0.63) 0.190
Work space (n = 53) 1 (0–2) 1 (0–2) 0.79 (0.66) 0.81 (0.71) 0.808
Work object (n = 65) 1 (0–2) 1 (0–2) 1.06 (0.68) 0.97 (0.77) 0.303
Work postures (n = 61) 0 (0–2) 0 (0–2) 0.57 (0.69) 0.56 (0.74) 0.840

Abbreviations: E1 = trained evaluator 1, E2 = trained evaluator 2, sd = standard deviation.


a
(min – max) refers to the range of responses from the evaluators.
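The κ-values discussed above come from weighted kappa statistics on the two evaluators' ordinal risk ratings. As a rough illustration of how such a coefficient is computed, the sketch below implements a linearly weighted kappa in plain Python. The ratings are invented for illustration only and are not data from the study.

```python
# Minimal sketch of a linearly weighted kappa for two raters scoring
# the same workstations on a three-level ordinal scale
# (0 = no risk, 1 = low risk, 2 = high risk).

def weighted_kappa(r1, r2, n_cat=3):
    n = len(r1)
    # Joint proportion table of the two raters' scores
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    # Marginal proportions for each rater
    p1 = [sum(obs[i][j] for j in range(n_cat)) for i in range(n_cat)]
    p2 = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    # Linear disagreement weights: w = |i - j| / (n_cat - 1)
    num = den = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = abs(i - j) / (n_cat - 1)
            num += w * obs[i][j]          # observed weighted disagreement
            den += w * p1[i] * p2[j]      # chance-expected weighted disagreement
    return 1.0 - num / den

# Made-up ratings for ten workstations
e1 = [0, 1, 2, 1, 0, 1, 2, 0, 1, 1]
e2 = [0, 2, 2, 1, 1, 1, 1, 0, 0, 1]
print(round(weighted_kappa(e1, e2), 2))  # prints 0.46
```

Perfect agreement yields κ = 1, agreement no better than chance yields κ ≈ 0, so the values of 0.17–0.53 cited for QEC indicate only slight-to-moderate consistency on this scale.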

C. Zetterberg, et al. International Journal of Industrial Ergonomics 72 (2019) 71–79

measurements, also requires skilled evaluators, which, in most cases, means that they have thorough training in visual ergonomics.

One limitation of this study is the low number of risk assessments where the near-work tasks did not involve computer work. As stated in the introduction, many occupations include work tasks that are demanding for the visual system but do not include computer work. Unfortunately, it was difficult to recruit workers and workplaces with non-computer tasks, such as inspection and manufacturing of small details. Thus, more studies applying VERAM in settings not involving computer work are warranted.

Further, little attention in VERAM is paid to the duration of symptoms, duration of work, and duration of exposure to risk factors in the visual environment (e.g. duration of direct glare from daylight). In this version of the method, duration is handled implicitly through the follow-up questions to the worker; consequently, this procedure requires an experienced evaluator with knowledge about how duration of exposure affects risks.

4.4. Suggestions for improving and developing VERAM

It is possible that the reliability and usability of VERAM would improve if the content and the pedagogic structure of the 7-day course were revised. The content could focus more on identified difficulties in concepts and theory, measuring techniques, and consensus regarding risk assessments. Traditional lectures could include more active learning and/or collaborative learning techniques and more hands-on practice (Barkley et al., 2014; DeLozier and Rhodes, 2017). VERAM would also be more practical to use if the manual were made available electronically within the web-based method.

According to the consensus-based standards for health measurement instruments (COSMIN), an instrument not only needs to have adequate validity and reliability, it should also be responsive and interpretable (Bjorklund et al., 2017; Mokkink et al., 2010; van Kampen et al., 2013). The ability of an instrument to detect change over time is crucial for its use in the evaluation of interventions designed to improve the visual work environment. For VERAM to be useful in practice and in intervention studies, it needs to be evaluated in a longitudinal study investigating, e.g., the minimal important change relative to an external criterion.

To enhance the usability of VERAM, and to reduce both the time and the costs required, a short version of the method could be developed. A short version could be useful as a screening tool to identify workers and/or workplaces where a more thorough evaluation is warranted.

Fig. 2. Inter-rater agreement within each risk category for the eight workplace factors. The height of the bars represents the number of workstations evaluated by evaluator 1 (E1) within each risk category (no risk, low risk, high risk). The colours in the bars represent the risk evaluated by evaluator 2 (E2). The numbers inside the bars show the number of equal evaluations made by E1 and E2, and the inter-rater agreement (in %) within each risk category. Example for the factor daylight: the number of workstations evaluated as "no risk" by E1 for the factor daylight was 35 (the height of the first bar: E1-no risk), the number of equal evaluations made by E2 was 27 (no risk = green), and, consequently, the inter-rater agreement was 77%. There were 28 "low risk" evaluations made by E1 (the height of the second bar: E1-low risk), the number of equal evaluations made by E2 was 11 (low risk = yellow), and the inter-rater agreement was 39%. Lastly, there were 4 "high risk" evaluations made by E1 (the height of the third bar: E1-high risk), the number of equal evaluations made by E2 was 4 (high risk = red), and the inter-rater agreement was 100%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

4.5. Conclusion

The present study suggests that VERAM is a reliable instrument for assessing risks in visual work environments, especially for computer work tasks. However, the reliability might increase further by improving the quality of training for evaluators. These findings also justify further evaluation of VERAM, e.g. in longitudinal studies assessing its sensitivity to changes in the visual environment.

Declaration of interest

Declarations of interest: none.

Funding

This work was supported by AFA Insurance, Sweden [grant number 130166], Lund University and the University of Gävle.

Acknowledgements

We wish to thank all the participants in this study, both workers and trained evaluators, for their time and effort.


Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ergon.2019.04.002.

Appendix

Table A1
Content of part 1 of VERAM.

Index | Items | Response alternatives
Frequency of eyestrain | Smarting/itching/gritty/aching/sensitive to light/reddened/teary/dry/fatigued eyes | 0–3: never/occasionally/a few times per week/almost daily
Intensity of eyestrain | Smarting/itching/gritty/aching/sensitive to light/reddened/teary/dry/fatigued eyes | 0–2: mild/moderate/severe
Visual symptoms | Overall visual function | 0–4: very good/good/satisfactory/bad/very bad
 | Blurred vision | 0–3: never/occasionally/a few times per week/almost daily
 | Double vision | 0–3: never/occasionally/a few times per week/almost daily
 | Ability to focus | 0–3: never/occasionally/a few times per week/almost daily
Lighting conditions | Disturbing daylight | 0–3: never/sometimes/often/almost always
 | Satisfactory lighting for work task | 0–3: never/sometimes/often/almost always
 | Disturbing bright light sources | 0–3: never/sometimes/often/almost always
 | Disturbing reflexes from work object/surface | 0–3: never/sometimes/often/almost always
 | Disturbing reflexes from computer screen^a | 0–3: never/sometimes/often/almost always
Frequency of musculoskeletal discomfort | Neck/shoulders/upper back/arms or hands | 0–3: never/occasionally/a few times per week/almost daily
Intensity of musculoskeletal discomfort | Neck/shoulders/upper back/arms or hands | 0–10: 0 = no pain/discomfort; 10 = worst imaginable pain/discomfort

^a Only applicable to computer work.
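The questionnaire indices built from items like those in Table A1 are reported on a 0–100-type scale. The exact scoring algorithm is not given in this section, so the sketch below shows only one plausible convention — the mean item score rescaled to 0–100 — as an assumption for illustration; the function name, scaling, and responses are all made up.

```python
# Hedged sketch: rescaling ordinal item responses to a 0-100 index,
# e.g. for the "frequency of eyestrain" index (nine items scored 0-3).
# This is an assumed transformation, not VERAM's documented scoring.

def index_0_100(responses, max_score=3):
    """Mean item score rescaled to 0-100 (higher = more symptoms)."""
    return 100.0 * sum(responses) / (max_score * len(responses))

# Nine made-up responses to the eyestrain-frequency items
answers = [0, 1, 2, 0, 3, 1, 0, 2, 1]
print(round(index_0_100(answers), 1))  # prints 37.0
```

A transformation of this kind keeps indices comparable across item sets of different lengths and response ranges, which matters when SEM and SDC values are reported on a common scale.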

Table A2
Content of part 2 of VERAM.

Workplace factor | Items | Units/response alternatives
Daylight | Sufficient daylight/possibility for view | yes/no
 | Risk of daylight glare | no risk/low risk/high risk
Lighting | Direct light/indirect light/satisfactory color rendering | yes/no
 | Ability to alter illumination | yes (individually)/yes (group)/yes (venue)/no
Illuminance | Measurements of illuminance | lux
 | Illuminance requirements fulfilled | yes/no
Glare | Measurements of luminance | cd/m2
 | Luminance conditions | no risk/low risk/high risk
 | Risk of glare from luminaires | no risk/low risk/high risk
Flicker | Visual flicker | yes/no
 | Non-visual flicker | yes/no
Work space | Glare/reflexes from work surface or material | yes/no
 | Too shiny, bright or dark surfaces | yes/no
 | Shadows on work space | yes/no
Work object | Distance between the eye and work object in relation to size of the work object | no risk/low risk/high risk
 | Gaze angle | no risk/low risk/high risk
Work postures | Adverse postures in neck: flexion/extension/rotation/lateral flexion/protraction | not at all/a small amount of time/about half of the time/almost all the time
 | Adverse postures in back: flexion | not at all/a small amount of time/about half of the time/almost all the time

References

Agrawal, P.R., Maiya, A.G., Kamath, V., Kamath, A., 2017. Work related musculoskeletal disorders among medical laboratory professionals: a narrative review. Int J Res Med Sci 2, 1262–1266.
Anshel, J., 2005. Visual Ergonomics Handbook. CRC Press, Boca Raton, FL.
Barkley, E.F., Major, C.H., Cross, K.P., 2014. Collaborative Learning Techniques: a Handbook for College Faculty. John Wiley & Sons, Hoboken, N.J.
Bjorklund, M., Hamberg, J., Heiden, M., Barnekow-Bergkvist, M., 2012. The ProFitMap-neck – reliability and validity of a questionnaire for measuring symptoms and functional limitations in neck pain. Disabil. Rehabil. 34, 1096–1107.
Bjorklund, M., Wiitavaara, B., Heiden, M., 2017. Responsiveness and minimal important change for the ProFitMap-neck questionnaire and the Neck Disability Index in women with neck-shoulder pain. Qual. Life Res. 26, 161–170.
Blehm, C., Vishnu, S., Khattak, A., Mitra, S., Yee, R.W., 2005. Computer vision syndrome: a review. Surv. Ophthalmol. 50, 253–262.
Collins, J.D., O'Sullivan, L.W., 2015. Musculoskeletal disorder prevalence and psychosocial risk exposures by age and gender in a cohort of office based employees in two academic institutions. Int. J. Ind. Ergon. 46, 85–97.
Conlon, E.G., Lovegrove, W.J., Chekaluk, E., Pattison, P.E., 1999. Measuring visual discomfort. Vis. Cognit. 6, 637–663.
David, G., Woods, V., Li, G.Y., Buckle, P., 2008. The development of the Quick Exposure Check (QEC) for assessing exposure to risk factors for work-related musculoskeletal disorders. Appl. Ergon. 39, 57–69.
DeLozier, S.J., Rhodes, M.G., 2017. Flipped classrooms: a review of key ideas and recommendations for practice. Educ. Psychol. Rev. 29, 141–151.
Gisev, N., Bell, J.S., Chen, T.F., 2013. Interrater agreement and interrater reliability: key concepts, approaches, and applications. Res. Soc. Adm. Pharm. 9, 330–338.
Gonzalez-Perez, M., Susi, R., Antona, B., Barrio, A., Gonzalez, E., 2014. The computer-vision symptom scale (CVSS17): development and initial validation. Investig. Ophthalmol. Vis. Sci. 55, 4504–4511.
Gowrisankaran, S., Sheedy, J.E., 2015. Computer vision syndrome: a review. Work 52, 303–314.
Hashemi, H., Khabazkhoob, M., Forouzesh, S., Nabovati, P., Yekta, A.A., Ostadimoghaddam, H., 2017. The prevalence of asthenopia and its determinants among schoolchildren. J. Compr. Pediatr. 8.
Hayes, J.R., Sheedy, J.E., Stelmack, J.A., Heaney, C.A., 2007. Computer use, symptoms, and quality of life. Optom. Vis. Sci. 84, 738–744.
Heiden, M., Zetterberg, C., Lindberg, P., Nylén, P., Hemphälä, H. Validity of a computer-based risk assessment method for visual ergonomics. Int. J. Ind. Ergon. (Submitted).
Helland, M., Horgen, G., Kvikstad, T.M., Garthus, T., Bruenech, J.R., Aarås, A., 2008. Musculoskeletal, visual and psychosocial stress in VDU operators after moving to an ergonomically designed office landscape. Appl. Ergon. 39, 284–295.
Hemphälä, H., Eklund, J., 2011. A visual ergonomics intervention in mail sorting facilities: effects on eyes, muscles and productivity. Appl. Ergon. 43, 217–229.
Juslen, H., Tenner, A., 2005. Mechanisms involved in enhancing human performance by changing the lighting in the industrial workplace. Int. J. Ind. Ergon. 35, 843–855.
Knave, B.G., Wibom, R.I., Voss, M., Hedström, L.D., Bergqvist, U.O., 1985. Work with video display terminals among office employees. I. Subjective symptoms and discomfort. Scand. J. Work. Environ. Health 11, 457–466.
Kottner, J., Audigé, L., Brorson, S., Donner, A., Gajewski, B.J., Hróbjartsson, A., Roberts, C., Shoukri, M., Streiner, D.L., 2011. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J. Clin. Epidemiol. 48, 661–671.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33, 159–174.
Lie, I., Watten, R.G., 1987. Oculomotor factors in the aetiology of occupational cervicobrachial diseases (OCD). Eur. J. Appl. Physiol. Occup. Physiol. 56, 151–156.
Long, J., 2014. What is visual ergonomics? Work 47, 287–289.
Mandrekar, J.N., 2011. Measures of interrater agreement. J. Thorac. Oncol. 6, 6–7.
Mokkink, L.B., Terwee, C.B., Knol, D.L., Stratford, P.W., Alonso, J., Patrick, D.L., Bouter, L.M., de Vet, H.C.W., 2010. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med. Res. Methodol. 10, 22.
Mowatt, L., Gordon, C., Santosh, A.B.R., Jones, T., 2018. Computer vision syndrome and ergonomic practices among undergraduate university students. Int. J. Clin. Pract. 72, e13035.
Nunnally, J.C., Bernstein, I.H., 1994. Psychometric Theory. McGraw-Hill, New York.
Osterhaus, W., Hemphala, H., Nylen, P., 2015. Lighting at computer workstations. Work 52, 315–328.
Pereira, M.J., Straker, L.M., Comans, T.A., Johnston, V., 2016. Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers. Ergonomics 59, 1606–1612.
Ranasinghe, P., Wathurapatha, W.S., Perera, Y.S., Lamabadusuriya, D.A., Kulatunga, S., Jayawardana, N., Katulanda, P., 2016. Computer vision syndrome among computer office workers in a developing country: an evaluation of prevalence and risk factors. BMC Res. Notes 9, 150.
Segui, M.D., Cabrero-Garcia, J., Crespo, A., Verdu, J., Ronda, E., 2015. A reliable and valid questionnaire was developed to measure computer vision syndrome at the workplace. J. Clin. Epidemiol. 68, 662–673.
Sonne, M., Villalta, D.L., Andrews, D.M., 2012. Development and evaluation of an office ergonomic risk checklist: ROSA – rapid office strain assessment. Appl. Ergon. 43, 98–108.
Swedish Work Environment Authority, 2012. Ergonomics for the Prevention of Musculoskeletal Disorders 2012, vol. 2. Stockholm, Sweden.
Terwee, C.B., Bot, S.D.M., de Boer, M.R., van der Windt, D.A.W.M., Knol, D.L., Dekker, J., Bouter, L.A., de Vet, H.C.W., 2007. Quality criteria were proposed for measurement properties of health status questionnaires. J. Clin. Epidemiol. 60, 34–42.
Toomingas, A., Hagberg, M., Heiden, M., Richter, H., Westergren, K.E., Tornqvist, E.W., 2014. Risk factors, incidence and persistence of symptoms from the eyes among professional computer users. Work 47, 291–301.
Treleaven, J., Takasaki, H., 2014. Characteristics of visual disturbances reported by subjects with neck pain. Man. Ther. 19, 203–207.
van Kampen, D.A., Willems, W.J., van Beers, L.W.A.H., Castelein, R.M., Scholtes, V.A.B., Terwee, C.B., 2013. Determination and comparison of the smallest detectable change (SDC) and the minimal important change (MIC) of four shoulder patient-reported outcome measures (PROMs). J. Orthop. Surg. Res. 8, 40.
Veitch, J.A., Stokkermans, M.G., Newsham, G.R., 2013. Linking lighting appraisals to work behaviors. Environ. Behav. 45, 198–214.
Weir, J.P., 2005. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J. Strength Cond. Res. 19, 231–240.
Wiholm, C., Richter, H., Mathiassen, S.E., Toomingas, A., 2007. Associations between eyestrain and neck-shoulder symptoms among call center operators. Scand. J. Work. Environ. Health 33, 54–59.
Wiitavaara, B., Heiden, M., 2018. Content and psychometric evaluations of questionnaires for assessing physical function in people with neck disorders: a systematic review of the literature. Disabil. Rehabil. 40, 2227–2235.
Woods, V., 2005. Musculoskeletal disorders and visual strain in intensive data processing workers. Occup. Med. (Lond.) 55, 121–127.
Wærsted, M., Hanvold, T.N., Veiersted, K.B., 2010. Computer work and musculoskeletal disorders of the neck and upper extremity: a systematic review. BMC Muscoskelet. Disord. 11, 79.
Yeow, P.H., Nath Sen, R., 2004. Ergonomics improvements of the visual inspection process in a printed circuit assembly factory. Int. J. Occup. Saf. Ergon. 10, 369–385.
Zetterberg, C., 2016. The Impact of Visually Demanding Near Work on Neck/shoulder Discomfort and Trapezius Muscle Activity: Laboratory Studies. Faculty of Medicine, Uppsala University, Uppsala, Sweden.
Zetterberg, C., Forsman, M., Richter, H.O., 2017. Neck/shoulder discomfort due to visually demanding experimental near work is influenced by previous neck pain, task duration, astigmatism, internal eye discomfort and accommodation. PLoS One 12, e0182439.
