Accuracy of Commercially Available Smartwatches in Assessing Energy Expenditure During Rest and Exercise

Article in Journal for the Measurement of Physical Behaviour · May 2019


DOI: 10.1123/jmpb.2018-0037

Journal for the Measurement of Physical Behaviour, (Ahead of Print)
https://doi.org/10.1123/jmpb.2018-0037
© 2019 Human Kinetics, Inc. ORIGINAL RESEARCH

Accuracy of Commercially Available Smartwatches in Assessing Energy Expenditure During Rest and Exercise

Zachary C. Pope (University of Minnesota), Nan Zeng (Colorado State University), Xianxiong Li and Wenfeng Liu (Hunan Normal University), and Zan Gao (University of Minnesota)

Background: This study examined the accuracy of Microsoft Band (MB), Fitbit Surge HR (FS), TomTom Cardio Watch (TT),
and Apple Watch (AW) for energy expenditure (EE) estimation at rest and at different physical activity (PA) intensities. Method:
During summer 2016, 25 college students (13 females; Mage = 23.52 ± 1.04 years) completed four separate 10-minute exercise
sessions: rest (i.e., seated quietly), light PA (LPA; 3.0-mph walking), moderate PA (MPA; 5.0-mph jogging), and vigorous PA
(VPA; 7.0-mph running) on a treadmill. Indirect calorimetry served as the criterion EE measure. The AW and TT were placed on
the right wrist and the FS and MB on the left—serving as comparison devices. Data were analyzed in late 2017. Results: Pearson
correlation coefficients revealed only three significant relationships (r = 0.43–0.57) between smartwatches’ EE estimates and
indirect calorimetry: rest-TT; LPA-MB; and MPA-AW. Mean absolute percentage error (MAPE) values indicated the MB
(35.4%) and AW (42.3%) possessed the lowest error across all sessions, with MAPE across all smartwatches lowest during the
LPA (33.7%) and VPA (24.6%) sessions. During equivalence testing, no smartwatch’s 90% CI fell within the equivalence region
designated by indirect calorimetry. However, the greatest overlap between smartwatches’ 90% CIs and indirect calorimetry’s
equivalency region was observed during the LPA and VPA sessions. Finally, EE estimate variation attributable to the use of
different manufacturer’s devices was greatest at rest (53.7 ± 12.6%), but incrementally decreased as PA intensity increased.
Conclusions: MB and AW appear most accurate for EE estimation. However, smartwatch manufacturers may consider
concentrating most on improving EE estimate accuracy during MPA.

Keywords: measurement bias, indirect calorimetry, validity

Pope is with the Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN. Zeng is with the Department of Food Science and Human Nutrition, Colorado State University, Fort Collins, CO. Li and Liu are with the School of Physical Education, Hunan Normal University, Changsha, China. Gao is with the School of Kinesiology, University of Minnesota, Minneapolis, MN. Pope (popex157@umn.edu) and Gao (gaoz@umn.edu) are corresponding authors.

Wearable technology devices offer tremendous promise in promoting physical activity (PA) and health among diverse populations (Case, Burwick, Volpp, & Patel, 2015; Kenney, Gortmaker, Evenson, Goto, & Furberg, 2015), with great potential to aid in the development of personalized health behavior change interventions (Bai et al., 2016; Ferguson, Rowlands, Olds, & Maher, 2015; Flores, Glusman, Brogaard, Price, & Hood, 2013; Hood, Balling, & Auffray, 2012; Sasaki et al., 2015). For example, advancing technology has facilitated development of sophisticated smartwatches (Kenney et al., 2015), many providing health metric data output for heart rate, energy expenditure (EE), PA, and sleep, among other metrics. Notably, smartwatches’ capability to provide EE estimates has played a crucial role in these devices’ popularity as consumers track this metric and modify kcaloric (kcal) consumption and PA in a manner necessary to promote appropriate and sustainable weight loss (Kenney, Wilmore, & Costill, 2015b). Yet, if smartwatches are not providing accurate EE estimates, these inaccuracies may prevent the effective use of these devices as part of a weight loss strategy or, more generally, for health promotion purposes.

Currently, several smartwatches are popular among consumers. Using each manufacturer’s proprietary algorithms, these smartwatches combine demographic (age, sex), anthropometric (height, weight), and bodily movement data collected via triaxial accelerometer technology to provide daily EE estimates at rest, during activities of daily living, and during PA or exercise (Fitbit, 2016; TomTom, 2017). Few studies, however, have validated smartwatch EE estimates. Indeed, literature has mostly examined the validity of smartwatches in the measurement of laboratory-based and free-living PA duration and steps (Bai et al., 2016; Bunn, Navalta, Fountaine, & Reece, 2018; Evenson, Goto, & Furberg, 2015; Lee & Gorelick, 2011). Among the few smartwatch EE estimate validation studies to date (Bai et al., 2016; Diaz et al., 2015; Ferguson et al., 2015), mean validity coefficients for EE were moderate to strong (range r = 0.74–0.85), with mixed findings regarding smartwatches’ over- or under-estimation of EE versus various criterion measures. Notably, however, these studies were almost exclusively conducted using specific models of the Fitbit and Jawbone despite the rising popularity of other smartwatches (e.g., Apple Watch). Moreover, few studies have employed indirect calorimetry as the criterion EE measure, an assessment method commonly considered the ‘gold standard’ for EE measurement (Kenney, Wilmore, & Costill, 2015c). Finally, a newer statistical methodology, termed “equivalence testing” (Dixon, Saint-Maurice, Kim, Hibbing, &
Welk, 2018), has been developed and may provide better insight into smartwatch health metric data accuracy than the validity statistics employed in past studies.

These limitations are not only notable given how consumers often use smartwatch EE estimates (e.g., to monitor daily EE and subsequently modify PA and/or dietary behaviors), but these limitations may also impair health professionals’ abilities to employ smartwatches as a health promotion tool. Specifically, smartwatches are more often being cited as important components of a healthcare approach referred to as “systems medicine” (Flores et al., 2013; Hood et al., 2012), a multi-faceted wellness perspective leveraging novel technology (e.g., smartwatches, smartphones, social media) to collect and analyze (via big data analysis) an individual’s health behaviors and develop personalized health behavior change interventions thereafter based on these data (Flores et al., 2013; Pope & Gao, 2017). Given smartwatch technology’s emerging uses, a need exists to assess several popular smartwatches’ EE estimates versus a gold standard EE criterion measure like indirect calorimetry during different PA intensities, employing the statistical methodology of equivalence testing to conduct these analyses. Therefore, this study’s purpose was to investigate the accuracy of the Microsoft Band (MB), Fitbit Surge HR (FS), TomTom Cardio Watch (TT), and Apple Watch (AW) in estimating EE at rest and at different PA intensities versus indirect calorimetry EE measurements. The current study’s observations may inform consumers and health professionals alike of the capability of various popular smartwatches to provide accurate EE estimates capable of assisting in effective health behavior change intervention development.

Method

Participants and Research Setting

This cross-sectional study recruited a convenience sample of healthy young adults at a south-central Chinese university in summer 2016. Participant inclusion criteria were (a) 18–25 years old; (b) body mass index ≥ 18.5 kg/m²; (c) no diagnosed physical or mental disability; and (d) signed informed consent. Exclusion criteria included (a) current use of medication which might affect cardiovascular function (e.g., beta-blockers); (b) a history of documented cardiovascular or metabolic diseases/conditions; or (c) being unaccustomed to high-intensity exercise eliciting EE > 300 kcals/session. Participants completed a comprehensive medical and health history questionnaire prior to study participation, with the experiment conducted in a highly controlled laboratory setting. All procedures performed were in accordance with the ethical standards of the institution and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards (World Medical Association, 2018). Additionally, this research was completed in agreement with the most recent ethical standards for sport and exercise research (Harriss, Macsween, & Atkinson, 2017). Finally, University Research Ethics Committee approval and informed consent were obtained prior to testing.

Instrumentation

Criterion Device. Criterion EE data were collected via indirect calorimetry with a Cortex Metalyzer II metabolic cart (Cortex; Germany). Briefly, the exercise tests were performed on a Pulsar treadmill (H/P/Cosmos; Willich, Germany), with participants wearing a mask attached to the metabolic cart. The metabolic cart conducted indirect calorimetry measurements via gas analyses at rest and during exercise from which body temperature, pressure, and saturated-adjusted EE values for each exercise session were measured. In simplest terms, indirect calorimetry measures participants’ respiratory gas exchange rates of oxygen and carbon dioxide, which are then used to provide EE measurements (Kenney et al., 2015c), with more detailed descriptions available regarding how indirect calorimetry measures EE and why this measure has been widely considered the ‘gold standard’ EE measurement method (Branson & Johannigman, 2004; Holdy, 2004). Importantly, the Pulsar treadmill and Cortex Metalyzer II have been used in previous studies among various populations when assessing EE (Bailey et al., 2012; Cockcroft et al., 2015; Peters, Heelan, & Abbey, 2013). Notably, the Cortex Metalyzer II was calibrated using a 3-liter syringe prior to each participant’s session, with the calibration process completed per manufacturer specifications.

Comparison Devices. Four wrist-worn smartwatches provided EE estimates and served as the comparison devices. The smartwatches included were the MB (Microsoft; Redmond, WA, USA), FS (Fitbit, Inc.; San Francisco, CA, USA), TT (TomTom; Amsterdam, The Netherlands), and AW (Apple; Cupertino, CA, USA). Each smartwatch can assess several metrics including heart rate, activity (i.e., minutes of activity, steps/day), sleep, stairs climbed, and calories burned (i.e., EE). Notably, only one smartwatch from each of the preceding manufacturers was included. Regarding smartwatch placement, the MB and FS were worn on the left wrist while the TT and AW were placed on the right, with the smartwatches spaced 1 cm apart. Smartwatches were monitored throughout the sessions to ensure no contact was made between devices that might have impacted the results. This study’s smartwatch placement mirrored that of other studies placing multiple smartwatches on participants’ wrists (Ferguson et al., 2015; Fokkema, Kooiman, Krijnen, Van Der Schans, & De Groot, 2017). To ensure the most accurate EE estimates were provided by each smartwatch, each participant’s age, sex, weight, and height were entered into each smartwatch prior to initiating the testing session (see procedures below), with the side upon which each smartwatch was worn (i.e., left or right) programmed as well. Finally, while the wrist upon which the smartwatches were placed did not differ between participants, potential bias of smartwatch placement was reduced by randomizing which smartwatch was distal and which was more proximal from participant to participant.

Anthropometrics. Height and weight were measured using a stadiometer and digital weight scale, respectively. Specifically, height was measured using a Seca stadiometer (Seca; Hamburg, Germany) and recorded to the nearest half-centimeter. As for weight, this measurement was performed with a Detecto digital weight scale (Detecto; Webb City, MO, USA), with weight documented to the nearest tenth of a kilogram.

Procedures

All participants were instructed to abstain from eating or drinking anything except water eight hours prior to visiting the lab in addition to refraining from any vigorous PA (VPA) during the 24 hours prior to study participation. Participants were asked to come into the lab in a fasted state for two reasons. First, we wanted to ensure that the indirect calorimetry measurements during the resting trial were as accurate as possible. Indeed, basal metabolic rate assessed via indirect calorimetry may be affected by prior food consumption
(Kenney et al., 2015c). Therefore, having the participants abstain from food consumption until after study completion was important to ensure the most valid comparison of indirect calorimetry EE measurements to smartwatch EE estimates during the resting (i.e., sitting) session. Second and more practically, participants were requested to be fasted prior to study participation to ensure no adverse gastrointestinal discomfort was experienced during the study, particularly during the higher-intensity sessions. Participants were informed of all experimental procedures and encouraged to ask any questions before providing consent. Next, participants were asked to complete a comprehensive medical/health history questionnaire and a demographic information sheet after which anthropometric data (i.e., height and weight) were gathered. Demographic and anthropometric data were subsequently loaded into each smartwatch and into the metabolic cart’s software to ensure accurate EE estimation and measurement, respectively. Finally, a mask connected to the metabolic cart was placed on each participant to measure oxygen consumption for determination of criterion EE.

Participants completed an 80-minute experimental protocol which included four 10-minute PA sessions, each at a different PA intensity: resting (sitting quietly), light PA (LPA; walking at 3.0 mph on treadmill), moderate PA (MPA; jogging at 5.0 mph), and VPA (running at 7.0 mph). Sessions were completed from lowest (i.e., resting) to highest (i.e., VPA) intensity, ensuring the results of the lower-intensity trials were not biased by prior high-intensity physiological workload. The PA intensity classification criteria were consistent with a previous study among Chinese young adults (Ren, Li, & Liang, 2017). Between each session, participants were required to sit quietly until heart rate returned to ±10 beats/minute of that observed during the initial resting session (Goto et al., 2007). Following each of the four exercise sessions each participant completed, EE data were obtained directly from the smartwatches themselves, with these data recorded immediately to prevent any data loss or misinterpretation. All four smartwatches in this study provided “average calories burned [i.e., EE]” estimates over the specified time interval pre-programmed by the researchers. Therefore, prior to each of the participant’s resting and exercise trials, we pre-programmed the smartwatches for a 10-minute exercise session, starting each smartwatch’s 10-minute program immediately upon each participant’s initiation of their session. This pre-programming ensured that no EE data were included outside of the 10-minute exercise session and, further, requested each smartwatch to save the exercise session to its internal memory in case we needed to verify these data at a later time. It is also noteworthy that two researchers collected EE data from the smartwatches immediately after each participant finished their respective exercise session, allowing each participant’s smartwatch data from each exercise session to be double-checked (i.e., data quality control protocol). Finally, data regarding each participant’s EE were placed immediately into an Excel file for later analysis by one researcher and double-checked by a second researcher after each trial for each participant. Importantly, the times the smartwatches were started and stopped during each testing session were recorded per the software reporting indirect calorimetry EE measurements. Using this software’s timestamp allowed us to ensure that the start and stop times used to segment indirect calorimetry EE measurements were identical to the time segments during which the smartwatches were estimating EE.

Statistical Analysis

Data were analyzed in late 2017 and were first screened for physiologically implausible values. Next, Pearson correlation coefficients were calculated to observe the association between smartwatch EE estimates and indirect calorimetry EE measurements at rest and each PA intensity (resting, LPA, MPA, and VPA). Weak, moderate, and strong correlations were categorized as r values of 0.20–0.39, 0.40–0.59, and 0.60–0.79, respectively, with r values ≤0.19 classified as very weak and r values ≥0.80 classified as very strong (Thomas, Nelson, & Silverman, 2011). Mean absolute percent errors (MAPE) were then calculated for sitting and each PA intensity. Briefly, MAPE was reported as the average of the absolute difference between smartwatch EE estimates and indirect calorimetry EE measurements divided by indirect calorimetric measurements multiplied by 100. These MAPE calculations were completed for each smartwatch at each PA intensity, with MAPE calculations providing an examination of individual-level measurement error, an approach used in other smartwatch and accelerometer device validation studies (Fokkema et al., 2017; Kim & Welk, 2015).

Equivalence testing was then used to assess the agreement of smartwatch EE estimates with indirect calorimetry EE measurements using this testing approach’s confidence interval (CI) method. Equivalence testing is given fuller explanation in Dixon et al. (2018), but the following two aspects of equivalence testing are important: (1) equivalence testing’s null hypothesis states that the two measurement methods being compared are not equivalent; and (2) an alpha of 0.05 (i.e., 5%) is consistent with examining whether the entire 90% CI for a given smartwatch at a given PA intensity falls within a proposed equivalency region situated around the mean indirect calorimetry EE measurement made at the same PA intensity. Congruent with Kim and Welk (2015) and Bai et al. (2016), we stated the equivalency region to be ±10% of the mean indirect calorimetry EE measurements made at a given PA intensity. Finally, coefficients of variation (CV) examined percentage variation in smartwatch EE estimates attributable to the use of different manufacturers’ devices, as done in prior literature (Driller, McQuillan, & O’Donnell, 2016). SPSS 25.0 (IBM Inc.; Armonk, NY) was employed for all analyses, with alpha set at 0.05.

Results

Participants were 25 college students (13 females; Mage = 23.52 ± 1.04 years; Mheight = 168.6 ± 7.4 cm; Mweight = 61.5 ± 10.1 kg). Table 1 presents descriptive statistics for smartwatch EE estimates and indirect calorimetry EE measurements. As expected, EE values increased linearly as PA intensity increased.

Pearson correlation coefficients between smartwatch EE estimates and indirect calorimetry EE measurements at each PA intensity revealed only three significant correlations (r range = −0.19 to 0.57; Table 2). Specifically, moderate correlations were seen for the following smartwatches at the denoted PA intensities versus indirect calorimetry: Rest–TT (r = 0.57, p < .01); LPA–MB (r = 0.43, p < .05); and MPA–AW (r = 0.43, p < .05). Notably, a marginally significant, but weak, correlation was observed between the AW and indirect calorimetry during LPA (r = 0.37, p = .07). No significant correlations were found between smartwatch EE estimates and indirect calorimetry EE measurements during VPA. Moreover, Table 3 contains MAPE values for each smartwatch’s EE estimates at each PA intensity compared to indirect calorimetry. Overall, MAPE values were lowest for the MB (35.4%) and AW (42.3%), with the FS and TT demonstrating higher values (47.7% and 51.0%, respectively). Finally, MAPE values were higher during the resting (52.9%) and MPA (65.3%) sessions versus the LPA (33.7%) and VPA (24.6%) sessions.
Table 1 Descriptive Statistics for Smartwatch Energy Expenditure and Indirect Calorimetry*
PA Intensity                 Microsoft Band   Fitbit        TomTom        Apple Watch   Indirect Calorimetry
                             M (SD)           M (SD)        M (SD)        M (SD)        M (SD)
Resting                      16.7 (3.6)       18.4 (8.2)    33.4 (23.6)   36.3 (7.7)    21.4 (3.2)
Light Physical Activity      38.8 (13.0)      55.9 (16.0)   34.0 (11.8)   36.1 (9.8)    35.0 (5.4)
Moderate Physical Activity   86.7 (14.7)      90.6 (19.8)   82.5 (28.1)   79.9 (16.3)   53.9 (10.0)
Vigorous Physical Activity   102.2 (27.9)     94.4 (25.3)   95.7 (35.1)   88.0 (27.0)   96.5 (7.9)
*M ± SD total kilocalories burned during each 10-minute exercise session.
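
Table 1 also allows a quick check of the direction of each device's group-level error. The sketch below computes the signed percent difference between a device mean and the indirect calorimetry mean, using the Table 1 means for the MPA session. The function name `percent_bias` is hypothetical, and this group-mean figure is not the same statistic as the individual-level MAPE reported in Table 3.

```python
def percent_bias(device_mean, criterion_mean):
    """Signed percent difference of a device mean from the criterion mean.

    Positive values indicate overestimation, negative values underestimation.
    """
    return (device_mean - criterion_mean) / criterion_mean * 100


# Group means (kcal per 10-minute session) for MPA, taken from Table 1.
mpa_criterion = 53.9
mpa_devices = {
    "Microsoft Band": 86.7,
    "Fitbit": 90.6,
    "TomTom": 82.5,
    "Apple Watch": 79.9,
}
for device, device_mean in mpa_devices.items():
    print(f"{device}: {percent_bias(device_mean, mpa_criterion):+.1f}%")
```

All four MPA values come out positive, which matches the overestimation during MPA noted in the Discussion.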

Table 2 Pearson Correlations Between Smartwatch Energy Expenditure and Indirect Calorimetry at Different PA Intensities#
Indirect Calorimetry vs.     Microsoft Band   Fitbit   TomTom    Apple Watch
Resting                      0.02             0.21     0.57**    0.06
Light Physical Activity      0.43*            0.14     −0.19     0.37
Moderate Physical Activity   0.13             0.26     0.12      0.43*
Vigorous Physical Activity   −0.03            0.25     −0.03     −0.09
#Energy expenditure unit is kilocalories, with the correlations reflective of this metric; *Indicates significant correlation at p < .05 level; **Indicates significant correlation at p < .01 level.
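
The strength labels applied to these coefficients follow the cut-points given under Statistical Analysis. A minimal sketch of that labeling is shown below. The function name `classify_correlation` is hypothetical, and two assumptions are made that the text does not state: the label is based on the magnitude of r regardless of sign, and r is a value reported to two decimals.

```python
def classify_correlation(r):
    """Label a correlation using the cut-points cited from Thomas et al. (2011).

    Assumption: the sign is ignored and only the magnitude of r is labeled.
    """
    magnitude = abs(r)
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"


# Selected coefficients from Table 2.
selected = [("Rest-TT", 0.57), ("LPA-MB", 0.43), ("LPA-AW", 0.37), ("LPA-TT", -0.19)]
for pair, r in selected:
    print(pair, r, classify_correlation(r))
```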

Table 3 Mean Absolute Percent Error for Each Smartwatch’s Energy Expenditure Measurement at Each Physical Activity Intensity Versus Indirect Calorimetry*
Indirect Calorimetry vs.      Microsoft Band   Fitbit        TomTom        Apple Watch   Overall MAPE by PA Intensity
                              M (SD)           M (SD)        M (SD)        M (SD)        M (SD)
Resting                       23.6 (15.6)      31.8 (30.2)   83.3 (66.4)   73.0 (44.8)   52.9 (22.0)
Light Physical Activity       23.3 (23.9)      64.9 (44.1)   27.8 (26.3)   18.9 (18.8)   33.7 (16.4)
Moderate Physical Activity    69.2 (44.3)      73.3 (43.7)   64.2 (54.0)   54.5 (35.3)   65.3 (38.1)
Vigorous Physical Activity    25.6 (18.5)      21.0 (14.7)   28.8 (23.3)   22.8 (21.1)   24.6 (15.3)
Overall MAPE by Smartwatch    35.4 (12.1)      47.7 (21.5)   51.0 (22.4)   42.3 (14.1)
*Mean absolute percent error ± standard deviation for total kilocalories burned during each 10-minute exercise session.
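
The confidence-interval approach to equivalence testing described under Statistical Analysis reduces to an interval containment check. The sketch below assumes the ±10% equivalence region stated there and uses the indirect calorimetry mean and the device 90% CI limits reported in Table 4 for the LPA session. The function names are hypothetical, and this is an illustration rather than the study's SPSS procedure.

```python
def equivalence_region(criterion_mean, margin=0.10):
    """Return (lower, upper) bounds at +/- margin around the criterion mean."""
    return criterion_mean * (1 - margin), criterion_mean * (1 + margin)


def is_equivalent(ci, region):
    """True only if the entire CI lies inside the equivalence region."""
    return region[0] <= ci[0] and ci[1] <= region[1]


def overlaps(ci, region):
    """True if the CI and the equivalence region share any values."""
    return ci[0] <= region[1] and region[0] <= ci[1]


# LPA session: indirect calorimetry mean and device 90% CIs from Table 4.
lpa_region = equivalence_region(35.0)  # roughly 31.5 to 38.5 kcal
lpa_cis = {
    "Microsoft Band": (34.4, 43.3),
    "Fitbit": (50.5, 61.4),
    "TomTom": (30.0, 38.1),
    "Apple Watch": (32.7, 39.4),
}
for device, ci in lpa_cis.items():
    print(device, is_equivalent(ci, lpa_region), overlaps(ci, lpa_region))
```

Under this strict ±10% band around 35.0 kcal, no device is fully equivalent during LPA, while the Microsoft Band, TomTom, and Apple Watch intervals overlap the region, in line with the Results.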

The equivalence testing results for each smartwatch’s EE estimates and indirect calorimetry’s EE measurements are presented in Table 4. Further, Figures 1–4 graphically present these results during the resting, LPA, MPA, and VPA sessions, respectively. As indicated by Table 4 and each Figure, no smartwatch’s 90% CI fell completely within the ±10% equivalency region established by indirect calorimetry at rest (20.3–22.5 kcals) or during LPA (33.1–36.9 kcals), MPA (50.5–57.3 kcals), and VPA (93.8–99.3 kcals). Similar to the MAPE results, however, the greatest overlap between smartwatches’ 90% CIs and indirect calorimetry’s equivalency region was observed during the LPA and VPA sessions (see Table 4, Figures 2 and 4). Specifically, the MB, TT, and AW possessed 90% CIs which overlapped with indirect calorimetry’s equivalency region during the LPA session while all smartwatches achieved some overlap during the VPA session. Notably, only the FS demonstrated any overlap with indirect calorimetry’s equivalency region during the resting session whereas no smartwatch demonstrated any overlap during the MPA session. Lastly, Table 5 presents CVs. This metric indicated that EE estimate variation attributable to the use of different manufacturers’ devices was highest at rest (53.7 ± 12.6%), but incrementally decreased as PA intensity increased (LPA: 31.1 ± 10.5%; MPA: 18.3 ± 8.9%; and VPA: 16.9 ± 8.0%).

Table 4 90% Confidence Intervals for Energy Expenditure Measurements Made by Each Smartwatch and Indirect Calorimetry at Each Physical Activity Intensity
Session / Device         Kilocalories M   90% CI (LL, UL)
Resting Session
  Indirect Calorimetry   21.4             (20.3, 22.5)
  Microsoft Band         16.7             (15.5, 18.0)
  Fitbit                 18.4             (15.6, 21.2)
  TomTom                 33.4             (25.4, 41.5)
  Apple Watch            36.3             (33.7, 38.9)
LPA Session
  Indirect Calorimetry   35.0             (33.1, 36.9)
  Microsoft Band         38.8             (34.4, 43.3)
  Fitbit                 55.9             (50.5, 61.4)
  TomTom                 34.0             (30.0, 38.1)
  Apple Watch            36.1             (32.7, 39.4)
MPA Session
  Indirect Calorimetry   53.9             (50.5, 57.3)
  Microsoft Band         86.7             (81.7, 91.7)
  Fitbit                 90.6             (83.8, 97.4)
  TomTom                 82.5             (72.9, 92.1)
  Apple Watch            79.9             (74.3, 85.5)
VPA Session
  Indirect Calorimetry   96.5             (93.8, 99.3)
  Microsoft Band         102.2            (92.6, 111.7)
  Fitbit                 94.4             (85.6, 103.2)
  TomTom                 95.7             (83.7, 107.7)
  Apple Watch            88.0             (78.7, 97.2)
Abbreviations: CI = Confidence interval; LL = Lower limit for 90% confidence interval; UL = Upper limit for 90% confidence interval.

Figure 1. Comparisons of Smartwatches vs. Indirect Calorimetry at Rest.
Figure 2. Comparisons of Smartwatches vs. Indirect Calorimetry during light physical activity.
Figure 3. Comparisons of Smartwatches vs. Indirect Calorimetry during moderate physical activity.
Figure 4. Comparisons of Smartwatches vs. Indirect Calorimetry during vigorous physical activity.

Table 5 CVs for Smartwatch Energy Expenditure at Different Physical Activity Intensities
PA Intensity                 CV* M (SD)
Resting                      53.7 (12.6%)
Light Physical Activity      31.1 (10.5%)
Moderate Physical Activity   18.3 (8.9%)
Vigorous Physical Activity   16.9 (8.0%)
Note. CV = coefficient of variation. *CVs are percentages.

Discussion

The present study examined the accuracy of four popular smartwatches’ EE estimates against indirect calorimetry at rest and at different PA intensities. This comparison was significant given the fact that few previous investigations have compared the accuracy of multiple smartwatches’ EE estimates to that of ‘gold standard’ indirect calorimetry measurements, with most previous studies having only validated PA duration and step estimates made by different models of the Fitbit and Jawbone in comparison to research-grade accelerometers like the ActiGraph.

Our data suggested the MB and AW possess the greatest EE estimate accuracy, particularly during LPA and VPA. Notably, despite the fact that all smartwatches demonstrated some EE estimate inaccuracies, these inaccuracies are congruent with past studies assessing various smartwatches’ capability to provide accurate EE estimates (Alharbi, Bauman, Neubeck, & Gallagher, 2016; Bai et al., 2016; Ferguson et al., 2015; Kenney et al., 2015; Sasaki et al., 2015). For example, Bai et al. (2016) suggested the MAPEs for four smartwatches’ (Fitbit Flex, Jawbone Up24, Nike Fuel Band SE, Misfit Shine) EE estimates during aerobic activity to vary between approximately 18% and 60%, congruent with the current investigation’s mean MAPE values for all smartwatches during the resting, LPA, and VPA conditions. Moreover, Lee, Kim, and Welk (2014) confirmed the Fitbit Zip and Fitbit One accurately estimate EE in free-living conditions (mean overall MAPEs = 10.1% and 10.4%, respectively), with the other smartwatches tested (Jawbone Up, Directlife, Nike Fuel Band, Basis Band) possessing a MAPE range of 12.2–23.5%. Therefore, although the present study did not observe the MAPE values for the MB and AW to be as low as those observed by Lee et al. (2014) for that study’s two Fitbit devices, the fact that the MB and AW demonstrated relative accuracy versus similar literature suggests two additional smartwatch options may be considered by individuals desiring a wearable device to estimate EE. Further, the MB and AW’s relative EE estimation accuracy during LPA is particularly promising given the fact that most individuals are capable of being active at this PA intensity, with a growing body of literature highlighting the health-promoting benefits of LPA (Powell, Paluch, & Blair, 2011; U.S. Department of Health and Human Services, 2018). Therefore, consumers and health professionals might be able to use the MB and AW to develop PA programs which focus on higher LPA incorporation among previously sedentary cohorts. Nonetheless, even the MB and AW demonstrated some EE estimate inaccuracies, which suggests that these two devices’ use within health programs should still take this error into account, a topic discussed further below.

It is also noteworthy that the accuracy of smartwatch EE estimates decreased (i.e., mean differences between smartwatch EE and indirect calorimetry EE values increased) as PA intensity increased up to the level of MPA, with the greatest smartwatch EE overestimation observed during the MPA session. This observation demonstrates majority alignment with literature examining smartwatch EE estimate accuracy at different PA intensity levels (Bai et al., 2016; Diaz et al., 2015). For example, Diaz and colleagues (2015) examined smartwatch accuracy at different PA intensities and indicated smartwatches overestimated EE by 52.4% during moderate walking and 33.3% during brisk walking. These researchers’ observation of greater smartwatch EE estimate accuracy during the highest walking intensity session, but less accuracy at lower walking intensities, is congruent with the current study’s observation of increased accuracy during VPA, but decreased smartwatch EE estimate accuracy as PA intensity increased to the level of MPA. Bai and associates (2016) also made observations similar to the current study. Indeed, these researchers observed smartwatches to generally overestimate EE during MPA. Given the observations of prior literature and the present study, speculation is warranted as to the possible explanations for why smartwatch EE [...] indirect calorimetry. Specifically, a smartwatch uses proprietary algorithms to combine the user’s demographic and anthropometric data with bodily movement data determined via an accelerometer to estimate EE (Fitbit, 2016; TomTom, 2017). Indirect calorimetry, on the other hand, measures the respiratory gas exchange rates of oxygen and carbon dioxide as the participant breathes into the mask during exercise (Branson & Johannigman, 2004; Holdy, 2004; Kenney et al., 2015c). Thus, when the participants were progressing from LPA to MPA, the body may have experienced slight increases in physiological demand but marked increases in bodily movement. As smartwatches estimate EE based largely upon bodily movement, it may be that the large changes in bodily movement observed as PA intensity increased led to systematic overestimation of EE by smartwatches versus the highly accurate indirect calorimetry which measures actual physiological demand via gas analysis. This explanation appears more plausible, too, when considering that during VPA smartwatch EE estimates from all devices were found to be most accurate compared to indirect calorimetry (see MAPE and equivalence testing results), a PA intensity requiring an even greater amount of bodily movement and physiological demand than observed during MPA. Indeed, great amounts of bodily movement would have resulted in an increased physiological demand (e.g., increased need for oxygen and nutrients to be delivered to muscles/removal of carbon dioxide and other metabolic waste products, all processes which are facilitated via increased ventilation) and subsequently higher indirect calorimetry EE measurements. As a final point, more research is also warranted regarding smartwatch EE estimate inaccuracy at rest given the continued calls for the ability to accurately track and modify sedentary behavior (Lewis, Napolitano, Buman, Williams, & Nigg, 2017; U.S. Department of Health and Human Services, 2018). Undoubtedly, the high MAPE values and large variation in EE estimates between different manufacturers’ smartwatches during the resting condition suggest improvements are necessary if health professionals are to develop sedentary behavior reduction interventions.

Smartwatches’ capability to provide EE estimates has increased interest among health professionals regarding utilizing these devices to assist with the development and implementation of personalized health behavior change programs among clients or patients. Yet, the present study’s observations suggested that while smartwatches may demonstrate relative accuracy at certain PA intensities, no smartwatch provided EE estimates within the EE equivalency regions designated by indirect calorimetry, even under standardized, highly controlled laboratory-based conditions. Aside from how these inaccuracies affect consumers’ use of smartwatch EE estimates, these inaccuracies render problematic the use of patient/client smartwatch EE estimates collected under free-living conditions (i.e., less standardized than the present study) by health professionals when developing health behavior change programs. Healthcare is experiencing a paradigm shift from reactive treatment (i.e., treating diseases/conditions following onset) to preventive/proactive treatment (i.e., treating diseases/conditions prior to onset or in the early stages of development) (Flores et al., 2013; Hood et al., 2012). Coinciding with this paradigm shift has been the previously mentioned idea of “systems medicine” and the development of a healthcare model which is
estimates were quite accurate during VPA despite accuracy becom- (a) predictive: using novel technology like smartwatches to track
ing worse as participants increased PA intensity from LPA to MPA health behaviors/indices (e.g., PA, sedentary behavior, EE, etc.)
—with the largest inaccuracies during MPA. may facilitate subsequent correlation of these health behaviors/
The most plausible explanation lies in the difference in how indices with biomarkers (e.g., blood lipid levels, blood sugar), with
EE data is calculated by a smartwatch versus being measured by disease risk able to be discerned thereafter; (b) preventive: health
(Ahead of Print)
Smartwatches in Assessing Energy Expenditure 7

behavior change programs can be developed based upon a patient’s limitation is noteworthy as the exclusive use of this PA modality
health behaviors to improve the patient’s participation in health limits the current study’s generalizability to other PA modalities
behaviors conducive to better health and the prevention/attenuation that may use different proportions of muscle mass (e.g., biking),
of disease; (c) personalized: these health behavior change programs thus influencing EE values. Moreover, other PA modalities may
can be personalized to the patient’s unique physical activity and/or have differing degrees of upper body motion, thus contributing to
dietary preferences which may improve program adherence and greater or lesser degrees of motion artifact which some researchers
effectiveness; and (d) participatory: providing health education to have speculated might affect smartwatch EE calculations (Lee &
patients via web-based platforms may further improve patients’ Gorelick, 2011). Finally, although unlikely, placing two smart-
ability to engage in proper health behaviors in the long-term watches on each wrist may have biased smartwatch EE measure-
(i.e., after cessation of the formal health behavior change program) ments. It must be remembered, however, that while the FS and
through promotion of increased health literacy. MB were always placed on the left wrist and AW and TT on the
Smartwatch EE overestimation is particularly detrimental to right, which device was distal and which device was proximal
smartwatch use within a systems medicine framework as overesti- was randomized. Moreover, smartwatch placement, no matter
mation may diminish the effectiveness of weight loss programs distal or proximal, was as close to manufacturer specifications as
developed based upon smartwatch EE values. For instance, in- possible. Therefore, future studies would benefit from larger and
dividuals may be led to believe they need to consume more kcals more diverse samples and the assessment of smartwatch EE and/or
than needed based upon the current inaccuracies observed for heart rate data accuracy during different PA modalities. These
the current study’s smartwatches—particularly during MPA. For studies may also assess EE estimate inter-device reliability when
instance, an individual briskly walking for 30 minutes (i.e., MPA) employing multiple smartwatches from the same manufacturer
may have an actual EE of 200 kcals. Yet, even the most accurate to evaluate the device-dependency of EE estimations at different
watch observed during MPA in the current study (i.e., the AW) PA intensities.
could register an EE estimate of 309 kcals during this 30-minute
walking session, based upon the AW’s MPA MAPE results of Conclusion
±54.5% and the fact that all smartwatches overestimated EE during
MPA. This, again, is not ideal within a systems medicine framework Wearable technology devices like smartwatches are becoming
and so caution is urged among health professionals using smart- widely used by consumers, in addition to health professionals, for
watches to develop health behavior change programs for patients/ health promotion. Therefore, establishing smartwatch data accuracy
clients. More broadly, these observations suggest more cross- is paramount. Indeed, greater smartwatch data accuracy will allow
collaboration should be implemented between researchers and consumers and, importantly, health professionals to leverage these
smartwatch manufacturers to improve the algorithms used in smart- devices to track health metrics such as EE and PA—subsequently
watch EE estimation. developing highly personalized health behavior change programs to
The present study has merits in that it was (1) conducted in a improve health and prevent non-communicable diseases (Flores
highly controlled laboratory setting, thus limiting many confound- et al., 2013; Hood et al., 2012). This study indicated the MB and
ing variables (e.g., different wear times/locations/PA modality AW to provide the most accurate EE estimates overall—particularly
choices) which might have affected the analyses; (2) assessed during LPA and VPA. Notably, however, the accuracy of all
EE at four different PA intensities; (3) examined smartwatch smartwatches decreased as PA intensity increased, with the most
accuracy using equivalency testing; and (4) used indirect calorim- pronounced inaccuracies during MPA. These observations suggest
etry as the criterion measure—an assessment method considered a prudent approach should be taken by consumers and health
the ‘gold standard’ when assessing EE during aerobic exercise professionals when interpreting smartwatch EE estimates—
(Kenney et al., 2015c). However, several limitations in the present particularly when one is engaging in MPA. Similarly, smartwatch
study should be noted. First, all study participants were healthy use in the development and implementation of PA and dietary
young adults (i.e., a homogenous sample). Whether smartwatches’ behavior change programs by health professionals may be cau-
EE estimates are accurate in other populations, particularly clinical tioned until health professionals can confirm the health metric data
populations, remains unanswered. Second, the sample size was accuracy these devices provide. In the future, researchers may work
relatively small. Notably, while the use of indirect calorimetry is a alongside smartwatch manufacturers to ensure increased smart-
strength of the current study, connecting each participant to the watch accuracy through the testing and manipulation of smartwatch
metabolic cart for an 80-minute study session was intricate and health metric data algorithms.
time-consuming—limiting the number of participants tested and
precluding comparisons of how sex and BMI differences may
Acknowledgments
influence smartwatch EE estimates. Yet, the researchers felt the
current study’s sample size to be adequate as the sample size was This research did not receive any specific grant from funding agencies in
congruent with the most recent smartwatch validation studies the public, commercial, or not-for-profit sectors. While conducting this
conducted (Diaz et al., 2015; Ferguson et al., 2015; Fokkema study, the first author played a large role in data analysis and writing the
et al., 2017)—most of which did not employ indirect calorimetry as manuscript. The second author played a role in data sorting and editing
the criterion measure. Third, this study only assessed participants’ the manuscript. The third author played a role in data collection and editing
EE while neglecting other relevant health metric data output. For the manuscript. The fourth author played a role in data collection and
example, heart rate data accuracy might also be examined given editing the manuscript. The fifth played a role in developing the idea,
that heart rate is often used by health professionals to facilitate overseeing data collection/analysis, and writing the manuscript. No finan-
individuals’ participation in PA intensities necessary to promote cial disclosures were reported by the authors of this paper. The authors
improved health outcomes like increased cardiovascular fitness and have no conflicts of interest to disclose in relation to the current research.
aerobic capacity (Kenney, Wilmore, & Costill, 2015a). Fourth, the The results of this study are presented clearly, honestly, and without
exercise tests were conducted solely on a treadmill. The last fabrication, falsification, or inappropriate data manipulation.
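
The weight-loss example in the Discussion (an actual EE of 200 kcals during 30 minutes of MPA, and the AW’s MPA MAPE of 54.5%) can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the conventional definition of MAPE (the mean of |device − criterion| / criterion, in percent) and assumes the device’s error equals its MAPE and is an overestimate; the function names are hypothetical and are not taken from the study’s analysis.

```python
def mean_absolute_percentage_error(device_kcal, criterion_kcal):
    """Mean of |device - criterion| / criterion, expressed in percent.

    Assumed (conventional) MAPE definition; the criterion here stands in
    for indirect calorimetry.
    """
    errors = [abs(d - c) / c * 100.0 for d, c in zip(device_kcal, criterion_kcal)]
    return sum(errors) / len(errors)


def reading_if_overestimated(actual_kcal, mape_percent):
    """Device reading when its error equals the MAPE and is an overestimate."""
    return actual_kcal * (1.0 + mape_percent / 100.0)


actual_ee = 200.0    # kcal, the example value from the text
aw_mpa_mape = 54.5   # percent, the AW's MPA MAPE quoted in the text

estimate = reading_if_overestimated(actual_ee, aw_mpa_mape)
print(f"Estimated EE: {estimate:.0f} kcal")              # Estimated EE: 309 kcal
print(f"Apparent surplus: {estimate - actual_ee:.0f} kcal")  # Apparent surplus: 109 kcal
```

Under these assumptions, the apparent surplus of roughly 109 kcals is the amount an individual might mistakenly add back through diet after a single 30-minute session.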

References

Alharbi, M., Bauman, A., Neubeck, L., & Gallagher, R. (2016). Validation of the fitbit-flex as a measure of free-living physical activity in a community-based phase III cardiac rehabilitation population. European Journal of Preventive Cardiology, 23(14), 1476–1485. PubMed ID: 26907794 doi:10.1177/2047487316634883
Bai, Y., Welk, G., Nam, Y., Lee, J., Lee, J.-M., Kim, Y., ... Dixon, P. (2016). Comparison of consumer and research monitors under semistructured settings. Medicine & Science in Sports & Exercise, 48(1), 151–158. PubMed ID: 26154336 doi:10.1249/MSS.0000000000000727
Bailey, T., Jones, H., Gregson, W., Atkinson, G., Cable, N., & Thijssen, D. (2012). Effect of ischemic preconditioning on lactate accumulation and running performance. Medicine & Science in Sports & Exercise, 44(11), 2084–2089. PubMed ID: 22843115 doi:10.1249/MSS.0b013e318262cb17
Branson, R., & Johannigman, J. (2004). The measurement of energy expenditure. Nutrition in Clinical Practice, 19, 622–636. PubMed ID: 16215161 doi:10.1177/0115426504019006622
Bunn, J., Navalta, J., Fountaine, C., & Reece, J. (2018). Current state of commercial wearable technology in physical activity monitoring 2015–2017. International Journal of Exercise Science, 11(7), 503–515. PubMed ID: 29541338
Case, M., Burwick, H., Volpp, K., & Patel, M. (2015). Accuracy of smartphone applications and wearable devices for tracking physical activity data. Journal of the American Medical Association, 313(6), 625–626. PubMed ID: 25668268 doi:10.1001/jama.2014.17841
Cockcroft, E., Williams, C., Tomlinson, O., Vlachopoulos, D., Jackman, S., Armstrong, N., & Barker, A. (2015). High intensity interval exercise is an effective alternative to moderate intensity exercise for improving glucose tolerance and insulin sensitivity in adolescent boys. Journal of Science and Medicine in Sport, 18(6), 720–724. PubMed ID: 25459232 doi:10.1016/j.jsams.2014.10.001
Diaz, K., Krupka, D., Chang, M., Peacock, J., Ma, Y., Goldsmith, J., ... Davidson, K. (2015). Fitbit: an accurate and reliable device for wireless physical activity tracking. International Journal of Cardiology, 185, 138–140. PubMed ID: 25795203 doi:10.1016/j.ijcard.2015.03.038
Dixon, P., Saint-Maurice, P., Kim, Y., Hibbing, P., & Welk, G. (2018). A primer on the use of equivalence testing for evaluating measurement agreement. Medicine & Science in Sports & Exercise, 50(4), 837–845. PubMed ID: 29135817 doi:10.1249/MSS.0000000000001481
Driller, M., McQuillan, J., & O’Donnell, S. (2016). Inter-device reliability of an automatic-scoring actigraph for measuring sleep in healthy adults. Sleep Science, 9, 198–201. PubMed ID: 28123660 doi:10.1016/j.slsci.2016.08.003
Evenson, K., Goto, M., & Furberg, R. (2015). Systematic review of the validity and reliability of consumer-wearable activity trackers. International Journal of Behavioral Nutrition, 12, 159. doi:10.1186/s12966-015-0314-1
Ferguson, T., Rowlands, A., Olds, T., & Maher, C. (2015). The validity of consumer-level, activity monitors in healthy adults worn in free-living conditions: A cross-sectional study. International Journal of Behavioral Nutrition and Physical Activity, 12, 42. PubMed ID: 25890168 doi:10.1186/s12966-015-0201-9
Fitbit. (2016). How does fitbit estimate how many calories I’ve burned. Retrieved from https://help.fitbit.com/articles/en_US/Help_article/1381
Flores, M., Glusman, G., Brogaard, K., Price, N., & Hood, L. (2013). P4 medicine: how systems medicine will transform the healthcare sector and society. Personalized Medicine, 10(6), 565–576. PubMed ID: 25342952 doi:10.2217/pme.13.57
Fokkema, T., Kooiman, T., Krijnen, W., Van Der Schans, C., & De Groot, M. (2017). Reliability and validity of ten consumer activity trackers depend on walking speed. Medicine & Science in Sports & Exercise, 49(4), 793–800. PubMed ID: 28319983 doi:10.1249/MSS.0000000000001146
Goto, C., Nishioka, K., Umemura, T., Jitsuiki, D., Sakagutchi, A., Kawamura, M., ... Higashi, Y. (2007). Acute moderate-intensity exercise induces vasodilation through an increase in nitric oxide bioavailability in humans. American Journal of Hypertension, 20, 825–830. PubMed ID: 17679027 doi:10.1016/j.amjhyper.2007.02.014
Harriss, D., Macsween, A., & Atkinson, G. (2017). Standards for ethics in sport and exercise science research: 2018 update. International Journal of Sports Medicine, 38, 1126–1131. PubMed ID: 29258155 doi:10.1055/s-0043-124001
Holdy, K. (2004). Monitoring energy metabolism with indirect calorimetry: Instruments, interpretation, and clinical application. Nutrition in Clinical Practice, 19, 447–454. PubMed ID: 16215138 doi:10.1177/0115426504019005447
Hood, L., Balling, R., & Auffray, C. (2012). Revolutionizing medicine in the 21st century through systems approaches. Biotechnology Journal, 7, 992–1001. PubMed ID: 22815171 doi:10.1002/biot.201100306
Hopkins, W. (2000). Measures of reliability in sports medicine and science. Sports Medicine, 30(1), 1–15. PubMed ID: 10907753 doi:10.2165/00007256-200030010-00001
Kellar, S., & Kelvin, E. (2012). In S. Kellar & E. Kelvin (Eds.), Munro’s statistical methods for health care research (6th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.
Kenney, E., Gortmaker, S., Evenson, K., Goto, M., & Furberg, R. (2015). Systematic review of the validity and reliability of consumer-wearable activity trackers. International Journal of Behavioral Medicine and Physical Activity, 12(1), 5–10.
Kenney, W., Wilmore, J., & Costill, D. (2015a). Adaptations to aerobic and anaerobic training. In W. Kenney, J. Wilmore, & D. Costill (Eds.), Physiology of sport and exercise (6th ed., pp. 261–291). Champaign, IL: Human Kinetics.
Kenney, W., Wilmore, J., & Costill, D. (2015b). Body composition and nutrition for sport. In W. Kenney, J. Wilmore, & D. Costill (Eds.), Physiology of sport and exercise (6th ed., pp. 371–405). Champaign, IL: Human Kinetics.
Kenney, W., Wilmore, J., & Costill, D. (2015c). Energy expenditure and fatigue. In W. Kenney, J. Wilmore, & D. Costill (Eds.), Physiology of sport and exercise (6th ed., pp. 119–150). Champaign, IL: Human Kinetics.
Kim, Y., & Welk, G. (2015). Criterion validity of competing accelerometry-based activity monitoring devices. Medicine & Science in Sports & Exercise, 47(11), 2456–2463. PubMed ID: 25910051 doi:10.1249/MSS.0000000000000691
Lee, C., & Gorelick, M. (2011). Validity of the smarthealth watch to measure heart rate and energy expenditure during rest and exercise. Measurement in Physical Education and Exercise Science, 15(1), 18–25. doi:10.1080/1091367X.2011.539089
Lee, C., Gorelick, M., & Mendoza, A. (2011). Accuracy of an infrared LED device to measure heart rate and energy expenditure during rest and exercise. Journal of Sports Science, 29(15), 1645–1653. doi:10.1080/02640414.2011.609899
Lee, J., Kim, Y., & Welk, G. (2014). Validity of consumer-based physical activity monitors. Medicine & Science in Sports & Exercise, 46(9), 1840–1848. PubMed ID: 24777201 doi:10.1249/MSS.0000000000000287
Lewis, B., Napolitano, M., Buman, M., Williams, D., & Nigg, C. (2017). Future directions in physical activity intervention research: Expanding our focus to sedentary behaviors, technology, and dissemination. Journal of Behavioral Medicine, 40(1), 112–126. PubMed ID: 27722907 doi:10.1007/s10865-016-9797-8
Peters, B., Heelan, K., & Abbey, B. (2013). Validation of omron pedometers using MTI accelerometers for use with children. International Journal of Exercise Science, 6(2), 106–113.
Pope, Z., & Gao, Z. (2017). Mobile device apps in enhancing physical activity. In Z. Gao (Ed.), Technology in physical activity and promotion (pp. 106–128). London, UK: Routledge Publisher.
Powell, K., Paluch, A., & Blair, S. (2011). Physical activity for health: What kind? how much? how intense? on top of what? In J. Fielding, R. Brownson, & L. Green (Eds.), Annual review of public health (Vol. 32, pp. 349–365). Palo Alto, CA: Annual Reviews.
Ren, Q., Li, Z., & Liang, G. (2017). Comparison of active and passive movement on treadmill in healthy individuals. Space Medicine & Medical Engineering, 30(3), 185–190.
Sasaki, J., Hickey, A., Mavilia, M., Tedesco, J., John, D., Keadle, S., & Freedson, P. (2015). Validation of the fitbit wireless activity tracker for prediction of energy expenditure. Journal of Physical Activity and Health, 12(2), 149–154. PubMed ID: 24770438 doi:10.1123/jpah.2012-0495
Thomas, J., Nelson, J., & Silverman, S. (2011). Relationships among variables. In J. Thomas, J. Nelson, & S. Silverman (Eds.), Research methods in physical activity (pp. 125–144). Champaign, IL: Human Kinetics.
TomTom. (2017). How calories are estimated on your watch. Retrieved from http://uk.support.tomtom.com/app/answers/detail/a_id/19148/~/how-calories-are-estimated-on-your-watch
U.S. Department of Health and Human Services. (2018). Physical activity guidelines for Americans (2nd ed.). Washington, DC: Author.
World Medical Association. (2018). World medical association declaration of Helsinki: Ethical principles for medical research involving human subjects. Retrieved from https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/
