
14690705, 2020, 4, Downloaded from https://obgyn.onlinelibrary.wiley.com/doi/10.1002/uog.21878 by Nat Prov Indonesia, Wiley Online Library on [27/01/2023].

See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Ultrasound Obstet Gynecol 2020; 56: 588–596
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/uog.21878

Development and validation of a machine-learning model for prediction of shoulder dystocia
A. TSUR1,2, L. BATSRY2, S. TOUSSIA-COHEN2, M. G. ROSENSTEIN3, O. BARAK4, Y. BREZINOV4, R. YOELI-ULLMAN2, E. SIVAN2, M. SIROTA6, M. L. DRUZIN1, D. K. STEVENSON5, Y. J. BLUMENFELD1 and D. ARAN6

1 Department of Obstetrics and Gynecology, Division of Maternal Fetal Medicine, Stanford University School of Medicine, Stanford, CA, USA; 2 Department of Obstetrics and Gynecology, The Sheba Medical Center, Tel Hashomer, Israel; 3 Department of Obstetrics and Gynecology, Division of Maternal Fetal Medicine, University of California, San Francisco, CA, USA; 4 Department of Obstetrics and Gynecology, The Kaplan Medical Center, Rehovot, Israel; 5 Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA; 6 Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA

KEYWORDS: anthropometry; artificial intelligence; biometry; EHR; macrosomia

CONTRIBUTION

What does this work add to what is already known?
We developed and externally validated a machine-learning model integrating maternal risk modifiers with fetal biometry for prediction of shoulder dystocia (ShD). The model was significantly more accurate than was prediction based on estimated fetal weight either alone or combined with maternal diabetes and was able to stratify the risk of ShD and neonatal injury in the context of suspected macrosomia.

What are the clinical implications of this work?
We demonstrated the potential clinical efficacy of applying this model to stratify the risk of ShD and related newborn injury among women carrying a fetus with estimated fetal weight ≥ 4000 g.

ABSTRACT

Objectives To develop a machine-learning (ML) model for prediction of shoulder dystocia (ShD) and to externally validate the model’s predictive accuracy and potential clinical efficacy in optimizing the use of Cesarean delivery in the context of suspected macrosomia.

Methods We used electronic health records (EHR) from the Sheba Medical Center in Israel to develop the model (derivation cohort) and EHR from the University of California San Francisco Medical Center to validate the model’s accuracy and clinical efficacy (validation cohort). Subsequent to application of inclusion and exclusion criteria, the derivation cohort included 686 singleton vaginal deliveries, of which 131 were complicated by ShD, and the validation cohort included 2584 deliveries, of which 31 were complicated by ShD. For each of these deliveries, we collected maternal and neonatal delivery outcomes coupled with maternal demographics, obstetric clinical data and sonographic fetal biometry. Biometric measurements and their derived estimated fetal weight were adjusted (aEFW) according to gestational age at delivery. A ML pipeline was utilized to develop the model.

Results In the derivation cohort, the ML model provided significantly better prediction than did the current clinical paradigm based on fetal weight and maternal diabetes: using nested cross-validation, the area under the receiver-operating-characteristics curve (AUC) of the model was 0.793 ± 0.041, outperforming aEFW combined with diabetes (AUC = 0.745 ± 0.044, P = 1e−16). The following risk modifiers had a positive beta that was > 0.02, i.e. they increased the risk of ShD: aEFW (beta = 0.164), pregestational diabetes (beta = 0.047), prior ShD (beta = 0.04), female fetal sex (beta = 0.04) and adjusted abdominal circumference (beta = 0.03). The following risk modifiers had a negative beta that was < −0.02, i.e. they were protective of ShD: adjusted biparietal diameter (beta = −0.08) and maternal height (beta = −0.03). In the validation cohort, the model outperformed aEFW combined with diabetes (AUC = 0.866 vs 0.784, P = 0.00007). Additionally, in the validation cohort, among the subgroup of 273 women carrying a fetus with aEFW ≥ 4000 g, the aEFW had no predictive power (AUC = 0.548), and the model performed significantly better (0.775, P = 0.0002). A risk-score threshold of 0.5 stratified 42.9% of deliveries to the high-risk group,

Correspondence to: Dr A. Tsur, 531 Maybell Avenue, Palo Alto, CA, 94306, USA (e-mail: avitsur@stanford.edu); and Dr D. Aran, Faculty
of Biology, Technion-Israel Institute of Technology, Haifa, Israel (e-mail: dviraran@technion.ac.il)
Accepted: 16 September 2019

Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd. ORIGINAL PAPER

which included 90.9% of ShD cases and all cases accompanied by maternal or newborn complications. A more specific threshold of 0.7 stratified only 27.5% of the deliveries to the high-risk group, which included 63.6% of ShD cases and all those accompanied by newborn complications.

Conclusion We developed a ML model for prediction of ShD and, in a different cohort, externally validated its performance. The model predicted ShD better than did estimated fetal weight either alone or combined with maternal diabetes, and was able to stratify the risk of ShD and neonatal injury in the context of suspected macrosomia. Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd.

INTRODUCTION

Shoulder dystocia (ShD) is an obstetric emergency, defined as difficulty in delivering the newborn’s shoulders after successful delivery of the head, thus requiring additional maneuvers1,2. Once ShD occurs, even if all appropriate action is taken, there is an increased risk for permanent neonatal and maternal injury3–5.

Given the association between fetal macrosomia (excess fetal size) and ShD6, the American College of Obstetricians and Gynecologists (ACOG) recommends considering Cesarean delivery (CD) to prevent ShD for women with diabetes carrying a fetus with an estimated fetal weight (EFW) of at least 4500 g1 and for non-diabetic women carrying a fetus with an EFW of at least 5000 g1. Since more than 50% of ShD cases occur at an EFW below these cut-offs7, many healthcare providers discuss with their patient the options of CD at lower cut-offs. In a large study investigating physician-documented indications for CD at Yale New Haven Hospital, CD was generally offered at weight thresholds 500 g below the ACOG cut-offs for both diabetic and non-diabetic women. In some circumstances, even lower thresholds were used8. Unfortunately, any arbitrary lowering of the ACOG cut-off leads to a growing number of potentially avoidable CDs, which in turn can lead to obstetric morbidity in subsequent pregnancies9.

In order to optimize the prediction of ShD, several studies have utilized regression methods10–14 in an attempt to integrate EFW and maternal diabetes with additional established maternal, fetal and intrapartum risk modifiers. These include prior ShD15, the need for operative vaginal delivery14, maternal height and weight13 and fetal anthropometric variations10. Unfortunately, these studies have not resulted in a validated and established clinical model, partly because of the limitations of previously employed statistical methods. Machine-learning (ML) algorithms complement and extend classic statistical methods and can detect patterns from large, heterogeneous and complex data16,17.

In the current study, therefore, we developed a ML model for prediction of ShD, and, in a different population, validated its predictive accuracy and clinical efficacy.

METHODS

Derivation cohort

To develop the model (derivation cohort), we used electronic health records (EHR) from the Sheba Medical Center, a tertiary-care medical center affiliated with Tel-Aviv University in Israel. EHR data of all deliveries at Sheba are collected and stored by the obstetrics team as part of routine clinical care. The study was approved by the Sheba Medical Center Committee for Human Subjects Research (Institutional Review Board (IRB) number 5221-18-SMC). Because ShD is a rare event, we enriched the cohort with ShD cases, to beyond the naturally occurring incidence. For this purpose, we retrieved EHR from 53 754 singleton vaginal deliveries between January 2011 and August 2018. During this period, 238 cases of ShD were recorded (0.44% of singleton vaginal deliveries). These cases were compared with 1008 consecutive deliveries not complicated by ShD from 2 consecutive months in 2016. We reviewed manually the charts of all cases and controls to identify and verify the following 18 parameters that may modify the risk of ShD: maternal age, gestational age (GA) at delivery, maternal height, maternal weight, smoking status, gravidity, parity, intrauterine fetal death, ShD in a previous pregnancy, pregestational diabetes, gestational diabetes, insulin treatment, fetal sex, four sonographic biometric measurements (fetal biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC) and femur length (FL)) and the EFW derived from these measurements.

Additionally, for all ShD cases, we identified and verified potentially related neonatal and maternal complications. The neonatal complications identified were: brachial plexus injuries (BPI), fractures of the clavicle or humerus and hypoxic ischemic encephalopathy (HIE)3. The maternal complications were: severe perineal laceration (defined as third- or fourth-degree tear)4 and postpartum hemorrhage (PPH)5.

Inclusion criteria were: (1) sonographic EFW assessment within 5 weeks prior to delivery; (2) complete data on all 18 parameters that may modify the risk of ShD; and (3) GA at delivery > 34 + 0 weeks. We excluded any case with one or more outlier parameters, defined as 5 SD over the average value for any of the variables. This filtering process, summarized in Table 1, reduced our derivation cohort to 686 vaginal deliveries, of which 131 (19.1%) had ShD.

Validation cohort

To validate the model’s accuracy and clinical efficacy (validation cohort), we used EHR from the University of California, San Francisco (UCSF), a tertiary-care, academic medical center. Clinical obstetric data from all deliveries at UCSF are entered into a clinical research database as part of routine clinical care by the obstetric and maternal-fetal medicine team. Data are validated by research coordinators shortly thereafter. The UCSF Committee for Human Subjects Research approved this study


Table 1 Overview of data in derivation and validation cohorts of singleton vaginal deliveries used in development and validation of
machine-learning model for prediction of fetal shoulder dystocia (ShD)

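| | Derivation cohort (n = 53 754): ShD | Derivation cohort (n = 53 754): Controls | Validation cohort (n = 23 794): ShD | Validation cohort (n = 23 794): Without ShD |
|---|---|---|---|---|
| Potentially eligible cases | 238 | 1008 | 302 | 23 492 |
| Did not meet inclusion criteria: biometry > 5 weeks prior to delivery | 25 | 167 | 268 | 20 532 |
| Did not meet inclusion criteria: not all parameters available | 82 | 273 | 2 | 339 |
| Did not meet inclusion criteria: GA at delivery ≤ 34 + 0 weeks | 0 | 9 | 0 | 46 |
| Excluded | 0 | 4 | 1 | 22 |
| Final cohort | 131 | 555 | 31 | 2553 |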

Data are given as n. Derivation cohort included deliveries in period 2011–2018; validation cohort included deliveries in period 2003–2018.
GA, gestational age.
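
The cohort sizes in Table 1 follow from simple subtraction, which can be checked directly. The sketch below is illustrative only: the dictionary layout and the function names `final_count` and `shd_percent` are assumptions introduced here and are not part of the study's pipeline; the counts themselves are transcribed from Table 1.

```python
# Counts transcribed from Table 1. The data layout and names are illustrative
# assumptions, not the authors' code.
TABLE_1 = {
    "derivation_shd":      {"eligible": 238,   "removed": [25, 82, 0, 0]},
    "derivation_controls": {"eligible": 1008,  "removed": [167, 273, 9, 4]},
    "validation_shd":      {"eligible": 302,   "removed": [268, 2, 0, 1]},
    "validation_no_shd":   {"eligible": 23492, "removed": [20532, 339, 46, 22]},
}


def final_count(group):
    """Eligible deliveries minus those removed at each filtering step."""
    return group["eligible"] - sum(group["removed"])


def shd_percent(cases, non_cases):
    """Share of ShD deliveries in a final cohort, as a percentage."""
    return round(100 * cases / (cases + non_cases), 1)


final = {name: final_count(group) for name, group in TABLE_1.items()}

derivation_pct = shd_percent(final["derivation_shd"], final["derivation_controls"])
validation_pct = shd_percent(final["validation_shd"], final["validation_no_shd"])

print(final)
print(derivation_pct, validation_pct)
```

The totals reproduce the final cohorts of 686 and 2584 deliveries and the ShD proportions of 19.1% and 1.2% quoted in the text.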

(IRB number #17-22929). We analyzed all UCSF deliveries between March 2003 and September 2018, identifying 23 492 singleton vaginal deliveries that were not and 302 cases that were complicated by ShD. We applied the same inclusion and exclusion criteria as had been applied to the derivation cohort, which resulted in 2584 vaginal deliveries, of which 31 (1.2%) had ShD (Table 1). Details on ShD in a previous pregnancy were not available in the validation cohort and thus could not be used. We reviewed manually all charts of ShD cases to verify the diagnosis and collected data on the same neonatal and maternal complications potentially related to ShD as had been collected for the derivation cohort (BPI, clavicular or humeral fracture, HIE, PPH and severe perineal laceration).

Adjusting sonographic biometry according to gestational age

We confirmed that, in both cohorts, EFW had been calculated from biometric parameters using the same Hadlock equation18:

$$\log_{10}\mathrm{EFW} = 1.3596 - 0.00386\,\mathrm{AC}\times\mathrm{FL} + 0.0064\,\mathrm{HC} + 0.00061\,\mathrm{BPD}\times\mathrm{AC} + 0.0424\,\mathrm{AC} + 0.174\,\mathrm{FL}.$$

EFW may change between the time of the growth ultrasound examination and delivery (mean ± SD time interval, 5.2 ± 6.7 days in derivation cohort and 18.1 ± 9.3 in validation cohort). Thus, we adjusted the biometric parameters according to GA at delivery. To do so, first, we calculated the percentile of each sonographic measurement at the GA at which it had been performed, based on the Hadlock equations19,20, as follows. We calculated the median expected EFW for GA on the day on which the measurement was performed (USGA) based on the Hadlock equation:

$$\mathrm{EFW}(\mathrm{USGA}) = e^{0.578 + 0.332\cdot\mathrm{USGA} - 0.00354\cdot\mathrm{USGA}^{2}}.$$

For each observation, i, we calculated the percentile of the EFW at the time of delivery based on the normal distribution and the difference between the measured EFW and the median EFW on the day of measurement (using the function pnorm in the R statistics package (https://www.R-project.org/)):

$$P_i = \Phi\!\left(\frac{\mathrm{EFW}_i - \mathrm{EFW}(\mathrm{USGA})}{0.12\cdot\mathrm{EFW}(\mathrm{USGA})}\right).$$

Next, we converted the calculated percentiles back to values, according to GA at time of delivery, as follows. We calculated the median expected EFW for the GA on the day of delivery (DGA):

$$\mathrm{EFW}(\mathrm{DGA}) = e^{0.578 + 0.332\cdot\mathrm{DGA} - 0.00354\cdot\mathrm{DGA}^{2}}.$$

We then calculated aEFW based on the values calculated in the previous steps:

$$\mathrm{aEFW}_i = \mathrm{EFW}(\mathrm{DGA}) + 0.12\cdot\mathrm{EFW}(\mathrm{DGA})\cdot\mathrm{qnorm}(P_i).$$

These steps were for adjusting EFW to aEFW. The same steps were performed, using the relevant equations, to adjust BPD, HC, AC and FL to aBPD, aHC, aAC and aFL, respectively.

As described in detail in Figure S1, the correlation coefficient of aEFW with birth weight (BW) was 0.84, significantly higher than the correlations of BW with unadjusted EFW (Fisher’s r-to-Z transformation test P = 0.0248). In addition, the correlation of aEFW with BW was superior to the correlation between the weight percentile at the time of the most recent growth ultrasound examination and BW percentile (P = 0.006). Moreover, the correlation of aEFW was superior even to the correlation between clinical EFW obtained within 1 week of delivery and BW (P = 0.0349). Therefore, for developing and validating the predictive model, we used the adjusted sonographic measurements.

Machine-learning pipeline

To develop the ML model we used the generalized linear model algorithm with lasso penalty21 (using the R statistics package, glmnet). Lasso regression is a powerful technique for creating parsimonious models and reduces issues related to overfitting. To test the ability of ML to predict ShD, we first split the derivation cohort into a


80% training set and a 20% test set. Using the training set, a 10-fold cross-validation analysis was performed, with ShD as the response vector. Misclassification error was used as loss function for the cross-validation. To cope with the imbalance in classes, the observations were weighted with values of the fraction of the opposite class: ShD observations were weighted with 0.8090, while control observations were weighted with 0.1906. This algorithm generates models for a range of λ, which is a tuning parameter that controls the overall strength of the penalty. We chose the model that is based on the minimal value of λ that gives minimum mean cross-validated error. Finally, the area under the receiver-operating-characteristics curve (AUC) was calculated for the test set. Since the initial split for training and test sets was random, we performed nested analysis: the procedure described was repeated 500 times, each time with random split for training and test sets. Subsequently, we used the entire derivation cohort as a training set to fit a robust model. This model was then applied to the validation cohort. Analyses based on aEFW and maternal diabetes (as defined in the ACOG practice bulletin1: either pre- or gestational diabetes) alone were performed using logistic regression. In all analyses, the features were centered and scaled to mean zero with a SD of 1 (z-score) using parameters learned from the training cohort.

Statistical analysis

Fisher’s exact test was used to calculate P-values for association of binary variables with ShD (Table 2 and Figure 1). Logistic regression was used for continuous variables, in which the odds ratios are the exponent of the predictor coefficient, and the P-value is based on the normal distribution of the dispersion of the coefficient. The AUC was used as the primary metric of the model’s performance. A paired-samples t-test was used to test statistical significance of the repeated AUCs in the derivation cohort. The standard non-parametric test for comparing two correlated ROC curves (DeLong’s test)22 was used to test the significance of this comparison in the validation cohort. Pearson’s coefficients were used for correlation analyses. The significance of differences in correlation coefficients was tested using Fisher’s r-to-Z transformation.

RESULTS

Demographics

Between January 2011 and August 2018, there were 53 754 singleton vaginal deliveries at the Sheba Medical Center in Israel. Of these, 238 (0.4%) deliveries were complicated by ShD. Among these cases with ShD, 44 (18.5%) were complicated by one or more of the following: PPH (n = 11; 4.6%), severe perineal laceration (n = 8; 3.4%), BPI (n = 10; 4.2%), clavicular or humeral fracture (n = 19; 8.0%) and HIE (n = 1; 0.4%). Applying the inclusion and exclusion criteria (Table 1) reduced our derivation cohort to 686 vaginal deliveries, of which 131 (19.1%) were complicated by ShD.

Between March 2003 and September 2018, there were 23 794 singleton vaginal deliveries at the UCSF Medical Center. Of these, 302 (1.27%) were complicated by ShD. Applying the inclusion and exclusion criteria (Table 1) reduced the validation cohort to 2584 singleton vaginal deliveries, of which 31 (1.2%) were complicated by ShD (Table 1). Among these cases with ShD, 12 (38.7%) were complicated by one or more of the following: PPH (n = 4; 12.9%), severe perineal laceration (n = 4; 12.9%), BPI (n = 8; 25.8%; two of these cases resolved before neonatal discharge) and clavicular or humeral fracture (n = 1; 3.2%).

Analysis in the derivation and validation cohorts of the 18 maternal and prenatal sonographic variables that may modify the risk of ShD showed that both datasets contained similar associations with ShD (Table 2 and Figure 1). BW, aEFW, aAC, pregestational diabetes and insulin treatment were significantly associated with ShD in both cohorts.

Shoulder dystocia prediction model

Figure 2b depicts the ML model. The following risk modifiers had a positive beta > 0.02, i.e. they increased the risk of ShD: aEFW (beta = 0.164), pregestational diabetes (beta = 0.047), prior ShD (beta = 0.04), female fetal sex (beta = 0.04) and adjusted AC (beta = 0.03). The following risk modifiers had a negative beta < −0.02, i.e. they were protective of ShD: adjusted BPD (beta = −0.08) and maternal height (beta = −0.03). In contrast to pregestational diabetes, gestational diabetes had a beta of 0 and therefore its presence or absence did not modify the risk of ShD.

Prediction of shoulder dystocia: derivation cohort

The AUC of aEFW to predict ShD was only 0.745 ± 0.044. Considering diabetes (as defined in the ACOG practice bulletin1: either pre- or gestational diabetes) in addition to aEFW did not improve its prediction accuracy (AUC = 0.745 ± 0.044). We trained a ML model using the 18 potential ShD risk modifiers. On 500 random splits of training and test sets, the AUC of the ML model was 0.793 ± 0.041, outperforming aEFW alone as well as aEFW and diabetes in 95.2% of the repeats (P < 1e−16, Figure 2a). We tested an ensemble of ML algorithms but did not observe any improvement over our regularized generalized linear model (Figure S2). We next utilized the entire derivation cohort to generate a robust predictive model (Figure 2b). The variable with the highest contribution to the model was aEFW, but other sonographic measurements were also important, most notably aBPD, which had a negative association with ShD risk in the model. Independently, increasing aBPD was significantly associated with increased risk of ShD (P = 0.028). However, after adjusting the aBPD to aEFW, we observed that increasing aBPD was associated with a significantly reduced risk for ShD (P = 3e−7) (Figure 2c).
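
The evaluation scheme reported here (class weights, repeated random 80/20 splits, test-set AUC summarized as mean ± SD) can be illustrated with a minimal standard-library sketch. It is not the authors' R/glmnet pipeline: the lasso fit on each training set is omitted and a fixed synthetic score stands in for the fitted model, and the function names `auc`, `opposite_class_weights` and `repeated_split_auc`, the seeds and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def auc(scores, labels):
    """Rank-based AUC: probability that a random case outscores a random
    control, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def opposite_class_weights(labels):
    """Weight each class by the fraction of observations in the other class."""
    frac_pos = sum(labels) / len(labels)
    return {1: 1 - frac_pos, 0: frac_pos}


def repeated_split_auc(scores, labels, repeats=500, test_frac=0.2, seed=0):
    """Mean and SD of test-set AUC over repeated random splits.
    Model fitting on the training part is deliberately left out here."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    n_test = round(test_frac * len(idx))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        test = idx[:n_test]
        test_labels = [labels[i] for i in test]
        if 0 < sum(test_labels) < len(test_labels):  # both classes needed
            aucs.append(auc([scores[i] for i in test], test_labels))
    return mean(aucs), stdev(aucs)


# Synthetic stand-in data with the derivation cohort's class sizes
# (131 ShD, 555 controls); the scores are random, not study data.
toy_rng = random.Random(1)
labels = [1] * 131 + [0] * 555
scores = [toy_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

weights = opposite_class_weights(labels)
mean_auc, sd_auc = repeated_split_auc(scores, labels)
print(weights, round(mean_auc, 3), round(sd_auc, 3))
```

Applied to the full derivation cohort, the weighting rule gives about 0.809 for ShD and 0.191 for controls, close to the weights reported in the Methods. In a full reimplementation, the placeholder score would be replaced by predictions from a model fitted on each training split.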


Table 2 Demographics and odds ratios (OR) of risk modifiers in derivation and validation cohorts of singleton vaginal deliveries used in
development and validation of machine-learning model for prediction of shoulder dystocia (ShD)

| Variable | Derivation: Controls (n = 555) | Derivation: Cases (n = 131) | Derivation: Log2 OR (95% CI) | Derivation: P | Validation: Controls (n = 2553) | Validation: Cases (n = 31) | Validation: Log2 OR (95% CI) | Validation: P |
|---|---|---|---|---|---|---|---|---|
| Maternal factors | | | | | | | | |
| Age (years) | 32.5 ± 4.9 | 31.9 ± 5.1 | –0.19 (–0.46 to 0.09) | 0.186 | 33.0 ± 5.3 | 32.4 ± 6.2 | –0.18 (–0.67 to 0.33) | 0.486 |
| Weight (kg)* | 74.9 ± 12.6 | 79.1 ± 15.0 | 0.43 (0.17 to 0.69) | 0.001 | 79.0 ± 17.0 | 84.1 ± 17.5 | 0.35 (–0.10 to 0.75) | 0.101 |
| Height (cm) | 164.0 ± 6.1 | 163.2 ± 5.6 | –0.18 (–0.45 to 0.10) | 0.215 | 163.3 ± 6.9 | 160.9 ± 6.6 | –0.48 (–0.94 to 0.01) | 0.048 |
| Gravidity | 2.7 ± 1.8 | 2.6 ± 1.7 | –0.03 (–0.32 to 0.24) | 0.821 | 2.4 ± 1.6 | 2.6 ± 2.5 | 0.18 (–0.32 to 0.6) | 0.433 |
| Nulliparous | 357 (64.3) | 87 (66.4) | 0.13 (–0.32 to 0.24) | 0.685 | 1178 (46.1) | 12 (38.7) | –0.44 (–1.49 to 0.61) | 0.471 |
| Previous ShD | 0 (0) | 4 (3.1) | Inf | < 1e−16 | N/A | N/A | N/A | N/A |
| Smoker | 29 (5.2) | 3 (2.3) | –1.23 (–2.97 to 0.50) | 0.174 | 74 (2.9) | 3 (9.7) | 1.86 (0.11 to 3.61) | 0.061 |
| Current pregnancy | | | | | | | | |
| GA (days) | 277.9 ± 9.6 | 280.3 ± 8.1 | 0.40 (0.11 to 0.70) | 0.009 | 272.3 ± 10.4 | 274.0 ± 7.7 | 0.25 (–0.27 to 0.82) | 0.364 |
| IUFD | 2 (0.4) | 2 (1.5) | 2.10 (–0.74 to 4.94) | 0.167 | 3 (0.1) | 0 (0) | 0 | 1 |
| Diabetes | | | | | | | | |
| Pregestational diabetes | 0 (0) | 4 (3.1) | Inf | < 1e−16 | 74 (2.9) | 5 (16.1) | 2.67 (1.25 to 4.09) | 0.002 |
| Gestational diabetes | 58 (10.5) | 15 (11.5) | 0.15 (–0.72 to 1.02) | 0.753 | 812 (31.8) | 14 (45.2) | 0.82 (–0.20 to 1.85) | 0.122 |
| Insulin treatment | 8 (1.4) | 7 (5.3) | 1.95 (0.46 to 3.44) | 0.013 | 406 (15.9) | 10 (32.3) | 1.33 (0.23 to 2.43) | 0.024 |
| US examination | | | | | | | | |
| Female sex | 273 (49.2) | 80 (61.1) | 0.70 (0.14 to 1.26) | 0.015 | 1254 (49.1) | 19 (61.3) | 0.72 (–0.33 to 1.76) | 0.207 |
| HC (adjusted, mm) | 331.9 ± 11.3 | 335.3 ± 9.9 | 0.44 (0.16 to 0.72) | 0.002 | 332.9 ± 13.6 | 336.2 ± 13.0 | 0.37 (–0.16 to 0.91) | 0.178 |
| BPD (adjusted, mm) | 93.2 ± 3.5 | 93.9 ± 3.2 | 0.32 (0.04 to 0.60) | 0.028 | 92.7 ± 4.4 | 93.8 ± 3.5 | 0.39 (–0.13 to 0.93) | 0.150 |
| AC (adjusted, mm) | 335.1 ± 19.4 | 353.2 ± 15.6 | 1.75 (1.37 to 2.15) | < 1e−16 | 320.9 ± 21.3 | 344.7 ± 17.6 | 1.56 (1.06 to 2.07) | 1e−9 |
| FL (adjusted, mm) | 74.7 ± 3.2 | 76.4 ± 2.7 | 0.86 (0.55 to 1.17) | 6e−8 | 73.6 ± 4.1 | 74.8 ± 3.7 | 0.42 (–0.11 to 0.97) | 0.13 |
| EFW (adjusted, g) | 3372.2 ± 401.7 | 3725.2 ± 340.8 | 1.50 (1.16 to 1.87) | 1e−16 | 3349.2 ± 517.7 | 3838.8 ± 411.4 | 1.56 (1.06 to 2.07) | 2e−7 |
| HC (percentile) | 48.5 ± 1.2 | 48.8 ± 1.1 | 0.29 (0.01 to 0.56) | 0.041 | 49.0 ± 1.4 | 49.2 ± 1.3 | 0.25 (–0.26 to 0.77) | 0.233 |
| BPD (percentile) | 49.3 ± 4.5 | 49.7 ± 4.4 | 0.14 (–0.14 to 0.41) | 0.324 | 50.1 ± 5.5 | 51.2 ± 4.4 | 0.28 (–0.23 to 0.81) | 0.287 |
| AC (percentile) | 49.4 ± 1.3 | 50.6 ± 1.3 | 1.44 (1.12 to 1.79) | < 1e−16 | 50.1 ± 1.7 | 52.3 ± 1.3 | 1.68 (1.21 to 2.17) | 4e−12 |
| FL (percentile) | 46.4 ± 4.7 | 48.4 ± 4.4 | 0.63 (0.35 to 0.92) | 1e−5 | 46.8 ± 5.5 | 48.0 ± 4.9 | 0.32 (–0.20 to 0.83) | 0.233 |
| EFW (percentile) | 36.5 ± 23.5 | 58.1 ± 24.0 | 1.26 (0.97 to 1.56) | < 1e−16 | 45.7 ± 28.6 | 75.4 ± 20.8 | 1.68 (1.07 to 2.38) | 3e−7 |
| Birth weight (g) | 3264.1 ± 427.5 | 3878.9 ± 387.9 | 2.66 (2.20 to 3.16) | < 1e−16 | 3157.8 ± 515.7 | 3880.3 ± 423.2 | 1.91 (1.41 to 2.44) | 2e−13 |

Data are given as mean ± SD or n (%), unless stated otherwise. Fisher’s exact test used to calculate P-values for binary variables; logistic
regression used for continuous variables: ORs are exponents of predictor coefficient and P-value is based on normal distribution of
dispersion of coefficient. *At admission to labor. AC, abdominal circumference; BPD, biparietal diameter; EFW, estimated fetal weight; FL,
femur length; GA, gestational age at delivery; HC, head circumference; Inf, infinity; IUFD, intrauterine fetal death; N/A, not available; US,
ultrasound.
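
For the binary variables, the log2 odds ratios in Table 2 can be related back to the tabulated counts. The sketch below computes the sample (cross-product) odds ratio with a Wald 95% interval on the log scale and converts it to log2. This choice of estimator is an assumption: the table note states only that Fisher's exact test was used for the P-values, and the function name `log2_odds_ratio` is introduced here for illustration.

```python
from math import log, sqrt
from statistics import NormalDist


def log2_odds_ratio(cases_yes, cases_total, controls_yes, controls_total,
                    level=0.95):
    """Sample odds ratio and Wald interval, returned on the log2 scale as
    (estimate, lower, upper), each rounded to two decimals.
    Assumes no zero cells (e.g. it cannot handle the 'Inf' rows)."""
    a, b = cases_yes, cases_total - cases_yes
    c, d = controls_yes, controls_total - controls_yes
    ln_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    to_log2 = 1 / log(2)
    bounds = (ln_or, ln_or - z * se, ln_or + z * se)
    return tuple(round(v * to_log2, 2) for v in bounds)


# Derivation-cohort counts taken from Table 2.
female_sex = log2_odds_ratio(80, 131, 273, 555)
insulin = log2_odds_ratio(7, 131, 8, 555)
print(female_sex, insulin)
```

For these two rows the sketch returns 0.70 (0.14 to 1.26) and 1.95 (0.46 to 3.44), in agreement with the tabulated values; rows with a zero cell, reported as Inf, fall outside what this simple estimator can handle.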


[Figure 1 image: two panels (derivation cohort, left; validation cohort, right); x-axis, odds ratio (log2). Features plotted, top to bottom: maternal age, maternal height, maternal weight, gravidity, nulliparous, smoker, gestational age, IUFD, pregestational diabetes, gestational diabetes, insulin treatment, sex (female), head circumference*, femur length*, biparietal diameter*, abdominal circumference*, sonographic EFW*, previous ShD.]

Figure 1 Odds ratios (log2 ) of association of features with shoulder dystocia (ShD) in derivation (left) and validation (right) cohorts.
Previous ShD data were not available for validation cohort. Maximum values were set as 4. *Gestational age-adjusted measurements. EFW,
estimated fetal weight; IUFD, intrauterine fetal death.
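
The starred features are the gestational-age-adjusted measurements defined in the Methods. As a rough sketch of that adjustment for EFW, the Python code below uses `statistics.NormalDist` in place of R's pnorm and qnorm. The function names are illustrative, and gestational age is assumed to be expressed in weeks, which the text does not state explicitly.

```python
from math import exp
from statistics import NormalDist

STD_NORMAL = NormalDist()  # .cdf plays the role of pnorm, .inv_cdf of qnorm
REL_SD = 0.12              # SD as a fraction of the median, as in the Methods


def median_efw(ga_weeks):
    """Median expected EFW (g) at a gestational age (assumed to be in weeks)."""
    return exp(0.578 + 0.332 * ga_weeks - 0.00354 * ga_weeks ** 2)


def adjusted_efw(efw, us_ga_weeks, delivery_ga_weeks):
    """Carry an EFW measured at the scan forward to the day of delivery,
    keeping its percentile fixed."""
    m_us = median_efw(us_ga_weeks)
    p = STD_NORMAL.cdf((efw - m_us) / (REL_SD * m_us))  # percentile at scan
    m_del = median_efw(delivery_ga_weeks)
    # inv_cdf raises StatisticsError if p is exactly 0 or 1 (extreme outliers).
    return m_del + REL_SD * m_del * STD_NORMAL.inv_cdf(p)


# Hypothetical example: 3500 g estimated at 38 weeks, delivery at 40 weeks.
example = adjusted_efw(3500, 38, 40)
print(round(example))
```

Because qnorm inverts pnorm, the two steps cancel algebraically and the adjustment reduces to scaling the measured EFW by the ratio of the two medians, EFW(DGA)/EFW(USGA). The percentile formulation is kept in the sketch because it mirrors the procedure as described.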

Prediction of shoulder dystocia: validation cohort

The ML model, which was learned exclusively from the derivation cohort, achieved an AUC of 0.866, significantly better than that of aEFW alone (0.772, P = 0.0002) or that of aEFW and diabetes (0.784, P = 0.00007, Figure 3a), validating in an independent cohort its ability to improve the prediction of ShD.

We then evaluated the potential clinical efficacy of the model in stratifying the risk among women carrying a fetus with a sonographic aEFW ≥ 4000 g who delivered vaginally. We identified 273 such deliveries, of which 11 (4.0%) were complicated by ShD. In these deliveries, aEFW had minimal predictive power for ShD (AUC = 0.548) and risk could not be stratified by increasing weight. In contrast, the ML performed significantly better (AUC = 0.775, P = 0.0002). In Table 3, we present two thresholds for stratifying the risk of women carrying a fetus with EFW ≥ 4000 g. A risk-score threshold of 0.5 stratified 42.9% of deliveries to the high-risk group, which included 90.9% of ShD cases and all those accompanied by maternal or newborn complications (Figure 3b). Using this threshold would require 11.7 CD to prevent one ShD case (number needed-to-treat). Alternatively, a more specific threshold of 0.7 stratified only 27.5% of the deliveries to the high-risk group, which included 63.6% of ShD cases, but all those accompanied by newborn complications (Figure 3b). Using this threshold would require 10.7 CD to prevent one ShD case (Table 3).

DISCUSSION

In this study, we developed and validated a new risk-prediction model to estimate the risk of ShD based on maternal risk factors and fetal sonographic parameters. Prior seminal studies attempting to create a model for predicting ShD based on different risk-factor combinations, with and without sonographic EFW, were unable to outperform EFW10–12,14. In contrast, in both cohorts, our ML model performed significantly better than did EFW, either alone or combined with diabetes (as defined in the ACOG practice bulletin1: either pre- or gestational diabetes). By integrating 18 different parameters, we were able to untangle and add precision to previously reported associations. For example, we observed that pregestational diabetes significantly increased the risk of ShD, while gestational diabetes did not affect the risk at all. Additionally, the model was able to integrate the additive risk associated with insulin treatment and obesity.

After establishing the model accuracy in the validation cohort, we advanced to evaluating the potential clinical efficacy of the model in optimizing the use of CD in the context of suspected macrosomia. In the validation cohort, we observed that increasing EFW beyond 4000 g did not further affect the risk for ShD. Therefore, in this group, risk could not be stratified by EFW alone. Conversely, the ML model was able to stratify women in this group into two further distinct groups in terms of their risk for ShD. A risk-score threshold of 0.5 stratified 42.9% of deliveries to the high-risk group, which included 90.9% of ShD cases and all those accompanied by maternal or newborn complications. Furthermore, as the main concern with ShD is associated newborn injury, we present an additional, more specific, threshold of 0.7 that stratified only 27.5% of the deliveries to the high-risk group, which included 63.6% of ShD cases, but all those accompanied by newborn injury. Given that many healthcare providers discuss CD with patients carrying


[Figure 2 image. (a) Box-and-whiskers plots of AUC (y-axis, ticks from 0.6 to 0.9) for aEFW, aEFW + diabetes and ML model (aEFW-based); P < 1e–16. (b) Beta (x-axis, ticks at 0.0 and 0.1) for each variable, in plotted order: Sonographic EFW*, Pregestational diabetes, Previous ShD, Sex (female), Abdominal circumference*, Maternal weight, IUFD, Insulin treatment, Gestational diabetes, Femur length*, Nulliparous, Smoker, Gravidity, Head circumference*, Maternal age, Gestational age, Maternal height, BPD*. (c) aBPD in controls vs ShD (above; P = 0.028) and aBPD adjusted for aEFW in controls vs ShD (below; P = 3e–7); y-axis ticks from 80 to 105.]

Figure 2 Prediction of fetal shoulder dystocia (ShD) in derivation cohort. (a) Box-and-whiskers plots of predictive performance of estimated
fetal weight (EFW) adjusted for gestational age at delivery (aEFW) alone, logistic model trained with aEFW and diabetes, and machine-
learning (ML) model, measured by area under receiver-operating-characteristics curve (AUC). Dataset was split 500 times into training and
test sets, and AUCs presented are for test sets. Boxes represent median and upper and lower quartiles, and whiskers denote largest and
smallest values no more than 1.5 × interquartile range from limits. (b) Beta values for each variable in final model that was generated by ML
algorithm. *Gestational age-adjusted measurements. IUFD, intrauterine fetal death. (c) Box-and-whiskers plots showing association with
ShD of biparietal diameter (BPD) adjusted for gestational age at delivery (aBPD) (above) and aBPD adjusted for aEFW (below). aBPD had
significant positive association with ShD, while aBPD adjusted for aEFW had significant negative association with ShD, as suggested by the
ML model. Adjustment to aEFW was performed by fitting a linear model between aBPD and aEFW, extracting residuals and adding mean
aBPD. P-values are based on logistic regression with ShD as response variable.
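
The adjustment described for panel (c) can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit with a single predictor; the function name `adjust_for` and the toy values are hypothetical and are not taken from the study data or code.

```python
from statistics import mean


def adjust_for(y, x):
    """Return y adjusted for x: residuals of a linear fit of y on x, plus mean(y)."""
    x_bar, y_bar = mean(x), mean(y)
    # Ordinary least-squares slope and intercept for the model y ~ x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed - fitted; adding mean(y) restores the original scale.
    return [yi - (intercept + slope * xi) + y_bar for xi, yi in zip(x, y)]


# Hypothetical toy values (aEFW in g, aBPD in arbitrary units), for illustration only.
aefw = [3200.0, 3500.0, 3800.0, 4100.0, 4400.0]
abpd = [90.0, 92.5, 93.0, 96.0, 97.5]
abpd_adj = adjust_for(abpd, aefw)
```

By construction, the adjusted values keep the mean of the original aBPD but are linearly uncorrelated with aEFW, which is why an association with ShD can change sign after adjustment.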

a fetus with EFW ≥ 4000 g⁸, we speculate that universal application of our model in such patients
could not only reduce the number of ShD cases, but also optimize the use of CD in the context of
suspected macrosomia.

To our knowledge, and according to a review of the literature (via a MEDLINE search using different
combinations of the terms ‘shoulder dystocia’, ‘machine learning’, ‘artificial intelligence’ and
‘prediction’), this is the first attempt to utilize a complete ML pipeline for predicting ShD.
Previous studies used regression analyses for prediction, and were therefore required to rely on a
large number of assumptions¹⁶ related to interactions between the features. Examples of such
assumptions are specific interactions, such as the difference between the abdominal diameter and
the BPD¹⁰, the difference between the AC and the HC²³, the AC/HC ratio and, in patients with
gestational diabetes mellitus, also the FL/AC ratio²⁴. The use of ML obviated this dependence and
allowed us to generate an assumption-free model quantifying the association with ShD of each of the
18 parameters while considering the complex interactions between them.


[Figure 3 image. Panel (a): ROC curves, x-axis 1 – Specificity (0.00 to 1.00), y-axis Sensitivity
(0.00 to 1.00). Panel (b): scatterplot, x-axis aEFW (g) (4000 to 5000), y-axis Risk score (0.00 to
1.00).]

Figure 3 External validation and clinical efficacy of the machine-learning (ML) model in prediction of fetal shoulder dystocia (ShD).
(a) Receiver-operating-characteristics (ROC) curves for prediction, in validation cohort, of ShD by estimated fetal weight adjusted for
gestational age at delivery (aEFW) alone (area under ROC curve (AUC) = 0.772), logistic model trained with aEFW and diabetes
(AUC = 0.784) and ML model (AUC = 0.866). (b) Scatterplot of aEFW vs risk score for 273 deliveries with aEFW ≥ 4000 g. Risk-score
threshold of 0.5 stratified 42.9% of deliveries to high-risk group, which included 90.9% of ShD cases and all those accompanied by
maternal or newborn complications; more specific threshold of 0.7 stratified only 27.5% of deliveries to high-risk group, which included
63.6% of ShD cases, but all those accompanied by newborn complications. Separate markers indicate non-ShD cases and ShD cases with
newborn injury, with maternal complications only, and with no additional related complications.
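
As a reading aid for the AUC values and risk-score thresholds in Figure 3, the sketch below computes an AUC as the probability that a randomly chosen ShD case receives a higher risk score than a randomly chosen control. It is a generic illustration, not the authors' code: the function names and toy scores are hypothetical, and treating scores at or above the threshold as high risk is an assumption.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def high_risk_fraction(scores, threshold):
    """Fraction of deliveries with a risk score at or above the threshold (assumed rule)."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical toy scores and outcomes (1 = ShD, 0 = no ShD), for illustration only.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)              # 3 of 4 case-control pairs ranked correctly
toy_high = high_risk_fraction(toy_scores, 0.5)     # 1 of 4 deliveries at or above 0.5
```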

Table 3 Predictive accuracy for shoulder dystocia (ShD) in validation cohort of singleton vaginal deliveries

| Model risk-score threshold | Deliveries (n) | ShD (n) | Specificity | Sensitivity | NPV | PPV | NNT | AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| aEFW ≥ 4000 g | 273 | 11 | | | | | 24.8 | 0.775 |
| aEFW ≥ 4000 g: 0.7 | 75 | 7 | 0.74 (0.68–0.79) | 0.64 (0.31–0.89) | 0.98 (0.95–0.99) | 0.09 (0.04–0.18) | 10.7 | |
| aEFW ≥ 4000 g: 0.5 | 117 | 10 | 0.59 (0.53–0.65) | 0.91 (0.59–1.00) | 0.99 (0.96–1.00) | 0.09 (0.04–0.15) | 11.7 | |
| All | 2584 | 31 | | | | | 83.4 | 0.866 |
| All: 0.7 | 144 | 11 | 0.95 (0.94–0.96) | 0.35 (0.19–0.55) | 0.99 (0.99–0.99) | 0.08 (0.04–0.13) | 13.1 | |
| All: 0.5 | 229 | 16 | 0.92 (0.91–0.93) | 0.52 (0.33–0.70) | 0.99 (0.99–1.00) | 0.07 (0.04–0.11) | 14.3 | |

Values in parentheses are 95% CI. aEFW, adjusted estimated fetal weight; AUC, area under receiver-operating-characteristics curve; NNT,
number needed-to-treat (number of Cesarean deliveries required to prevent one case of ShD); NPV, negative predictive value; PPV, positive
predictive value.
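
The point estimates in Table 3 can be recomputed from the delivery and ShD counts alone. The sketch below is an illustration under one stated assumption: that NNT is the number of deliveries flagged as high risk divided by the number of ShD cases among them (every flagged delivery managed by CD, with each CD preventing the ShD it would otherwise have involved). That assumption reproduces the tabulated NNT values, but the function name `screening_metrics` and its parameters are hypothetical. Confidence intervals are not recomputed here.

```python
def screening_metrics(n_total, n_shd, n_flagged, n_shd_flagged):
    """Point estimates for a risk-score threshold, derived from four counts."""
    tp = n_shd_flagged            # ShD cases in the high-risk group
    fp = n_flagged - tp           # non-ShD deliveries in the high-risk group
    fn = n_shd - tp               # ShD cases in the low-risk group
    tn = n_total - n_shd - fp     # non-ShD deliveries in the low-risk group
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Assumed definition: flagged deliveries per ShD case captured.
        "nnt": n_flagged / tp,
    }


# Counts from Table 3, aEFW >= 4000 g subgroup, threshold 0.7.
m07 = screening_metrics(n_total=273, n_shd=11, n_flagged=75, n_shd_flagged=7)
rounded = {k: round(v, 2) for k, v in m07.items()}
# rounded -> sensitivity 0.64, specificity 0.74, ppv 0.09, npv 0.98, nnt 10.71
```

The same call with the counts for threshold 0.5 (117 flagged deliveries, 10 ShD cases) gives an NNT of 11.7, matching the table.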

In many obstetric centers, women with suspected macrosomia and/or diabetes are referred for fetal
growth ultrasound examination at ∼37 weeks’ gestation, while delivery may take place several weeks
later. In order to increase the clinical generalizability of our findings, we allowed up to 35 days
between sonographic biometry assessment and delivery in both cohorts. For this reason, we adjusted
all biometric parameters (BPD, HC, AC and FL), as well as the EFW, according to GA at delivery, by
means of commonly used Hadlock equations¹⁸⁻²⁰. We validated the accuracy of this practice in our
data and, furthermore, we observed that the AUC of aEFW alone in the derivation and validation
cohorts was 0.745 and 0.771, respectively, both being higher than the AUC of 0.70 observed in a
previous study using biometry obtained within 7 days of delivery¹⁰. The prospective use of aEFW can
not only provide the specific risk for ShD at the exact day of delivery, but also provide
information regarding the optimal GA for labor induction in the context of ShD risk. This may be of
particular importance in women at intermediate risk for ShD who desire vaginal delivery, because
their risk keeps increasing as their fetus keeps growing.

Another strength of our study is the use of two different cohorts (from two different continents),
one serving for development and the other for validation of the model. This practice was chosen to
prevent a prevalent overfitting error and to establish generalizability of the model.

Our study is not, however, without limitations. One limitation is its retrospective design. We
could assess the model’s performance only when vaginal delivery was achieved and not when CD was
performed either before or during the course of labor. Additionally, although the derivation cohort
included 131 ShD cases, the validation cohort had only 31 ShD cases for which complete data for all
18 parameters were available. However, even in this relatively small cohort, the improvement
achieved with the model was highly statistically significant.


The model is available without cost at www.mfmcalculator.org. Currently, until further validation,
we advocate using the model only as part of an IRB-approved clinical investigation. We propose a
prospective randomized study focusing on women carrying a fetus with EFW ≥ 4000 g but below current
ACOG cut-offs for CD. These women would be randomized to either management based on their
healthcare provider’s clinical considerations or management based on ML risk stratification. The
two main outcomes would be CD rate and ShD rate. Another important future step is to develop a
complementary ML model for reassessing the risk of ShD before performing operative vaginal delivery.

In conclusion, we developed an ML model for prediction of ShD. The model was more accurate in the
prediction of ShD than was EFW either alone or combined with diabetes and provides a robust tool
for optimizing the use of CD in the context of suspected macrosomia.

ACKNOWLEDGMENT

We would like to acknowledge the contribution of Cinthia Blat in extracting data from the UCSF
deliveries database.

REFERENCES

1. ACOG. Practice Bulletin No 178. Shoulder Dystocia. Obstet Gynecol 2017; 129: e123–133.
2. RCOG. Green-top Guideline, No 42. Shoulder Dystocia; 2012 (updated 2017). https://www.rcog.org.uk/en/guidelines-research-services/guidelines/gtg42
3. Gherman RB, Ouzounian JG, Goodwin TM. Obstetric maneuvers for shoulder dystocia and associated fetal morbidity. Am J Obstet Gynecol 1998; 178: 1126–1130.
4. Gauthaman N, Walters S, Tribe IA, Goldsmith L, Doumouchtsis SK. Shoulder dystocia and associated manoeuvres as risk factors for perineal trauma. Int Urogynecol J 2016; 27: 571–577.
5. Gherman RB, Goodwin TM, Souter I, Neumann K, Ouzounian JG, Paul RH. The McRoberts’ maneuver for the alleviation of shoulder dystocia: how successful is it? Am J Obstet Gynecol 1997; 176: 656–661.
6. Beta J, Khan N, Khalil A, Fiolna M, Ramadan G, Akolekar R. Maternal and neonatal complications of fetal macrosomia: systematic review and meta-analysis. Ultrasound Obstet Gynecol 2019; 54: 308–318.
7. Ouzounian JG, Korst LM, Miller DA, Lee RH. Brachial plexus palsy and shoulder dystocia: obstetric risk factors remain elusive. Am J Perinatol 2013; 30: 303–307.
8. Barber EL, Lundsberg LS, Belanger K, Pettker CM, Funai EF, Illuzzi JL. Indications contributing to the increasing cesarean delivery rate. Obstet Gynecol 2011; 118: 29–38.
9. Spong CY, Berghella V, Wenstrom KD, Mercer BM, Saade GR. Preventing the first cesarean delivery: summary of a joint Eunice Kennedy Shriver National Institute of Child Health and Human Development, Society for Maternal-Fetal Medicine, and American College of Obstetricians and Gynecologists Workshop. Obstet Gynecol 2012; 120: 1181–1193.
10. Burkhardt T, Schmidt M, Kurmanavicius J, Zimmermann R, Schäffer L. Evaluation of fetal anthropometric measures to predict the risk for shoulder dystocia. Ultrasound Obstet Gynecol 2014; 43: 77–82.
11. Gupta M, Hockley C, Quigley MA, Yeh P, Impey L. Antenatal and intrapartum prediction of shoulder dystocia. Eur J Obstet Gynecol Reprod Biol 2010; 151: 134–139.
12. Belfort MA, Dildy GA, Saade GR, Suarez V, Clark SL. Prediction of shoulder dystocia using multivariate analysis. Am J Perinatol 2007; 24: 5–10.
13. Dyachenko A, Ciampi A, Fahey J, Mighty H, Oppenheimer L, Hamilton EF. Prediction of risk for shoulder dystocia with neonatal injury. Am J Obstet Gynecol 2006; 195: 1544–1549.
14. Palatnik A, Grobman WA, Hellendag MG, Janetos TM, Gossett DR, Miller ES. Predictors of shoulder dystocia at the time of operative vaginal delivery. Am J Obstet Gynecol 2016; 215: 624.e1–5.
15. Ouzounian JG, Gherman RB, Chauhan S, Battista LR, Lee RH. Recurrent shoulder dystocia: analysis of incidence and risk factors. Am J Perinatol 2012; 29: 515–518.
16. Bzdok D, Altman N, Krzywinski M. Statistics versus machine learning. Nat Methods 2018; 15: 233–234.
17. Beam AL, Kohane IS. Big Data and Machine Learning in Health Care. JAMA 2018; 319: 1317–1318.
18. Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK. Estimation of fetal weight with the use of head, body, and femur measurements–a prospective study. Am J Obstet Gynecol 1985; 151: 333–337.
19. Hadlock FP, Deter RL, Harrist RB, Park SK. Estimating fetal age: computer-assisted analysis of multiple fetal growth parameters. Radiology 1984; 152: 497–501.
20. Hadlock FP, Harrist RB, Martinez-Poyer J. In utero analysis of fetal growth: a sonographic weight standard. Radiology 1991; 181: 129–133.
21. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 2010; 33: 1–22.
22. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44: 837–845.
23. Endres L, DeFranco E, Conyac T, Adams M, Zhou Y, Magner K, O’Rourke L, Bernhard KA, Siddiqui D, McCormick A, Abramowicz J, Merkel R, Jawish R, Habli M, Floman A, Magann EF, Chauhan SP, Network CFR. Association of Fetal Abdominal-Head Circumference Size Difference With Shoulder Dystocia: A Multicenter Study. AJP Rep 2015; 5: e099–104.
24. Duryea EL, Casey BM, McIntire DD, Twickler DM. The FL/AC ratio for prediction of shoulder dystocia in women with gestational diabetes. J Matern Fetal Neonatal Med 2017; 30: 2378–2381.

SUPPORTING INFORMATION ON THE INTERNET

The following supporting information may be found in the online version of this article:
Figure S1 Adjustment of estimated fetal weight to gestational age at delivery.
Figure S2 Performance of machine-learning algorithms to predict shoulder dystocia.

Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd. Ultrasound Obstet Gynecol 2020; 56: 588–596.
