
Novel Machine Learning Approach for the Prediction of Hernia Recurrence, Surgical Complication, and 30-Day Readmission after Abdominal Wall Reconstruction
Abbas M Hassan, MD, Sheng-Chieh Lu, PhD, Malke Asaad, MD, Jun Liu, PhD, Anaeze C Offodile II, MD, MPH, FACS, Chris Sidey-Gibbons, PhD, Charles E Butler, MD, FACS

BACKGROUND: Despite advancements in abdominal wall reconstruction (AWR) techniques, hernia recurrences (HRs), surgical site occurrences (SSOs), and unplanned hospital readmissions persist. We sought to develop, validate, and evaluate machine learning (ML) algorithms for predicting complications after AWR.

METHODS: We conducted a comprehensive review of patients who underwent AWR from March 2005 to June 2019. Nine supervised ML algorithms were developed to preoperatively predict HR, SSOs, and 30-day readmission. Patient data were partitioned into training (80%) and testing (20%) sets.

RESULTS: We identified 725 patients (52% women), with a mean age of 60 ± 11.5 years, mean body mass index of 31 ± 7 kg/m2, and mean follow-up time of 42 ± 29 months. The HR rate was 12.8%, SSO rate was 30%, and 30-day readmission rate was 10.9%. ML models demonstrated good discriminatory performance for predicting HR (area under the receiver operating characteristic curve [AUC] 0.71), SSOs (AUC 0.75), and 30-day readmission (AUC 0.74). ML models achieved mean accuracy rates of 85% (95% CI 80% to 90%), 72% (95% CI 64% to 80%), and 84% (95% CI 77% to 90%) for predicting HR, SSOs, and 30-day readmission, respectively. ML identified and characterized 4 unique significant predictors of HR, 12 of SSOs, and 3 of 30-day readmission. Decision curve analysis demonstrated that ML models have a superior net benefit regardless of the probability threshold.

CONCLUSIONS: ML algorithms trained on readily available preoperative clinical data accurately predicted complications of AWR. Our findings support incorporating ML models into the preoperative assessment of patients undergoing AWR to provide data-driven, patient-specific risk assessment. (J Am Coll Surg 2022;234:918–927. © 2022 by the American College of Surgeons. Published by Wolters Kluwer Health, Inc. All rights reserved.)

Received October 27, 2021; Accepted January 13, 2022.
Department of Plastic and Reconstructive Surgery (Hassan, Asaad, Liu, Offodile, Butler); MD Anderson Center for INSPiRED Cancer Care, Department of Symptom Research (Lu, Sidey-Gibbons); and Institute for Cancer Care Innovation (Offodile), University of Texas MD Anderson Cancer Center, Houston, TX.
Disclosure Information: Authors have nothing to disclose. Timothy J Eberlein, Editor-in-Chief, has nothing to disclose. Ronald J Weigel, CME Editor, has nothing to disclose.
Disclosures outside the scope of this work: Dr Butler is a consultant to Allergan Inc. Dr Offodile receives research funding from Blue Cross Blue Shield. Other authors have nothing to disclose.
Support: Dr Offodile receives research funding from the National Academy of Medicine and honoraria from Indiana University, the Rising Tide Foundation for Clinical Cancer Research, and the University of Tennessee.
CME questions for this article available at http://jacscme.facs.org
Correspondence address: Charles E Butler, MD, FACS, Department of Plastic Surgery, University of Texas MD Anderson Cancer Center, 1400 Pressler St, Unit 1488, Houston, TX 77030. email: cbutler@mdanderson.org
Supplemental digital content is available for this article: https://doi.org/10.1097/XCS.0000000000000141
© 2022 by the American College of Surgeons. Published by Wolters Kluwer Health, Inc. All rights reserved.

Abbreviations and Acronyms
ALE = accumulated local effect
AUC = area under the receiver operating characteristic curve
AWR = abdominal wall reconstruction
HR = hernia recurrence
ML = machine learning
OR = odds ratio
PFI = permutation feature importance
SSO = surgical site occurrence

Ventral hernia repair is among the most common surgical procedures, with more than 400,000 operations performed annually in the US alone.1,2 Despite advancements in abdominal wall reconstruction (AWR) techniques, complications such as hernia recurrence (HR) and surgical site occurrences (SSOs) persist.3,4 Previous studies reported HR rates of up to 42%5,6 and overall complication rates of up to 44%.6,7 Unplanned hospital readmissions and reoperations secondary to these complications can result in higher rates of inpatient morbidity and mortality, higher healthcare costs, and a lower quality of life.8 After AWR, 30-day readmission and reoperation rates have been reported to be as high as 13% and 17%, respectively.9,10

With careful patient selection and preoperative optimization, many complications of AWR can be avoided.11-14 Identifying patients at risk of poor outcomes and characterizing patterns and interactions between risk factors and complications could lead to individualized patient counseling and shared decision-making. However, accurately predicting which patients are at risk of experiencing complications is inherently difficult because of the complex interactions among illness severity, patient characteristics, and treatment-related factors. These interactions can be nonlinear, with elaborate interdependencies that cannot easily be modeled using traditional statistical techniques.15

The field of surgery has been advanced and reshaped by artificial intelligence.16-23 Machine learning (ML) is a branch of artificial intelligence that focuses on creating intelligent machines that can learn relationships and patterns between complex variables. ML can help identify which complex combinations of patient and treatment characteristics make a patient more or less likely to experience complications and unplanned hospital readmissions. In this study, we developed, validated, and evaluated 9 distinct ML algorithms to predict HR, SSOs, and 30-day readmission after AWR. In addition, we compared the performance of ML algorithms to that of conventional statistical methods in characterizing predictors of surgical complications.
METHODS

Study design
We conducted a retrospective cohort review assessing all consecutive patients who underwent open repair of ventral hernias, either in an isolated setting or as part of an oncologic resection, at the University of Texas MD Anderson Cancer Center from March 1, 2005, through June 30, 2019, after approval from the IRB. The surgical technique used in this study has previously been described.12,13,24,25 All participating surgeons were board-certified plastic and reconstructive surgeons. Patient and surgical characteristics, including sex, age, BMI, history of radiotherapy and/or chemotherapy, parastomal hernia, indication for repair, earlier abdominal surgery, comorbidities (coronary artery disease, diabetes mellitus, hypertension, pulmonary disease, and/or renal disease), presence of rectus muscle violation, CDC wound classification, and use of component separation and/or bridged repair, were recorded.

Tobacco use was defined as the use of any tobacco product within 8 weeks of AWR. The term "bridged repair" refers to the use of mesh to span a defect in the abdominal wall without approximating the fascial edges. Component separation was defined as anterior component separation with external oblique aponeurotic release. Rectus muscle violation was defined as 1 or more of the following: an existing or new ostomy, gastrostomy/jejunostomy tube placement, transversely divided rectus abdominis muscle, and/or resected rectus abdominis muscle. Obesity was defined as a BMI greater than 30 kg/m2.

Surgical outcomes
Surgical outcomes included HR, SSOs, and 30-day readmission. HR was defined as a contour abnormality with an associated fascial defect. HR in this population was detected through physical examination and/or abdominal imaging, such as computed tomography or magnetic resonance imaging. SSOs were defined as the postoperative presence of at least one of the following complications: infection, seroma/hematoma, wound dehiscence, or enterocutaneous fistula. Seroma or hematoma was defined as serous or blood fluid collection, respectively, that required drainage. Wound dehiscence was defined as a full-thickness breakdown of the skin that extended greater than 2 cm.

ML model preparation
The dataset was randomly divided into training (80%) and testing (20%) sets. To overcome class imbalance (ie a low number of events in 1 class), we used an oversampling strategy that has previously been shown to increase the performance of ML models trained on unbalanced datasets.26 Using Bayesian optimization, we selected the optimal hyperparameters for each algorithm. To allow reproducible reporting, supervised ML models were used to predict surgical outcomes, and the findings were reported using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD).27 We used a decision curve analysis to determine the net benefit of correctly predicting complications (true positives) vs the relative harm of incorrectly labeling patients as having complications (false positives).28,29
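The split-and-oversample step described above can be written compactly in R, the software the Methods later note was used for the analyses. The following is a minimal sketch rather than the study code: the data frame `awr`, its binary outcome column `hr`, and the use of caret's `createDataPartition` and `upSample` are illustrative assumptions, and the Bayesian hyperparameter optimization mentioned in the text is not shown.

```r
## Minimal sketch (not the authors' code): an 80/20 split with oversampling of
## the minority class, assuming a data frame `awr` with a binary factor outcome
## `hr` (hernia recurrence) and preoperative predictor columns.
library(caret)

set.seed(2022)                                    # reproducible partition
train_idx <- createDataPartition(awr$hr, p = 0.80, list = FALSE)
train_set <- awr[train_idx, ]
test_set  <- awr[-train_idx, ]                    # held out for validation only

# Oversample the minority class in the training set only, so the test set
# keeps the real-world event rate.
train_bal <- upSample(x = train_set[, setdiff(names(train_set), "hr")],
                      y = train_set$hr, yname = "hr")

table(train_set$hr)   # imbalanced counts before oversampling
table(train_bal$hr)   # balanced counts after oversampling
```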
ML model development
We used 9 supervised ML algorithms, ranging from simple linear to nonlinear to complex nonlinear methods, that have been shown to be successful in predicting medical outcomes18,30-33: support vector machine, decision tree, generalized linear model, multiple adaptive regression splines, K-nearest neighbor, single hidden layer artificial neural network, random forest, and extreme gradient boosting. In addition, we created a voting ensemble algorithm that uses the majority rule to make predictions based on the predictions of these algorithms. A detailed description of the models used in this study can be found in the Supplemental Digital Content (http://links.lww.com/XCS/A53).
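A hedged sketch of how a few of the listed learners and a majority-rule voting ensemble could be fit with the caret package is shown below; it is not the authors' implementation. It assumes the `train_bal` and `test_set` objects from the previous sketch, an outcome factor `hr` whose levels are valid R names, and shows only three of the nine algorithms for brevity.

```r
## Minimal sketch (not the authors' code): three candidate learners plus a
## simple majority-vote ensemble, evaluated on the held-out test set.
library(caret)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

fit_glm <- train(hr ~ ., data = train_bal, method = "glm",
                 metric = "ROC", trControl = ctrl)
fit_rf  <- train(hr ~ ., data = train_bal, method = "rf",
                 metric = "ROC", trControl = ctrl)
fit_svm <- train(hr ~ ., data = train_bal, method = "svmRadial",
                 metric = "ROC", trControl = ctrl,
                 preProcess = c("center", "scale"))

# Majority-rule voting ensemble over the individual class predictions.
preds <- data.frame(glm = predict(fit_glm, test_set),
                    rf  = predict(fit_rf,  test_set),
                    svm = predict(fit_svm, test_set))
vote <- apply(preds, 1, function(p) names(which.max(table(p))))
confusionMatrix(factor(vote, levels = levels(test_set$hr)), test_set$hr)
```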
ML model validation and interpretation mission, respectively. A comprehensive list of accuracy,
We used a permutation feature importance (PFI) analy- sensitivity, specificity, and positive and negative predictive
sis, which employs a permutation method to compute a values for each model can be found in Tables 1, 2, and 3.
feature contribution coefficient in units of the decrease in The generalized linear model performed the best for predict-
the model’s performance, to examine significant predic- ing HR, with an AUC of 0.71. The support vector machine
tors chosen by our models.34 After identifying the top pre- model performed the best for predicting SSOs, with an AUC
dictors for each model, we used accumulated local effect of 0.75. The multiple adaptive regression splines model per-
(ALE) analysis to understand and visualize the effects of formed the best for predicting 30-day readmission, with an
the predictor variables on the predicted outcomes.35 Using AUC of 0.74. Figure 2 shows decision curve analysis curves
3 rounds of 10-fold cross-validation, we used the area that demonstrate the clinical usefulness of the ML models in
under the receiver operating characteristic curve (AUC) as predicting complications in the validation dataset. Regardless
the performance parameter for optimal model selection. of the threshold used, the net benefit of the ML models was
AUC (also known as the c-statistic) is an effective way greater than that of the reference models.
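Permutation feature importance and the trapezoidal-rule AUC can be illustrated with a short, self-written routine. This is a simplified sketch under the assumption that a fitted caret model (`fit_rf`) and the held-out `test_set` from the earlier sketches are available, with the event coded as the second factor level; it is not the PFI implementation the authors used.

```r
## Minimal sketch (not the authors' code): permutation feature importance as
## the drop in test-set AUC when one predictor is shuffled.
library(pROC)

auc_of <- function(model, data) {
  probs <- predict(model, data, type = "prob")[, 2]          # P(event)
  as.numeric(auc(roc(data$hr, probs, quiet = TRUE)))
}

baseline <- auc_of(fit_rf, test_set)

pfi <- sapply(setdiff(names(test_set), "hr"), function(feature) {
  perturbed <- test_set
  perturbed[[feature]] <- sample(perturbed[[feature]])       # break the association
  baseline - auc_of(fit_rf, perturbed)                       # AUC lost = importance
})
sort(pfi, decreasing = TRUE)

# Trapezoidal-rule AUC from a ROC curve's (FPR, TPR) points, for illustration.
trap_auc <- function(fpr, tpr) {
  o <- order(fpr)
  sum(diff(fpr[o]) * (head(tpr[o], -1) + tail(tpr[o], -1)) / 2)
}
```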
Statistical analyses
Univariate and multiple logistic regression models were used to estimate the odds ratios (ORs) and hazard ratios of predictors of surgical complications. The stepwise model selection method was used to build the most parsimonious multivariable model. Two-tailed values of p < 0.05 were considered significant. The analyses were performed in SAS 9.4 (SAS Institute Inc) and R open source software.
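For the conventional arm, a minimal R sketch of a multivariable logistic model with stepwise selection is shown below. The predictor names (`sso`, `obesity`, `wound_class`, and so on) are hypothetical stand-ins for the study variables, and AIC-based `stepAIC` is used as a generic stand-in for the stepwise procedure described, not as the authors' exact selection rule.

```r
## Minimal sketch (not the authors' code): multivariable logistic regression
## with stepwise selection, reporting odds ratios for the retained predictors.
library(MASS)

# Hypothetical column names standing in for the study variables.
full_model <- glm(sso ~ obesity + wound_class + diabetes + pulmonary_disease +
                    bridged_repair, data = awr, family = binomial)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

# Odds ratios with 95% CIs for the retained predictors.
cbind(OR = exp(coef(step_model)), exp(confint(step_model)))
```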
RESULTS

Patient demographics
We identified 725 patients (52% women) who met the study criteria. The mean age was 60 ± 11.5 years, mean BMI was 31 ± 7 kg/m2, and mean follow-up time was 42 ± 29 months. The operations in the study cohort were performed by 68 surgeons at our institution who had extensive experience in complex AWR. Tables summarizing patient demographics, surgical characteristics, and surgical outcomes can be found in the Supplemental Digital Content (http://links.lww.com/XCS/A53). Our observed HR rate was 12.8%, SSO rate was 30%, and 30-day readmission rate was 10.9%. Patients were randomly assigned to a training set (n = 580, 80%) or a testing (validation) set (n = 145, 20%).

ML prediction performance
The performance of the ML algorithms for predicting HR, SSOs, and 30-day readmission is shown in the receiver operating characteristic curves in Figure 1. ML models achieved mean accuracy rates of 85% (95% CI 80% to 90%), 72% (95% CI 64% to 80%), and 84% (95% CI 77% to 90%) for predicting HR, SSOs, and 30-day readmission, respectively. A comprehensive list of accuracy, sensitivity, specificity, and positive and negative predictive values for each model can be found in Tables 1, 2, and 3. The generalized linear model performed the best for predicting HR, with an AUC of 0.71. The support vector machine model performed the best for predicting SSOs, with an AUC of 0.75. The multiple adaptive regression splines model performed the best for predicting 30-day readmission, with an AUC of 0.74. Figure 2 shows decision curve analysis curves that demonstrate the clinical usefulness of the ML models in predicting complications in the validation dataset. Regardless of the threshold used, the net benefit of the ML models was greater than that of the reference models.

Figure 1. Receiver operating characteristic curves for prediction of (A) hernia recurrence, (B) surgical site occurrence, and (C) 30-day readmission. The generalized linear model achieved the highest area under the receiver operating characteristic curve (AUC) for predicting hernia recurrence (0.71). The support vector machine model achieved the highest AUC for predicting surgical site occurrences (0.75). Multiple adaptive regression splines model achieved the highest AUC for predicting 30-day readmission (0.74). glm, generalized linear model; mars, multiple adaptive regression splines; dt, decision tree; knn, k-nearest neighbor; svm, support vector machine; rf, random forest; xgb, extreme gradient boosting; nnet, single hidden layer artificial neural network; vote, voting ensemble.

Table 1.  Machine Learning Model Performance for Predicting Hernia Recurrence
Model GLM MARS DT KNN SVM RF XGB NNET VOTE
Accuracy 66 (59–72) 62 (55–68) 46 (40–53) 59 (51–65) 78 (71–83) 85 (80–90) 64 (57–71) 74 (68–80) 73 (67–79)
Sensitivity 79 (73–85) 63 (56–69) 63 (56–69) 29 (23–36) 04 (01–07) 04 (01–07) 54 (47–61) 25 (19–31) 13 (08–17)
Specificity 64 (45–84) 62 (42–81) 44 (24–64) 62 (43–82) 87 (73–100) 95 (87–100) 64 (47–85) 82 (65–97) 81 (67–97)
PPV 22 (15–29) 17 (10–23) 12 (06–19) 09 (04–13) 04 (01–06) 10 (06–14) 18 (01–23) 14 (09–19) 07 (04–12)
NPV 96 (92–100) 93 (88–98) 91 (85–96) 88 (08–95) 88 (76–100) 88 (70–100) 93 (86–98) 88 (81–99) 88 (78–99)
AUC 71 (61–82) 62 (47–76) 54 (41–67) 47 (37–57) 57 (47–67) 54 (44–64) 67 (56–77) 60 (49–71) 54 (42–66)
Data presented as percentage (95% CI)
AUC, area under the receiver operating characteristic curve; DT, decision tree; GLM, generalized linear model; KNN, k-nearest neighbor; MARS, multiple adaptive regression
splines; NNET, single hidden layer artificial neural network; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; SVM, support vector machine;
VOTE, voting ensemble; XGB, extreme gradient boosting.

Table 2.  Machine Learning Model Performance for Surgical Site Occurrence
Model GLM MARS DT KNN SVM RF XGB NNET VOTE
Accuracy 68 (59–75) 61 (52–69) 53 (45–61) 60 (52–68) 72 (64–80) 72 (64–80) 64 (56–72) 66 (57–73) 65 (56–73)
Sensitivity 64 (54–73) 41 (31–50) 75 (67–83) 47 (38–57) 39 (29–48) 54 (45–64) 70 (62–79) 59 (50–69) 57 (47–66)
Specificity 69 (56–83) 69 (56–83) 44 (29–58) 65 (51–79) 87 (77–97) 80 (68–92) 61 (47–76) 68 (55–82) 68 (55–82)
PPV 47 (37–58) 37 (27–46) 37 (24–49) 38 (27–48) 57 (48–66) 54 (45–64) 83 (33–56) 45 (34–55) 44 (33–54)
NPV 81 (71–91) 73 (60–85) 80 (72–88) 74 (63–86) 77 (61–92) 80 (68–92) 83 (74–92) 79 (69–90) 78 (70–88)
AUC 68 (59–78) 65 (56–74) 68 (59–78) 65 (55–74) 75 (66–84) 72 (63–81) 67 (58–77) 69 (59–78) 68 (59–78)
Data presented as percentage (95% CI)
AUC, area under the receiver operating characteristic curve; DT, decision tree; GLM, generalized linear model; KNN, k-nearest neighbor; MARS, multiple adaptive regression
splines; NNET, single hidden layer artificial neural network; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; SVM, support vector machine;
VOTE, voting ensemble; XGB, extreme gradient boosting.

Predictive/protective factors for HR
Multivariable Cox proportional hazards regression analysis identified the presence of rectus muscle violation (hazard ratio 1.8, 95% CI 1.09 to 2.98, p = 0.021), obesity (hazard ratio 2.40, 95% CI 1.46 to 3.94, p < 0.001), and bridged repair (hazard ratio 3.01, 95% CI 1.71 to 5.30, p < 0.001) as independent predictors of HR. Component separation was found to be protective against HR (hazard ratio 0.50, 95% CI 0.32 to 0.80, p = 0.003). The model achieved an AUC of 0.65 (95% CI 0.58 to 0.71).
In PFI and ALE analyses, obesity, component separation, bridged repair, and rectus muscle violation increased the accuracy of the ML model the most for prediction of HR. Figure 3 shows the impact of each of these variables in predicting HR.
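A minimal sketch of the kind of multivariable Cox model reported here is shown below, assuming hypothetical column names (`months_fu`, `recurred`, `rectus_violation`, and so on) rather than the study's actual variable coding; it is illustrative only.

```r
## Minimal sketch (not the authors' code): Cox proportional hazards model of
## time to hernia recurrence with hypothetical study-style predictors.
library(survival)

cox_hr <- coxph(Surv(months_fu, recurred) ~ rectus_violation + obesity +
                  bridged_repair + component_separation, data = awr)
summary(cox_hr)          # hazard ratios (exp(coef)) with 95% CIs
cox.zph(cox_hr)          # check the proportional hazards assumption
```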

Table 3.  Machine Learning Model Performance for 30-Day Readmission


Model GLM MARS DT KNN SVM RF XGB NNET VOTE
Accuracy 71 (63–78) 72 (64–80) 72 (64–79) 59 (51–67) 78 (55–71) 84 (77–90) 68 (59–75) 71 (64–79) 66 (58–74)
Sensitivity 45 (36–54) 75 (52–92) 75 (67–83) 50 (41–59) 20 (13–27) 20 (13–27) 50 (41–59) 30 (22–38) 55 (46–64)
Specificity 75 (56–94) 72 (52–92) 71 (51–91) 61 (39–82) 87 (73–100) 94 (84–100) 70 (50–90) 78 (60–96) 68 (48–88)
PPV 23 (15–30) 30 (21–39) 29 (20–39) 17 (09–25) 20 (13–27) 36 (28–45) 21 (13–29) 18 (11–25) 22 (13–30)
NPV 90 (80–99) 95 (74–100) 95 (89–100) 83 (80–97) 87 (73–100) 88 (69–100) 90 (81–98) 88 (76–99) 90 (82–99)
AUC 64 (51–76) 74 (64–85) 73 (63–84) 59 (47–72) 62 (49–76) 66 (50–82) 62 (47–76) 57 (45–68) 60 (48–72)
Data presented as percentage (95% CI)
AUC, area under the receiver operating characteristic curve; DT, decision tree; GLM, generalized linear model; KNN, k-nearest neighbor; MARS, multiple adaptive regression
splines; NNET, single hidden layer artificial neural network; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; SVM, support vector machine;
VOTE, voting ensemble; XGB, extreme gradient boosting.
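As context for Tables 1 to 3, the accuracy, sensitivity, specificity, PPV, and NPV reported there are all simple functions of a 2 x 2 confusion matrix of predicted vs observed outcomes. The short sketch below shows the arithmetic with made-up counts, not study data.

```r
## Minimal sketch (not the authors' code): confusion-matrix metrics of the
## kind tabulated above, computed from true/false positive and negative counts.
metrics_from_counts <- function(tp, fp, tn, fn) {
  c(accuracy    = (tp + tn) / (tp + fp + tn + fn),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

# Illustrative counts only (not taken from the study data).
round(metrics_from_counts(tp = 10, fp = 15, tn = 110, fn = 10), 2)
```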

Figure 2.  Decision curve analysis. The decision curves show the clinical usefulness of the machine learning (ML) models in predicting (A)
hernia recurrence (HR), (B) surgical site occurrence (SSO), and (C) 30-day readmission in the validation dataset. The results are presented in
the form of a graph, with the selected probability threshold plotted against the net benefit of the evaluated model. The blue line represents
the assumption that all patients have a complication (ie “treat everyone”), and the black line represents the assumption that no patients
have complications (ie “treat no one”). The purple lines depict the net benefit of using ML models to predict who will have HR, SSOs, and
30-day readmission. Regardless of the threshold used, the net benefit of the ML models was greater than that of the reference models.
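The net benefit plotted in these curves follows Vickers and Elkin (reference 28): at a threshold probability pt, net benefit = TP/n - (FP/n) x pt/(1 - pt). A minimal R sketch, assuming a vector of predicted risks `p_hat` and a 0/1 outcome vector `y`, is shown below; it is not the authors' decision-curve code.

```r
## Minimal sketch (not the authors' code): net benefit across threshold
## probabilities for a risk model, plus the "treat everyone" reference line.
net_benefit <- function(p_hat, y, thresholds = seq(0.01, 0.50, by = 0.01)) {
  n <- length(y)
  sapply(thresholds, function(pt) {
    pred_pos <- p_hat >= pt
    tp <- sum(pred_pos & y == 1)
    fp <- sum(pred_pos & y == 0)
    tp / n - fp / n * pt / (1 - pt)          # benefit minus threshold-weighted harm
  })
}

# "Treat everyone" reference; "treat no one" has net benefit 0 at every threshold.
treat_all <- function(y, thresholds = seq(0.01, 0.50, by = 0.01)) {
  prev <- mean(y)
  prev - (1 - prev) * thresholds / (1 - thresholds)
}
```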

Figure 3.  Feature importance analysis showing the impact of various factors in predicting (A) surgical site occurrence, (B) hernia recurrence,
and (C) 30-day readmission. Higher values on the x-axis indicate higher variable importance. AUC, area under the receiver operating charac-
teristic curve; AWR, abdominal wall reconstruction; GI, gastrointestinal; GU, genitourinary.
Predictive/protective factors for SSOs
A multivariable logistic regression model for SSOs identified obesity (OR 1.05, 95% CI 1.03 to 1.08, p < 0.001), CDC wound class III (OR 2.92, 95% CI 1.64 to 5.21, p = 0.0003), CDC wound class IV (OR 5.90, 95% CI 2.62 to 13.30, p < 0.0001), diabetes mellitus (OR 1.55, 95% CI 1.03 to 2.32, p = 0.034), pulmonary disease (OR 1.82, 95% CI 1.06 to 3.11, p = 0.029), and bridged repair (OR 2.11, 95% CI 1.26 to 3.53, p = 0.004) as independent predictors of SSOs. The model achieved an AUC of 0.68 (95% CI 0.64 to 0.72).
In PFI and ALE analyses, obesity, BMI, wound classification, number of earlier abdominal surgeries, AWR indication, diabetes mellitus, preoperative chemotherapy, component separation, sex, rectus muscle violation, earlier hernia repair, and pulmonary disease increased the accuracy of the ML model the most for prediction of SSOs. Figure 3 shows the impact of each of these variables in predicting SSOs. The average effects of the most frequently impactful variables on prediction of SSOs are depicted in Figure 4.

Figure 4.  Accumulated local effect plot of predictive/protective variables for surgical site occurrence. Accumulated local effect plots visu-
alize the effect of each of the variables on surgical site occurrence. Higher values on the y-axis indicate a higher risk of surgical site occur-
rence. GI, gastrointestinal; GU, genitourinary.
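Accumulated local effects of the kind plotted in Figure 4 can be approximated directly for a single numeric predictor. The sketch below is a bare-bones, simplified version of the Apley and Zhu estimator (reference 35), assuming the `fit_rf` model and `test_set` from the earlier sketches and a hypothetical numeric column such as `bmi`; in practice a dedicated ALE package would normally be used.

```r
## Minimal sketch (not the authors' code): a simplified first-order ALE curve
## for one numeric predictor of a fitted classification model.
ale_numeric <- function(model, data, feature, k = 10) {
  # Interval edges from the empirical quantiles of the feature.
  edges <- unique(quantile(data[[feature]], probs = seq(0, 1, length.out = k + 1)))
  bin   <- cut(data[[feature]], breaks = edges, include.lowest = TRUE, labels = FALSE)

  local_eff <- sapply(seq_len(length(edges) - 1), function(j) {
    rows <- data[bin == j, , drop = FALSE]
    if (nrow(rows) == 0) return(0)
    lo <- hi <- rows
    lo[[feature]] <- edges[j]                     # prediction at the lower edge
    hi[[feature]] <- edges[j + 1]                 # prediction at the upper edge
    mean(predict(model, hi, type = "prob")[, 2] -
         predict(model, lo, type = "prob")[, 2])
  })

  ale <- cumsum(local_eff)
  data.frame(x = edges[-1], ale = ale - mean(ale))  # simple unweighted centering
}

head(ale_numeric(fit_rf, test_set, "bmi"))          # "bmi" is a hypothetical column
```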
Predictive/protective factors for 30-day readmission
A multivariable logistic regression model identified CDC wound class IV (OR 4.94, 95% CI 1.87 to 13.11, p < 0.0001) as an independent predictor of 30-day readmission. The model achieved an AUC of 0.61 (95% CI 0.54 to 0.67).
In PFI and ALE analyses, violated gastrointestinal/genitourinary tracts, number of earlier abdominal surgeries, and AWR indication increased the accuracy of the ML model the most for prediction of 30-day readmission. Figure 3 shows the impact of each of these variables in predicting 30-day readmission.

DISCUSSION
To our knowledge, the current study is the first to describe the use of ML to predict postoperative complications of AWR. We demonstrated that ML algorithms trained on readily available preoperative clinical data could accurately predict the occurrence of HR, SSOs, and 30-day readmission after complex AWR. Evaluation of the algorithms using PFI and ALE analyses identified unique significant predictors of HR (n = 4), SSOs (n = 12), and 30-day readmission (n = 3). ML models outperformed multivariate logistic regression in predicting HR (AUC 0.71 vs 0.65), SSOs (AUC 0.75 vs 0.68), and 30-day readmission (AUC 0.74 vs 0.61). ML models achieved mean accuracy rates of 85% (95% CI 80% to 90%), 72% (95% CI 64% to 80%), and 84% (95% CI 77% to 90%) for predicting HR, SSOs, and 30-day readmission, respectively. Furthermore, the decision curve analysis demonstrated that the net benefit of the ML models was greater than that of the reference models across the clinical threshold range. These findings suggest that ML prediction would more accurately identify patients who develop complications (true positives) while taking the tradeoff with false positives into consideration. These results support incorporating these models into the electronic medical record using patient-specific data to aid in individualized patient counseling and shared decision-making.

Complex AWR can be one of the more challenging operations that surgeons perform. With more than 4 million laparotomies performed annually in the US alone, the demand for AWR is growing.38 Despite advancements in surgical techniques, complication rates remain far from acceptable. As a result, surgeons are changing their emphasis away from surgical techniques and toward optimizing patient-related factors.39 Preoperative risk evaluation of complex abdominal wall defects may identify factors that could preclude specific surgical interventions and encourage steps such as prehabilitation or nutrition optimization to maximize a patient's status before AWR. In addition, complex AWR is associated with significant morbidity and mortality40; therefore, identifying patients at risk of developing complications and educating them during presurgical evaluation is critical.

Historically, preoperative evaluation focused on identifying risk factors through a thorough patient history and physical examination. The present study shows the feasibility and promise of using 9 distinct ML algorithms for predicting complications in patients undergoing complex AWR. These models were developed using readily available clinical data from a single institution to provide accurate predictions individualized for our patient population. The use of ML will thus allow surgeons to make informed decisions. For instance, patients whom ML identifies as at high risk of developing surgical complications may choose a different technique (eg undergoing component separation) or undergo preoperative optimization (eg weight loss) to improve long-term outcomes of AWR. After risk factors are adjusted, ML can be used again to provide an updated risk assessment, allowing providers and patients to determine real-time risk and whether a patient's condition is optimized for surgery.1,40,41 This tool can be incorporated to augment the presurgical evaluation of patients to identify risk factors and predictors of outcomes, thereby assisting in shared surgical decision-making, patient and family counseling, resource allocation, and quality improvement.42

Several risk stratification tools have been developed using hernia-specific and more general data to identify patient and operative factors that predict poor AWR outcomes. Some of the most commonly used tools include the Ventral Hernia Working Group guidelines, Ventral Hernia Risk Score, Hernia Wound Risk Assessment Tool, and the American College of Surgeons NSQIP surgical risk calculator.3,43-45 These tools, although valuable, most often use a logistic or linear regression model to predict outcomes. These models presume that the variables in the model interact in a linear and additive manner; however, the interactions of patient demographics, comorbidities, and surgical factors can be complex and nonlinear and therefore difficult to adequately model using traditional statistical techniques, which presume a linear relationship between predictors and outcomes.15 In comparison with these traditional linear techniques, ML is more capable of modeling these interactions and identifying nonlinear patterns in complex data.15,46-48 We demonstrated the limitation of traditional statistical methods in our study; although a powerful tool, multivariate regression analysis demonstrated lower AUCs when compared with ML models. Furthermore, unlike conventional statistical methods, ML can continuously improve with the inclusion of new data through a process known as incremental learning.49,50 These findings encourage the use of ML to provide individualized risk prediction based on complex nonlinear interactions between patient-specific risk factors.
Although it is critical to understand and visualize the effects of predictor variables on HR, SSOs, and readmission rates, a limitation of certain supervised ML models is a lack of interpretability or transparency.35 This is consistent with the primary goal of ML, which is to create accurate predictions. Nonetheless, after performing PFI and ALE analyses on the top predictors, we identified the impact of each of these variables in predicting complications. For instance, patients who underwent bridged repair and those with rectus muscle violation were deemed at high risk of HR, but component separation was protective. The ability of ML to identify predictors of complications and provide an individualized prediction based on patients' unique comorbidities and risk factors is a valuable asset that should be leveraged.

Crucially, our ML models were not only accurate across the whole sample but also specific when identifying patients who did experience a complication. However, the models did not perform perfectly, and every individual algorithm produced a different tradeoff profile when making predictions. There was an evident trend of high specificity and overall accuracy for all algorithms, but sensitivity and positive predictive value differed between the models we evaluated. These findings suggest that when these algorithms are implemented, a decision must be made about the optimal performance metrics for the practice concerned.

Although algorithm performance was generally strong, the evident variation in some performance metrics demonstrates the importance of continued efforts to make more diverse and high-quality data available as part of the electronic health record. Other data sources, including patient-reported outcomes and device data, may provide useful information for future iterations of these models, improving performance beyond the standard demonstrated in our experiment.

ML has the potential to aid in healthcare delivery, including screening, diagnosis, outcome prediction, and decision-making.51-53 Studies evaluating the use of ML in the field of surgery assessed applications in diagnosis, preoperative planning, and outcome prediction.16-23 Most studies evaluated ML models in large patient cohorts. For example, the Predictive OpTimal Trees in Emergency Surgery (POTTER) risk calculator is based on a decision tree ML model developed from 382,960 patients.42 Although POTTER achieved excellent results, the development of such large models in other fields and institutions may not be feasible. Using data from only 725 patients, we were able to achieve accurate predictions using several different ML models. The prediction accuracy of the models varied for each evaluated outcome, despite the use of the same cohort. This emphasizes the importance of evaluating multiple models to achieve optimal accuracy for outcome prediction. Furthermore, combining different algorithms into a voting ensemble improved the prediction of complications, reinforcing the concept that different ML algorithms identify different patterns within the same patient cohort and that combining these algorithms into a voting ensemble improves pattern identification.

Previous studies have shown that ML models are better at ruling in rather than ruling out complications.18,19,54,55 This is supported by the relatively low sensitivity of our models compared with their high specificity and predictive values. Therefore, patients deemed at high risk of developing an infection by these models are very likely to do so. However, patients deemed at low risk of developing an infection may still develop one. The low event rate in our patient cohort could potentially explain the relatively low sensitivity of our models. Nevertheless, these findings suggest that our ML models can identify patients who are at high risk of experiencing complications. Surgeons can integrate this information into individualized patient counseling, preoperative optimization, and surgical planning to decrease the risk of complications.

This study has several strengths, including the use of a prospectively maintained patient database and a 14-year cumulative experience at a major tertiary cancer center to develop the first ML model to predict outcomes of AWR using preoperative clinical data. Our data were analyzed using 9 distinct ML algorithms and multivariate regression analysis. Limitations of this study that warrant mention include the single-institution, retrospective design, which introduces unmeasured confounding. Our models did not include other techniques such as transversus abdominis release or robotic-assisted hernia repair. Future prospective multicenter studies should include these variables to construct more widely functioning models to improve their accuracy and prediction of complications. Additionally, given the complexity of our patient population, these results may not be generalizable to other centers. However, the operative cases included in this study were performed by 68 surgeons at our institution, potentially improving the broad applicability of our models. Although our models achieved accuracy comparable with or better than that reported in the literature for similar patient populations, they are far from perfect and could be improved by integrating more clinically relevant variables into the prediction model. Future prospective, multicenter studies could incorporate more variables to develop more broadly functional models to increase the accuracy and improve the prediction of complications.
CONCLUSIONS
ML algorithms trained on readily available preoperative clinical data could accurately predict the occurrence of HR, SSOs, and 30-day readmission after AWR. Our findings support incorporating ML models into the preoperative assessment of patients undergoing AWR to provide data-driven, patient-specific risk assessment.

Author Contributions
Study conception and design: Asaad, Butler, Hassan
Acquisition of data: Asaad, Hassan, Gibbons, Chieh
Analysis and interpretation of data: Chieh, Butler, Hassan, Offodile, Gibbons
Drafting of manuscript: Asaad, Hassan, Butler, Offodile
Critical revision: Butler, Offodile, Gibbons

REFERENCES
1. Poulose BK, Shelton J, Phillips S, et al. Epidemiology and cost of ventral hernia repair: making the case for hernia research. Hernia 2012;16:179–183.
2. Asaad M, Kapur SK, Baumann DP, et al. Acellular dermal matrix provides durable long-term outcomes in abdominal wall reconstruction: a study of patients with over 60 months of follow-up. Ann Surg 2020. Epub ahead of print.
3. Bernardi K, Adrales GL, Hope WW, et al; Ventral Hernia Outcomes Collaborative Writing Group. Abdominal wall reconstruction risk stratification tools: a systematic review of the literature. Plast Reconstr Surg 2018;142(3 suppl):9S–20S.
4. Garvey PB, Giordano SA, Baumann DP, et al. Long-term outcomes after abdominal wall reconstruction with acellular dermal matrix. J Am Coll Surg 2017;224:341–350.
5. Shestak KC, Edington HJ, Johnson RR. The separation of anatomic components technique for the reconstruction of massive midline abdominal wall defects: anatomy, surgical technique, applications, and limitations revisited. Plast Reconstr Surg 2000;105:731–738; quiz 739.
6. Mathes SJ, Steinwald PM, Foster RD, et al. Complex abdominal wall reconstruction: a comparison of flap and mesh closure. Ann Surg 2000;232:586–596.
7. Williams JK, Carlson GW, deChalain T, et al. Role of tensor fasciae latae in abdominal wall reconstruction. Plast Reconstr Surg 1998;101:713–718.
8. Jencks SF, Brock JE. Hospital accountability and population health: lessons from measuring readmission rates. Ann Intern Med 2013;159:629–630.
9. Gogna S, Latifi R, Choi J, et al. Predictors of 30- and 90-day readmissions after complex abdominal wall reconstruction with biological mesh: a longitudinal study of 232 patients. World J Surg 2020;44:3720–3728.
10. Buell JF, Sigmon D, Ducoin C, et al. Initial experience with biologic polymer scaffold (poly-4-hydroxybuturate) in complex abdominal wall reconstruction. Ann Surg 2017;266:185–188.
11. Khansa I, Janis JE. The 4 principles of complex abdominal wall reconstruction. Plast Reconstr Surg Glob Open 2019;7:e2549.
12. Baumann DP, Butler CE. Bioprosthetic mesh in abdominal wall reconstruction. Semin Plast Surg 2012;26:18–24.
13. Booth JH, Garvey PB, Baumann DP, et al. Primary fascial closure with mesh reinforcement is superior to bridged mesh repair for abdominal wall reconstruction. J Am Coll Surg 2013;217:999–1009.
14. Butler CE, Campbell KT. Minimally invasive component separation with inlay bioprosthetic mesh (MICSIB) for complex abdominal wall reconstruction. Plast Reconstr Surg 2011;128:698–709.
15. Chen JH, Asch SM. Machine learning and prediction in medicine - beyond the peak of inflated expectations. N Engl J Med 2017;376:2507–2509.
16. Cirillo MD, Mirdell R, Sjöberg F, et al. Time-independent prediction of burn depth using deep convolutional neural networks. J Burn Care Res 2019;40:857–863.
17. Angullia F, Fright WR, Richards R, et al. A novel RBF-based predictive tool for facial distraction surgery in growing children with syndromic craniosynostosis. Int J Comput Assist Radiol Surg 2020;15:351–367.
18. Formeister EJ, Baum R, Knott PD, et al. Machine learning for predicting complications in head and neck microvascular free tissue transfer. Laryngoscope 2020;130:E843–E849.
19. Kuo PJ, Wu SC, Chien PC, et al. Artificial neural network approach to predict surgical site infection after free-flap reconstruction in patients receiving surgery for head and neck cancer. Oncotarget 2018;9:13768–13782.
20. Yang CQ, Gardiner L, Wang H, et al. Creating prognostic systems for well-differentiated thyroid cancer using machine learning. Front Endocrinol (Lausanne) 2019;10:288.
21. Fujima N, Shimizu Y, Yoshida D, et al. Machine-learning-based prediction of treatment outcomes using MR imaging-derived quantitative tumor information in patients with sinonasal squamous cell carcinomas: a preliminary study. Cancers (Basel) 2019;11:E800.
22. Bur AM, Holcomb A, Goodwin S, et al. Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma. Oral Oncol 2019;92:20–25.
23. Karadaghy OA, Shew M, New J, et al. Development and assessment of a machine learning model to help predict survival among patients with oral squamous cell carcinoma. JAMA Otolaryngol Head Neck Surg 2019;145:1115–1120.
24. Breuing K, Butler CE, Ferzoco S, et al. Incisional ventral hernias: review of the literature and recommendations regarding the grading and technique of repair. Surgery 2010;148:544–558.
25. Rastegarpour A, Cheung M, Vardhan M, et al. Surgical mesh for ventral incisional hernia repairs: understanding mesh design. Plast Surg (Oakv) 2016;24:41–50.
26. Hassan AM, Biaggi AP, Malke A, et al. Development and assessment of machine learning models for individualized risk assessment of mastectomy skin flap necrosis. Ann Surg 2022. Epub ahead of print.
27. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD): the TRIPOD statement. BMC Med 2015;13:1.
28. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565–574.
29. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making 2015;35:162–169.
30. Pfob A, Sidey-Gibbons C, Lee HB, et al. Identification of breast cancer patients with pathologic complete response in the breast after neoadjuvant systemic treatment by an intelligent vacuum-assisted biopsy. Eur J Cancer 2021;143:134–146.
31. Parikh RB, Manz C, Chivers C, et al. Machine learning approaches to predict 6-month mortality among patients with cancer. JAMA Netw Open 2019;2:e1915997.
32. Li C, Zhang S, Zhang H, et al. Using the K-nearest neighbor algorithm for the classification of lymph node metastasis in gastric cancer. Comput Math Methods Med 2012;2012:876545.
33. Menon R, Bhat G, Saade GR, et al. Multivariate adaptive regression splines analysis to predict biomarkers of spontaneous preterm birth. Acta Obstet Gynecol Scand 2014;93:382–391.
34. Orlenko A, Moore JH. A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions. BioData Min 2021;14:9.
35. Apley DW, Zhu J. Visualizing the effects of predictor variables in black box supervised learning models. J R Stat Soc B Stat Methodol 2020;82:1059–1086.
36. Cook NR. Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve. Clin Chem 2008;54:17–23.
37. Rosner B. Fundamentals of Biostatistics. Cengage Learning; 2010.
38. Heller L, Chike-Obi C, Xue AS. Abdominal wall reconstruction with mesh and components separation. Semin Plast Surg 2012;26:29–35.
39. Trujillo CN, Fowler A, Al-Temimi MH, et al. Complex ventral hernias: a review of past to present. Perm J 2018;22:17–015.
40. Holihan JL, Alawadi Z, Martindale RG, et al. Adverse events after ventral hernia repair: the vicious cycle of complications. J Am Coll Surg 2015;221:478–485.
41. Kao AM, Arnold MR, Augenstein VA, et al. Prevention and treatment strategies for mesh infection in abdominal wall reconstruction. Plast Reconstr Surg 2018;142(3 suppl):149S–155S.
42. Bertsimas D, Dunn J, Velmahos GC, et al. Surgical risk is not linear: derivation and validation of a novel, user-friendly, and machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator. Ann Surg 2018;268:574–583.
43. Nelson JA, Fischer J, Chung CC, et al. Readmission following ventral hernia repair: a model derived from the ACS-NSQIP datasets. Hernia 2015;19:125–133.
44. Berger RL, Li LT, Hicks SC, et al. Development and validation of a risk-stratification score for surgical site occurrence and surgical site infection after open ventral hernia repair. J Am Coll Surg 2013;217:974–982.
45. Fischer JP, Wink JD, Tuggle CT, et al. Wound risk assessment in ventral hernia repair: generation and internal validation of a risk stratification system using the ACS-NSQIP. Hernia 2015;19:103–111.
46. Tang ZH, Liu J, Zeng F, et al. Comparison of prediction model for cardiovascular autonomic dysfunction using artificial neural network and logistic regression analysis. PLoS One 2013;8:e70571.
47. Jaimes F, Farbiarz J, Alvarez D, et al. Comparison between logistic regression and neural networks to predict death in patients with suspected sepsis in the emergency room. Crit Care 2005;9:R150–R156.
48. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225–1231.
49. Gepperth A, Hammer B. Incremental learning algorithms and applications. Paper presented at: European Symposium on Artificial Neural Networks (ESANN); April 27–29, 2016; Bruges, Belgium.
50. Silver DL. Machine lifelong learning: challenges and benefits for artificial general intelligence. Paper presented at: Artificial General Intelligence (AGI) 2011; Mountain View, CA.
51. Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 2019;1:e271–e297.
52. Thomsen K, Iversen L, Titlestad TL, et al. Systematic review of machine learning for diagnosis and prognosis in dermatology. J Dermatolog Treat 2020;31:496–510.
53. Senders JT, Staples PC, Karhade AV, et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg 2018;109:476–486.e1.
54. Trevethan R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front Public Health 2017;5:307.
55. Mallett S, Halligan S, Thompson M, et al. Interpreting diagnostic accuracy studies for patient care. BMJ 2012;345:e3999.
