Professional Documents
Culture Documents
BACKGROUND: Bedside airway evaluation is conduced before anesthesia, but all current
methods perform modestly, with low sensitivity and positive predictive value. We hypothesized
Downloaded from http://journals.lww.com/anesthesia-analgesia by BhDMf5ePHKav1zEoum1tQfN4a+kJLhEZgbsI
Ho4XMi0hCywCX1AWnYQp/IlQrHD3i3D0OdRyi7TvSFl4Cf3VC4/OAVpDDa8K2+Ya6H515kE= on 07/18/2023
thesia care unit. Suitable patients were recruited postop- that this higher-dimensional model is related directly to
Ho4XMi0hCywCX1AWnYQp/IlQrHD3i3D0OdRyi7TvSFl4Cf3VC4/OAVpDDa8K2+Ya6H515kE= on 07/18/2023
eratively when adequately recovered from the effects of the underlying 50 eigenfaces by a straightforward linear
anesthesia. Photographs were obtained either in the transformation.14
postanesthesia care unit or on the ward on the first
postoperative day. Patients who had undergone head or Statistical Analysis
neck surgery were excluded. Patients in whom central Model Derivation
venous catheters or other interventions prevented full The task of recognizing the difficult intubation can be
view of the features of the face in frontal and profile conceived as the task of deriving an algorithm capable of
views were excluded. Patients who were neither easy nor separating the points representing the cohort of easy pa-
difficult to intubate by our criteria were not recruited. All tients from those representing the cohort of difficult pa-
patients meeting entry criteria were recruited until the tients within a defined variable space. Patients who were
recruitment goals were reached. Patients were informed easy to intubate were assigned a classification value of zero,
of their right to not participate in the study, but none difficult to intubate patients a value of 1. All 61 variables
refused. and the physical properties of MP test and TMD were
subjected to variable reduction by univariate analysis at
P ⱕ 0.1 (2 distribution, G2 goodness-of-fit statistic).15
Data Acquisition Those variables that individually showed a statistical trend
Eighty patients were recruited by cohorts. Forty patients
in discrimination between easy and difficult airways are
were used as the model derivation set and the other 40
shown with an asterisk in Table 1 and were used as a subset
patients were used as the model validation set. Each set
to derive the predictive model. Binomial logistic regression
was composed of 20 easy to intubate and 20 difficult to
to segregate the easy to intubate and difficult to intubate
intubate subjects. Digital photographs of the head and neck
cohorts16 was performed exhaustively on all possible vari-
of each patient in frontal view and in left and right profiles
able combinations of this reduced subset of variables using
were obtained. Patient demographics (height, weight, age,
a quadratic logit.17
gender, type of surgery), MP class, TMD (in finger-
The mathematical theory underlying the choice of the
breadths), and the details regarding ease of intubation were
quadratic logit function is described in Appendix A.
obtained from the anesthetic record. Any data found to be
Briefly, the quadratic form of the logit uses both the value
absent from the record were collected by the authors at the
of an input variable and its square. We chose this model
time of patient enrollment. Preoperative assessment of the
because we hypothesized that factors influencing difficulty
patients was performed in a Preoperative Anesthesia Test-
of intubation may not behave linearly, but instead be either
ing Clinic staffed by a small cadre of specially trained
easier or harder on both sides of a central value. As an
preanesthetic nurse practitioners using a structured elec-
example, one could hypothesize that if both the small jaw
tronic medical record and under the direct supervision of
length of micrognathia and the large jaw length of acro-
an experienced attending anesthesiologist. Because the
megaly suggest difficulty, then ease of intubation may be
clinical assessments of MP test and TMD were performed
optimum at some middle value and fall away on either
before surgery to determine the ease or difficulty of intu-
side. Furthermore, use of the quadratic logit does not
bation, the clinic staff was blinded to the ultimate cohort
exclude that a linear relationship may be found, because
assignation at the time of assessment. TMD was measured
the fitting of the quadratic logit may produce a curve that
in fingerbreadths with the head in a neutral position, as is
is locally straight over the region of interest. Hence, using
the usual clinical practice at our institution and elsewhere.9
only data taken from the 40 patients in the model
The photographs were analyzed by facial structure
derivation cohorts, each variable and its square were
analysis software (FaceGen Modeller v3.3; Singular In-
used as inputs to a logit function in all possible combi-
versions, Toronto, Canada). This software uses an algo-
nations of inclusion or noninclusion. The coefficients of
rithm to generate a mathematical model of the face based
these logits were optimized to produce candidate mod-
on a weighted contribution of predetermined “eigen-
els. The area under the curve (AUC) of the receiver
faces.” An example of a completed model is shown in
operating characteristic (ROC) of each candidate model
Figure 1 (image of the first author). The eigenface
was calculated18 and stored as AUCderivation for that
method allows the structure of any particular individual
candidate model. We selected models optimized for
face to be expressed in an elegant and compact form.10
AUC rather than raw accuracy because this technique is
Each of the weighting values for the eigenfaces can be
more robust with regard to future data.19
considered to be a coordinate value, allowing the whole
physiognomy of an individual face to be represented Model Validation
solely as a point in a 50-dimensional space.11 The facial Model validation was then performed to avoid choosing a
analysis software implements a further improvement to candidate model that is overfitted to the derivation dataset.
The candidate models produced in the model derivation AUCvalidation. This method excludes models that show evi-
stage were applied without further adjustment to the data dence of overfitting, a problem with large variable spaces. The
taken from the 40 patients in the model validation cohorts, mathematics underlying the rationale of AUC product maxi-
producing an AUCvalidation for each candidate model. mization are described in Appendix B. Briefly, using the
Final model selection was performed by choosing the observation that any overfitting to gain performance in one
candidate model with the maximum value of AUCderivation ⫻ subset will lead to a comparable loss of performance in a
Cheeks: concave/convex Mouth: lips deflated/inflated The facial structure analysis software requires some user
Cheeks: round/gaunt Mouth: lips large/small
Chin: forward/backward Mouth: lips puckered/retracted interaction to place certain fiducial markers on the images
Chin: pronounced/recessed Mouth: lips thin/thick to guide reconstruction. Ten patients were selected at
Chin: retracted/jutting Mouth: protruding/retracted random from the 80 study patients and their photographs
Chin: shallow/deep Mouth: tilt up/down were rerendered into 3-dimensional models to test the
Chin: small/large Mouth: underbite/overbite
reproducibility of the reconstruction process.
Chin: tall/short Mouth: up/down*
Chin: wide/thin* Mouth: wide/thin
Eyes: down/up* Mouth: chin distance, short/long*
Eyes: small/large Nose: bridge shallow/deep
RESULTS
Eyes: tilt inward/outward* Nose: bridge short/long Of the 61 descriptive facial proportions and the physical
Eyes: apart/together Nose: down/up properties of MP test and TMD, 11 showed a univariate
Face: brow-nose-chin ratio*a Nose: flat/pointed statistical trend in discriminating between easy and diffi-
Face: forehead-sellion-nose ratio Nose: nostril tilt down/up* cult intubations. These 11 variables, shown with an asterisk
Face: heavy/light Nose: nostrils small/large
in Table 1, were evaluated as possible inputs to the model,
Face: round/gaunt Nose: nostrils wide/thin
Face: tall/short Nose: region concave/convex producing a total of 211 ⫺ 1 ⫽ 2047 candidate models. The
Face: up/down Nose: sellion down/up final model was chosen by AUC product maximization and
Face: wide/thin Nose: sellion shallow/deep (1) found to depend on only 3 facial proportions plus TMD, as
Forehead: small/large Nose: sellion shallow/deep (2) marked with a superscript “a” in Table 1. The parameters of
Forehead: tall/short Nose: sellion thin/wide
Forehead: tilt forward/back Nose: short/long the model are stated numerically in Appendix C. Relative
Head: thin/wide Nose: tilt down/up*a to the population normal shown in Figure 2, the variations
Temples: thin/wide in facial appearance described by the 3 facial proportions
Thyromental distance*a Mallampati class used in the airway algorithm are shown in Figure 3.
The thyromental distance and Mallampati class are included in the table as 2
further variables that were used for modeling. Clinical Interpretation of the Model
* Demonstrated at least a statistical trend (P ⱕ 0.1) with identified difficult Figure 4 shows the classification and statistical behavior of
intubation.
a
Appear in the final model.
the algorithm when applied to the model derivation data-
set, the validation dataset, and the 2 datasets combined.
The algorithm successfully clusters the easy and difficult
airways toward opposite ends of the logit curve (Fig. 4, A,
C, and E). ROC curves were constructed for each test
Figure 3. Variations in facial appearance from the average head shown in Figure 2 by standard deviations of the descriptive facial proportions
used in the airway algorithm. is the standard deviation from the normal head derived from 300 individuals.
population and are shown adjacent to the classification only those difficult patients with an IDS score8 ⬎5, the
behavior (Fig. 4, B, D, and F). Table 2 contains the numeri- model correctly classified 89%, with a sensitivity of 96%,
cal representations of these statistical properties. The per- specificity 85%, and AUC 90.2%.
formance of the algorithm is stated in terms of its accuracy The MP test did not show a statistical trend with ease or
as a binomial classifier, allowing calculation of P values difficulty of intubation and so did not form part of the
according to the binomial distribution. In selecting a model described process of model selection. Even when the MP
based on the data, the problem of multiple comparisons test was explicitly forced into the reduced set of variables,
must be addressed. Because 11 variables participated in it did not affect the final model selection and so inclusion of
generation of the model, and all possible combinations of MP test did not add further predictive information.
models were exhaustively evaluated, each variable partici- To allow comparison of the model to classical airway
pated in 210 models. Application of the Bonferroni correc- assessment tools, the ability of the MP test and TMD to
tion for these multiple comparisons still yielded highly classify difficult intubation was tested against the study
significant P values (Table 2). When applied to the com- population. Table 3A shows the statistical performance of
bined dataset, the performance of the airway classification the MP test and TMD both alone and together when used
algorithm showed a sensitivity of 90% and a specificity of as variables for the quadratic logit model. This analysis
85%. The area under the ROC curve (AUC) was 89.9%. ascertained the maximum performance of these tests when
When applied to the combination of all easy patients and their thresholds are allowed to be optimized against the
Exact binomial probability test P ⫽ 9.29 ⫻ 10⫺7 P ⫽ 4.18 ⫻ 10⫺6 P ⫽ 1.58 ⫻ 10⫺12 P ⫽ 2.14 ⫻ 10⫺11
Bonferroni correction 210 210 210 210
Corrected probability P ⫽ 9.51 ⫻ 10⫺4 P ⫽ 4.28 ⫻ 10⫺3 P ⫽ 1.62 ⫻ 10⫺9 P ⫽ 2.19 ⫻ 10⫺8
IDS ⫽ intubation difficulty scale.
a
The combination of both the derivation set and validation set.
b
The set of all easy patients with all difficult patients for whom an IDS score ⬎5 was estimated (n ⫽ 65).
Table 3A. Statistical Performance Measures of Classical Airway Assessment Tools When Optimized with
Respect to the Study Model Derivation Population
Mallampati test Thyromental distance Bivariate model (MP and TMD)
Statistical properties Derivation Validation Derivation Validation Derivation Validation
Sensitivity 0.4 0.1 0.85 0.8 0.8 0.65
Specificity 0.85 0.9 0.5 0.5 0.7 0.7
True positives 8 2 17 16 16 13
True negatives 17 18 10 10 14 14
False positives 3 2 10 10 6 6
False negatives 12 18 3 4 4 7
Accuracy 25/40 20/40 27/40 26/40 30/40 27/40
Table 3B. Performance of Classical Airway Assessment Tools Alone and in Combination When Used with
Their Frequently Ascribed Thresholds
Statistical properties MP score >3 TMD <3 MP score >3 and TMD <3 MP score >3 or TMD <3
Sensitivity 0.25 0.125 0.05 0.325
Specificity 0.875 0.925 0.95 0.85
True positives 10 5 2 13
True negatives 35 37 38 34
False positives 5 3 2 6
False negatives 30 35 38 27
Accuracy 45/80 42/80 40/80 47/80
MP ⫽ Mallampati; TMD ⫽ thyromental distance.
model derivation data in the same manner used in the magnitude but opposite to Figure 5A. Figure 5B might
derivation of the new airway model. The classical tools therefore be considered to represent a patient as easy to
nevertheless demonstrate substantially weaker perfor- intubate as the patient in Figure 5A would be difficult.
mance than the model (57 of 80 correctly classified, Fisher
exact test, P ⫽ 0.018 for difference from computer model). Mathematical Interpretation of the Model
Table 3B shows the performance of the MP test and TMD A mathematical interpretation of the model is given in
when used in the usual clinical manner with their classi- Appendix C, which gives a more precise interpretation of
cally ascribed thresholds, without the inclusion of a the meaning of the various parameters and the resulting
squared term. The performance here is again inferior and logit calculated by the model.
the greatest achieved accuracy of 47 of 80 did not rise to the
level of statistical significance when compared with chance Facial Structure Test-Retest Validation
(P ⫽ 0.073, exact binomial distribution) and was inferior to A correlation coefficient of r ⫽ 0.80 was established across
the computer model (P ⬍ 0.0001, Fisher exact test). the 610 variables, indicating a high degree of reproducibil-
Because this new airway model describes appearance, it ity. The classifications (predicted easy versus difficult) of
is possible to generate pictures of faces that would appear these 10 patients by the algorithm described in Appendix C
to have certain degrees of ease or difficulty of intubation. were unchanged by rerendering.
Figure 5A illustrates the head that is theoretically most
difficult to intubate according to the model. Figure 5B DISCUSSION
represents a head that the model would classify as easy to In our study, computerized facial structure analysis com-
intubate. The parameter values for this head are set such bined with a widely used bedside airway evaluation
that the value produced by the model is of the same method yielded a model that significantly outperformed
popular clinical predictive tests. Our model accurately have not proven to be accurate.4 Indeed, getting anesthesi-
classified 70 of 80 airways compared with 47 of 80 for MP ologists to pay attention to the airway may be the principal
test plus TMD using classical thresholds.1,3,5 benefit of routinely performing airway examinations before
Use of a bedside examination to predict difficult intuba- induction.22
tion is considered the standard of care in modern anesthe- Our study differs from previous work using facial
siology practice. It has been incorporated into the difficult imaging to evaluate the airway. Suzuki et al.26 used digital
airway algorithm of not only the American Society of photographs of subjects’ faces to calculate 5 ratios and
Anesthesiologists7 and those of several other countries,20 angles from measurements derived from placement of
but also most recently into the World Health Organization anatomic markers on the photographs. They found one, the
Surgical Safety Checklist,21 the use of which is being “submandibular angle,” to be correlated with difficult
encouraged in every operating room in the world. Unfor- tracheal intubation. They also used morphing software to
tunately, all easily performed examination systems in clini- construct “average” easy and difficult to intubate faces,
cal practice perform only modestly, with sensitivities of
which we believe bear some subjective resemblance to our
20% to 62%, specificities of 82% to 97%, and very low
Figure 5. Similarly, Naguib et al.27 measured 22 indices
positive predictive values, generally ⬍30%, unless very
from plain radiographs and 8 from 3-dimensional com-
liberal definitions of difficulty are used.22 There are likely a
puted tomographic scans of the head in patients who were
number of reasons for this poor performance, including the
easy or difficult to intubate. They constructed a model
relative rarity of difficult intubation,22 the multifactorial
etiology and varying definition of difficult intubation, containing 3 bedside tests (MP test, TMD, and thyroster-
interobserver variability in test results,23,24 failure to vali- nal distance) and 2 radiographic features that accurately
date potential systems in patients independent of those separated the easy and difficult cohorts with an AUC of
used to derive the test,22 and the inadequacy of the tests the ROC curve of 0.97. Both of these previous investiga-
themselves. Conversely, experienced anesthesiologists al- tions, however, used a priori assumptions of which
most certainly use cues other than those derived from anatomic features might relate to difficult laryngoscopy
formal bedside tests to formulate their clinical impression and intubation. Both also required actual measurement
of the ease of intubating any given patient. There may be a of anatomic features. In contrast, our method modeled
large number of anatomic factors that enter into such a the entire physiognomy of the face with no such assump-
judgment.25 However, bedside scores based on such factors tions and no direct measurements. Moreover, the method
study cohorts. For example, some patients with limited digital photographs. It relies presently on a relatively
Ho4XMi0hCywCX1AWnYQp/IlQrHD3i3D0OdRyi7TvSFl4Cf3VC4/OAVpDDa8K2+Ya6H515kE= on 07/18/2023
neck mobility but otherwise normal airways are difficult to inefficient iterative algorithm to do so, and exerts consid-
intubate.28 Further refinement of the model could include erable computing power on modeling the coloration and
subjective or measured indices of neck extension, or indeed texture of the skin. Certainly, a more efficient one could be
other predictors of difficult intubation such as body mass written, particularly if only a few parameters need to be
index. Second, we measured TMD in the neutral position derived to predict difficult intubation. Indeed, we have a
rather than full neck extension, and we used fingerbreadths prototype algorithm that can analyze a face, derive the
rather than measured distance. This is the method in relevant parameters, and calculate the intubation predic-
routine use in our institution but some evidence suggests tion in less than 1 minute (data not shown). If proven
measuring TMD in extension is more predictive of intuba- practical for widespread clinical use, this would represent a
tion difficulty29 and, although popular, fingerbreadth mea- significant advance over previously published methods
surements are inferior to ruler measurements.9 Conversely, involving offline measurements taken from radiographs or
if the model included this potentially inferior measure- photographs. Second, the computing power required is
ment, refining it by using full-extension TMD could only modest but exceeds that of current handheld devices. We
improve performance of the model. Third, we sought to envision that clinical use of our model would be most
eliminate potential racial or gender-based confounding by efficiently deployed using high-speed computers accessible
confining our sample to Caucasian males. Finally, because to clinicians over a network, perhaps using handheld
photographs were obtained postoperatively, we cannot computers or smartphones incorporating digital cameras as
entirely exclude the possibility of changes in facial appear- input devices. Third, the requirement for manual place-
ance caused by anesthesia and surgery. Only a large, ment of fiducial markers to guide reconstruction is a
prospective study in a diverse patient population would be potential source of user error. However, it is encouraging
able to verify the performance of our model in general that our test-retest results revealed no cases in which the
clinical use. It is encouraging that the model predicts ease overall judgment of ease or difficulty of intubation varied.
in the computerized normal face, and that it performs Finally, the performance of the model should be compared
better than bedside tests within the study cohorts. It is also with that of experienced clinicians given similar data. The
possible that deriving and validating it within a larger model would be particularly useful if it could predict
fraction of the difficult airway “space” could further refine difficult intubation when an experienced clinician had not
the model. suspected it, a more dangerous clinical situation than the
Another potential limitation is the method used to converse error of judgment. If the model outperforms
categorize the subjects as easy or difficult to intubate. We human experts, then its applicability would potentially be
used a liberal definition of difficult intubation, which quite broad and would include even seasoned anesthesi-
causes the positive predictive value to increase. In real- ologists. Conversely, if human operators can match the
world clinical use, anesthesiologists are likely more model’s performance, then the software may be of greater
interested in very difficult intubation, and the positive utility to nonairway experts. This assessment is an area of
predictive value will be lower, as it is for all difficult active research by our group.
airway predictive methods. Conversely, use of a liberal In summary, the model presented herein significantly
definition makes the statistical task of separating the easy outperformed the current standard of the combination of
and difficult cohorts more difficult, not less so,22 because MP and TMD examinations, and is based on quantification
the 2 groups of patients are more similar. This makes the of facial anatomy performed by an unbiased computer
strong performance of our model notable. Moreover, the algorithm. Additional work should define the ability of
model performed even better in the subset of patients with experienced clinicians presented with similar photographs
an IDS score ⬎5, who had comparatively more difficult and bedside airway examination results, and the ability of
intubations. However, it is decidedly problematic to infer the computer model to prospectively predict difficult intu-
the comparative difficulty of an intubation from the after- bation in a large and diverse patient population. If the
the-fact description of the technical maneuvers required to superiority of the method can be confirmed, the model
manage that airway.6 For example, would an intubation could represent an important advance in the assessment of
achieved over a bougie on the third attempt be considered the airway.
more or less difficult than one in which the anesthesiologist
decided to use a video technique after the first unsuccessful
attempt? Even presuming equally experienced laryngosco- Appendix A: A Real-World Representation of the
pists, the comparison is confounded by differences in Quadratic Logit
comfort with, and availability of, other adjunct techniques. The quadratic logit function admits both the value of a
This ambiguity also complicates the use of research tools parameter and its square:
写 写
L共 z 兲 ⫽
i⫽⫹1
(A5) ⴝ AUCideal
2
ⴚ 20 (B3)
共 x i兲 ⫹ C 䡠
i,i2 共 x i兲
i,i2
i⫽⫹1 i⫽⫺1 The value of equation (B3) is maximized only when 0 ⫽ 0,
where C is a numerical constant determined by the model, describing no overfitting. The candidate model that gener-
defined as: ates the greatest AUC product as defined by equation (B3)
is therefore likely to be the model that most closely approxi-
写 冑
共i 2兲 mates the theoretically ideal model and has the least
写
C⫽
i⫽⫺1
e ⫺ 0
(A6) overfitting.
共 冑2 兲
i
i⫽⫹1 Appendix C: Mathematical Interpretation of the
Equation (A5) is a curious result, because we believe it Final Logistic Model
mathematically describes a mental process similar to that The parameters of the selected airway classification model
performed by anesthesiologists given the task of assess- are given in the following table:
ing an unknown airway. The combination of factors
favorable to intubation are weighed against the counter- Parameter x ␣
vailing unfavorable factors and, based on their relative Face (brow-nose-chin ratio) 2.995 2.417 ⫹1
preponderance, a decision is reached with some greater Jaw-neck (slope high/slope low) ⫺13.683 3.255 ⫹1
or lesser degree of confidence. The apparent artificiality Nose (tilt down/tilt up) 0.557 0.735 ⫹1
Thyromental distance (fingerbreadths) 2.032 0.738 ⫹1
of the quadratic logit model thus leads to a surprisingly
Greatest modeled difficulty (logit units) 0 ⫽ 10.85
natural result.
intubation improves (⫹1) or worsens (⫺1) as the value of inversion, and race in face recognition. Q J Exp Psychol A
Ho4XMi0hCywCX1AWnYQp/IlQrHD3i3D0OdRyi7TvSFl4Cf3VC4/OAVpDDa8K2+Ya6H515kE= on 07/18/2023