
Society for Ambulatory Anesthesiology

Section Editor: Peter S. A. Glass

Derivation and Validation of a Simple Perioperative Sleep Apnea Prediction Score

Satya Krishna Ramachandran, MD, FRCA,* Sachin Kheterpal, MD, MBA,* Flavia Consens, MD, Amy Shanks, MS,* Tara M. Doherty, DO,* Michelle Morris, MS,* and Kevin K. Tremper, PhD, MD*
BACKGROUND: Obstructive sleep apnea (OSA) is a common, largely underdiagnosed condition that is important to diagnose preoperatively because it has implications for perioperative management. Our purpose in this study was to identify independent clinical predictors of a diagnosis of OSA in a general surgical population, develop a perioperative sleep apnea prediction (P-SAP) score based on these variables, and validate the P-SAP score against standard overnight polysomnography.

METHODS: A retrospective, observational study was designed to identify patients with a known diagnosis of OSA. Independent predictors of a diagnosis of OSA were derived by logistic regression, and a prediction tool (the P-SAP score) was developed from them. The P-SAP score was then validated in patients undergoing overnight polysomnography.

RESULTS: The P-SAP score was derived from 43,576 adult cases undergoing anesthesia. Of these, 3884 patients (7.17%) had a documented diagnosis of OSA. Three demographic variables (age >43 years, male gender, and obesity), 3 history variables (history of snoring, diabetes mellitus Type 2, and hypertension), and 3 airway measures (thick neck, modified Mallampati class 3 or 4, and reduced thyromental distance) were identified as independent predictors of a diagnosis of OSA. A diagnostic threshold P-SAP score ≥2 showed excellent sensitivity (0.939) but poor specificity (0.323), whereas for a P-SAP score ≥6, sensitivity was poor (0.239) with excellent specificity (0.911). Validation of this P-SAP score was performed in 512 patients with similar accuracy.

CONCLUSION: The P-SAP score predicts diagnosis of OSA with dependable accuracy across mild to severe disease. The elements of the P-SAP score are derived from a typical university hospital surgical population.
(Anesth Analg 2010;110:1007–15)

Obstructive sleep apnea (OSA) is a prevalent condition in 9% to 24% of the general population1 that occurs as a result of partial or complete airway obstruction during sleep and is associated with episodic hypoxemia. Both anesthesia and surgery affect sleep patterns, resulting in apnea or desaturation even in patients without presumed OSA,2 but OSA increases this risk significantly.3 OSA increases the risk of cardiac arrhythmias,4,5 myocardial infarction,4 stroke,6 and sudden death7 in the general population. An important step in reducing morbidity in this patient population is identifying those with OSA preoperatively.8 The diagnosis of OSA is usually based on a sleep study, but it is impossible to envisage the routine use of overnight polysomnography (OPS) as a perioperative screening test because of cost and resource constraints. Recent American Society of Anesthesiologists (ASA) practice

From the Departments of *Anesthesiology and Neurology, University of Michigan, University Hospital, Ann Arbor, Michigan. Accepted for publication January 11, 2010. Supported by Departmental resources. Address correspondence and reprint requests to Dr. Satya Krishna Ramachandran, Department of Anesthesiology, 1 H427 University Hospital Box 0048, 1500 E. Medical Center Dr., Ann Arbor, MI 48109-0048. Address e-mail to Copyright 2010 International Anesthesia Research Society
DOI: 10.1213/ANE.0b013e3181d489b0

guidelines8 stress the need to identify OSA in the perioperative period through history, physical assessment, and laboratory tests. There are 2 important considerations regarding perioperative screening for OSA. First, there are currently numerous prediction models, some of which are highly accurate in predicting OSA. The most successful models typically use either a complex risk derivation formula based on multiple variables9 or combine such formulae with additional measurements and investigations such as morphometry10 and cephalometry.11 The complexity of these prediction models reduces their utility in the immediate perioperative period. However, simplicity in terms of test design comes at the cost of accuracy, and the simplest model, the STOP (snoring, tiredness during daytime, observed apnea, and high blood pressure) questionnaire12 has a sensitivity of 0.65 at an apnea-hypopnea index (AHI) threshold of 5. The STOP-BANG (BMI, age, neck circumference, and gender) model, which was derived from the validation set of the STOP study,12 was shown to have excellent value as a perioperative screening test in the presence of severe OSA, but is yet to be validated prospectively. Second, the reliability of a diagnostic test is best when derived in a study population that closely resembles clinical practice, because distinguishing features of a disease are different in a high-risk population.13 The diagnostic accuracy of a test can be biased or overestimated if a test is derived in a group of patients with underlying high prevalence of a disease,

April 2010 Volume 110 Number 4


Prediction of Obstructive Sleep Apnea in a Surgical Population

rather than in a typical clinical population.13 The prevalence of OSA as defined by an AHI threshold of 5 in the STOP study was 73% in the derivation group and 69% in the validation group,12,14 similar to most clinical screening test studies for OSA that are based on sleep laboratories.9–11 In this context, we sought to identify independent clinical predictors of a perioperative diagnosis of OSA in a broad-spectrum university hospital surgical population, using common preanesthetic evaluation methods, and to develop a perioperative OSA prediction model based on these variables.


METHODS

After obtaining IRB approval (University of Michigan, Ann Arbor, MI), we performed the study in 2 steps. The first step involved deriving the screening test from a broad spectrum of surgical patients: the general surgical population, or GSP group. The second step involved validating the screening test in a set of patients undergoing overnight sleep study, the overnight polysomnography or OPS group.

Derivation of Prediction Score in the GSP Group

All adult patients undergoing general anesthesia during a 40-month period were analyzed in this study. Individual patient informed consent was waived by the IRB because no clinical interventions were studied, and no patient-identifiable data were used. Patients younger than 18 years were excluded. The primary outcome measure was the perioperative diagnosis of OSA. This was defined as OSA diagnosed with OPS and treated with continuous positive airway pressure (CPAP), bilevel positive airway pressure, or surgery for OSA. Patients who satisfied these criteria were grouped together as GSP-OSA. The remaining patients were grouped together and called GSP-controls. Perioperative, intraoperative, and postoperative data were collected from routine clinical documentation entered by anesthesiology residents, attending staff, and certified registered nurse anesthetists into the institution's perioperative clinical information system (Centricity, General Electric Healthcare, Waukesha, WI). The data analysis was performed retrospectively. The clinical evaluation form and its data storage were designed not only to serve clinical purposes but also to collect data for observational research studies. Each clinical element (body mass index [BMI], snoring, etc.) is stored as a discrete database element. In addition, a structured, predefined pick list is used by the clinician to enter information (Appendix). The demographic and history variables were chosen from the anesthesia assessment dataset after a thorough literature search for associations with OSA from among frequently used perioperative assessment tools.

For each of the patients included in the study, data on the following variables were collected: patient or family member report of snoring,12 patient report of treated or untreated hypertension,12 patient report of treated Type 2 diabetes mellitus,15 and calculated BMI12 from patient report of weight and height, age,12 and gender.12 Airway variables frequently assessed preoperatively, namely, modified Mallampati class,16 qualitatively assessed thick neck,12 reduced thyromental (TM) distance17 estimated <6 cm, reduced mouth opening estimated <4 cm, mandibular protrusion test assessed as inability to prognath lower incisors anterior to upper incisors, and clinically estimated reduced cervical spine mobility, were additionally included as variables because several of these have previously been associated with difficult airway.18 These airway variables were assessed at the discretion of the individual caregivers. Data on CPAP treatment were not specifically collected because both duration and regularity of usage were poorly documented.

Statistical Analysis

Statistical analysis was performed using SPSS version 15 (SPSS, Chicago, IL). Collinearity diagnostics were performed on all the variables, and a bivariate correlation matrix was used to evaluate pairwise correlations and address any groups with a pairwise correlation >0.70. Continuous variables were transformed into dichotomous variables by identifying the maximal sum of specificity and sensitivity using a receiver operating characteristics (ROC) curve. Variables were then entered into a logistic regression full model fit. All variables with P < 0.05 were deemed significant independent predictors of OSA. A hazard ratio was calculated for each significant predictor comparing the likelihood of OSA with and without the risk factor. This model was evaluated using the area under the ROC curve. An unweighted clinical scale was produced assigning 1 point per independent predictor. A weighted scale was also derived based on the coefficients derived from the logistic regression full model fit. The predictive accuracy of the weighted and unweighted scales was separately assessed using the area under the ROC curve. The scale that best combined ease of use with accuracy was called the perioperative sleep apnea prediction (P-SAP) score. To describe the effect of incremental risk factor presence on the predictability of OSA, the sensitivity, specificity, positive predictive value, negative predictive value, positive and negative likelihood ratios, and diagnostic odds ratios were then calculated for each P-SAP threshold score.
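The dichotomization step described in the statistical analysis (choosing the cut-point that maximizes the sum of sensitivity and specificity) can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the function name `best_cutpoint`, the toy ages, and the convention that a value strictly above the cut-point counts as a positive test are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def best_cutpoint(values: Sequence[float], has_osa: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutpoint, sensitivity + specificity) for the best cutpoint.

    Assumed convention: a patient tests positive when value > cutpoint.
    When several cutpoints tie, the lowest one is kept.
    """
    n_pos = sum(1 for d in has_osa if d)
    n_neg = len(has_osa) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one patient with and one without OSA")

    best_cut, best_sum = float("nan"), -1.0
    for cut in sorted(set(values)):
        # Patients with OSA whose value lies above the candidate cutpoint.
        true_pos = sum(1 for v, d in zip(values, has_osa) if d and v > cut)
        # Patients without OSA whose value lies at or below the cutpoint.
        true_neg = sum(1 for v, d in zip(values, has_osa) if not d and v <= cut)
        total = true_pos / n_pos + true_neg / n_neg
        if total > best_sum:
            best_cut, best_sum = cut, total
    return best_cut, best_sum


# Hypothetical toy data (ages in years), not taken from the study.
ages = [25, 30, 38, 41, 45, 50, 58, 62]
osa = [False, False, False, True, False, True, True, True]
cutpoint, sens_plus_spec = best_cutpoint(ages, osa)
print(cutpoint, sens_plus_spec)
```

Scanning every observed value as a candidate cut-point is equivalent to walking along the empirical ROC curve and picking the point with the largest sensitivity plus specificity.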

Validation of the P-SAP Score in the OPS Group

The P-SAP score was then validated in a series of patients who underwent OPS at our institution (OPS group). These patients initially presented to the sleep laboratory for symptoms and features highly suggestive of OSA. These patients underwent surgery within a 6-month period of the sleep study during which time the elements of the P-SAP score were assessed as part of their perioperative assessment. The patients were identified by combining the anesthesia clinical information system database and the sleep laboratory database at the University Hospital. P-SAP relevant data were entered prospectively into the electronic anesthesia record (Centricity, General Electric Healthcare) by anesthesia caregivers blinded to the results of polysomnography. Subjects were studied with standard polysomnography techniques in the University of Michigan sleep laboratory for at least 7



Table 1. Preoperative Patient Characteristics

| Characteristic | GSP-controls (n = 40,448) | GSP-OSA (n = 3128) | % Complete data GSP | OPS-control (n = 124) | OPS-OSA (n = 387) | % Complete data OPS |
| --- | --- | --- | --- | --- | --- | --- |
| Male gender | 17,752 (44)* | 1939 (62) | 100 | 41 (33.1) | 246 (63.6) | 100 |
| Snoring | 9358 (23) | 2024 (65) | 100 | 44 (35.5)* | 201 (51.9) | 100 |
| Thick neck | 4746 (12) | 1267 (41) | 100 | 26 (21)* | 123 (31.8) | 100 |
| Mallampati 3 or 4 | 4210 (10) | 749 (24) | 100 | 18 (14.5)* | 81 (20.9) | 100 |
| Reduced mouth opening | 1177 (2.9) | 132 (4.2) | 100 | 4 (3.2) | 15 (3.9) | 100 |
| Limited jaw protrusion | 3877 (9.6) | 386 (12) | 100 | 14 (11.3) | 44 (11.4) | 100 |
| Limited c-spine mobility | 3240 (8.0) | 381 (12) | 100 | 13 (10.5) | 50 (12.9) | 100 |
| Reduced TMD | 2242 (5.5) | 280 (9.0) | 100 | 15 (12.1)* | 40 (10.3) | 100 |
| Diabetes mellitus Type 2 | 3914 (9.7) | 773 (25) | 100 | 13 (10.5) | 103 (26.6) | 100 |
| Hypertension | 13,983 (35) | 1739 (56) | 100 | 46 (37.1) | 205 (53) | 100 |
| Age, y (mean ± SD) | 50.9 ± 16.9 | 53.6 ± 13.4 | 100 | 48.9 ± 15.1 | 56.4 ± 13.9 | 100 |
| Age >43 y | 27,261 (67) | 2493 (80) | 100 | 89 (71.8)* | 328 (84.8) | 100 |
| BMI (mean ± SD) | 28.2 ± 6.5 | 35.1 ± 8.4 | 100 | 31.9 ± 7.9* | 34.0 ± 7.8 | 100 |
| BMI >30 | 12,958 (32) | 2187 (70) | 100 | 61 (49.2)* | 248 (64.1) | 100 |
| ASA physical status IV or V | 2406 (5.9) | 312 (10) | 99.9 | 6 (4.8) | 37 (9.6) | 100 |

GSP = general surgical patient; OPS = overnight polysomnography; OSA = obstructive sleep apnea; TMD = thyromental distance; BMI = body mass index (kg/m²). Reduced TMD, reduced mouth opening, thick neck, and limited c-spine mobility were subjective assessments by anesthesia providers. * Significant differences between GSP-OSA and OPS-OSA. Significant differences between GSP-control and GSP-OSA. P values calculated using Pearson χ² for categorical variables, statistical significance P < 0.05. Note the close similarity between frequency of studied variables of GSP-OSA and OPS-OSA groups. Prevalence of univariate predictors in GSP-controls is lower than in the OPS-controls for all except male gender.

Table 2. Independent Predictors of Obstructive Sleep Apnea

| Predictor | P | β coefficient | Weighted element score |
| --- | --- | --- | --- |
| Male gender | <0.001 | 0.655 | 9 |
| History of snoring | <0.001 | 1.366 | 19 |
| Thick neck | <0.001 | 0.666 | 9 |
| Mallampati 3 or 4 | <0.001 | 0.367 | 5 |
| Hypertension | <0.001 | 0.254 | 3 |
| Diabetes type 2 | <0.001 | 0.413 | 6 |
| BMI >30 | <0.001 | 0.991 | 13 |
| Age >43 years | <0.001 | 0.245 | 3 |
| Reduced TMD | 0.049 | 0.147 | 2 |

Independent predictors of obstructive sleep apnea were derived using a logistic regression full model fit. These 9 independent predictors were then used to create both weighted and unweighted clinical scales. The area under the receiver operating characteristic (ROC) curve for the weighted model was 0.82 ± 0.004, and for the unweighted model was 0.79 ± 0.004. The unweighted model represented the best balance between ease of use and accuracy and was used to develop the P-SAP score. TMD = thyromental distance; BMI = body mass index (kg/m²).
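As an illustration of the two scales summarized in Table 2, the sketch below scores a single patient on both. The weights are transcribed from the "Weighted element score" column. The dictionary keys, function names, and example patient are hypothetical and introduced only for this example.

```python
from typing import Dict

# Weighted element scores from Table 2; the key names are illustrative.
WEIGHTS: Dict[str, int] = {
    "male_gender": 9,
    "snoring": 19,
    "thick_neck": 9,
    "mallampati_3_or_4": 5,
    "hypertension": 3,
    "diabetes_type_2": 6,
    "bmi_over_30": 13,
    "age_over_43": 3,
    "reduced_tmd": 2,
}


def psap_unweighted(findings: Dict[str, bool]) -> int:
    """Unweighted scale: 1 point for each predictor that is present (0 to 9)."""
    return sum(1 for name in WEIGHTS if findings.get(name, False))


def psap_weighted(findings: Dict[str, bool]) -> int:
    """Weighted scale: sum of the Table 2 weights of the predictors present."""
    return sum(weight for name, weight in WEIGHTS.items() if findings.get(name, False))


# Hypothetical patient: obese hypertensive man who snores.
patient = {"male_gender": True, "snoring": True, "bmi_over_30": True, "hypertension": True}
print(psap_unweighted(patient), psap_weighted(patient), sum(WEIGHTS.values()))
```

The weights sum to 69, which agrees with the 69-point weighted formula mentioned in the results. The unweighted scale needs only a count of findings, which is the ease-of-use argument made for it.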

hours. Four electroencephalographic channels (C3, C4, O1, and O2 by the international 10–20 system), 3 chin electromyogram leads, 2 electrooculogram leads, 2 electrocardiogram leads, snoring sound, respiratory effort using piezoelectric belts over the chest and abdomen, airflow at the nose and mouth using thermocouples and nasal pressure cannulas, and 2 bilateral surface electromyogram electrodes (placed over the anterior tibialis muscles) were recorded. Oxyhemoglobin saturation (SpO2) was monitored by pulse oximetry. Experienced polysomnography technologists, blinded to the P-SAP score, used standard techniques to manually score all recordings for sleep stages, respiratory events, and limb movements. Polysomnography measures followed the rules of Rechtschaffen and Kales19 for sleep staging and standard recommendations for respiratory scoring. Patients whose AHI met the diagnostic threshold of 5 per hour were diagnosed as having OSA (OPS-OSA group); patients below that threshold were grouped as OPS-controls. Diagnostic thresholds of AHI were chosen as 5 to 14.9 events per hour, 15 to 29.9 events per hour, and 30 or more events per hour for diagnosis of mild, moderate, and severe OSA, respectively, as described in previous studies.12,14 Using these AHI thresholds, the P-SAP score was validated at threshold scores of 2 and 6 to provide the following summary measures of accuracy: sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios, and diagnostic odds ratios. Finally, comparisons were made between GSP-controls and OPS-controls on the one hand and GSP-OSA and OPS-OSA on the other, to identify whether generalizations could be made to the study subgroup populations. Significant differences were calculated using Pearson χ² for categorical variables and t test for continuous variables, with statistical significance set at P < 0.05.
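The summary measures of accuracy listed above all follow from a 2 × 2 table of screening result against polysomnography diagnosis. The sketch below shows the standard definitions together with the AHI severity bins. The names `Accuracy`, `accuracy_from_counts`, and `ahi_severity` and the example counts are hypothetical, and treating each lower bound of an AHI bin as inclusive is an assumption based on the ranges quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    lr_pos: float
    lr_neg: float
    dor: float


def accuracy_from_counts(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Standard accuracy measures from true/false positives and negatives.

    Assumes every cell is nonzero; a zero cell makes some ratios undefined.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return Accuracy(
        sensitivity=sens,
        specificity=spec,
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        lr_pos=sens / (1 - spec),      # positive likelihood ratio
        lr_neg=(1 - sens) / spec,      # negative likelihood ratio
        dor=(tp * tn) / (fp * fn),     # diagnostic odds ratio
    )


def ahi_severity(ahi: float) -> str:
    """Map an AHI (events/h) to the severity bins quoted in the text."""
    if ahi >= 30:
        return "severe"
    if ahi >= 15:
        return "moderate"
    if ahi >= 5:
        return "mild"
    return "none"


# Hypothetical counts, not taken from the study.
result = accuracy_from_counts(tp=90, fp=30, fn=10, tn=70)
print(result)
print(ahi_severity(22.4))
```

The false-negative rate discussed later in the paper is simply `1 - result.sensitivity`, and the diagnostic odds ratio equals `lr_pos / lr_neg`.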


RESULTS

During the study period (May 2004 to September 2007), 43,576 cases were noted to have complete data entry for variables of interest to the study. Of these, 3884 patients (7.17%) had OSA diagnosed with OPS and were treated with CPAP, bilevel positive airway pressure, or surgery. Univariate analysis showed a statistically higher prevalence of the following variables in the OSA group: male gender, snoring, thick neck, modified Mallampati class 3


Figure 1. Hazard ratio of independent perioperative predictors of obstructive sleep apnea. These 9 independent perioperative predictors of perioperative diagnosis of obstructive sleep apnea were identified using a full-model fit logistic regression model. A hazard ratio (95% confidence interval) for each risk factor was derived by comparing the odds of having a diagnosis of obstructive sleep apnea in patients with and without the given risk factor.

Table 3. Summary Characteristics of the P-SAP Score: GSP (Derivation) Cohort

| Unweighted clinical scale | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | LR+ (95% CI) | LR− (95% CI) | DOR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ≥1 | 0.985 (0.936–0.998) | 0.099 (0.049–0.227) | 0.08 (0.04–0.12) | 0.99 (0.98–0.99) | 1.093 (1.087–1.097) | 0.156 (0.121–0.201) | 6.992 (5.413–9.033) |
| ≥2 | 0.939 (0.918–0.966) | 0.323 (0.232–0.467) | 0.10 (0.09–0.24) | 0.99 (0.98–0.99) | 1.388 (1.375–1.399) | 0.187 (0.165–0.212) | 7.404 (6.481–8.458) |
| ≥3 | 0.851 (0.814–0.886) | 0.574 (0.491–0.667) | 0.14 (0.08–0.25) | 0.98 (0.94–0.99) | 1.998 (1.968–2.027) | 0.260 (0.241–0.280) | 7.682 (7.021–8.406) |
| ≥4 | 0.667 (0.579–0.762) | 0.773 (0.692–0.847) | 0.19 (0.12–0.28) | 0.97 (0.94–0.99) | 2.934 (2.857–3.010) | 0.431 (0.412–0.450) | 6.811 (6.351–7.305) |
| ≥5 | 0.451 (0.376–0.547) | 0.898 (0.799–0.913) | 0.26 (0.21–0.39) | 0.95 (0.93–0.99) | 4.409 (4.224–4.599) | 0.612 (0.595–0.629) | 7.204 (6.720–7.723) |
| ≥6 | 0.239 (0.226–0.249) | 0.962 (0.954–0.982) | 0.33 (0.23–0.47) | 0.94 (0.91–0.99) | 6.323 (5.887–6.788) | 0.791 (0.779–0.804) | 7.990 (7.324–8.717) |
| ≥7 | 0.09 (0.076–0.115) | 0.989 (0.962–0.996) | 0.40 (0.32–0.55) | 0.93 (0.81–0.98) | 8.556 (7.504–9.752) | 0.919 (0.912–0.927) | 9.307 (8.096–10.699) |
| ≥8 | 0.02 (0.006–0.034) | 0.998 (0.975–0.999) | 0.44 (0.33–0.53) | 0.93 (0.81–0.98) | 10.149 (7.614–13.525) | 0.981 (0.977–0.984) | 10.346 (7.734–13.840) |
| 9 | 0.03 (0.008–0.014) | 0.999 (0.982–0.999) | 0.77 (0.59–0.86) | 0.93 (0.81–0.99) | 42.494 (12.624–143.084) | 0.997 (0.997–0.998) | 42.601 (12.644–143.522) |

P-SAP = perioperative sleep apnea prediction; GSP = general surgical population. The effect of incremental risk factor presence on sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive (LR+) and negative (LR−) likelihood ratios (LR), and diagnostic odds ratio (DOR) with 95% confidence intervals (CIs).

Table 4. Summary Measures of Accuracy of Validated P-SAP Score in OPS Group

| P-SAP score | AHI threshold | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | LR+ (95% CI) | LR− (95% CI) | DOR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ≥2 | 5 | 0.946 (0.918–0.966) | 0.258 (0.184–0.344) | 0.799 (0.785–0.812) | 0.604 (0.479–0.717) | 1.275 (1.146–1.418) | 0.210 (0.126–0.351) | 7.361 (3.089–17.541) |
| ≥2 | 15 | 0.973 (0.943–0.990) | 0.168 (0.126–0.216) | 0.479 (0.467–0.486) | 0.889 (0.783–0.948) | 1.170 (1.106–1.237) | 0.159 (0.069–0.365) | 11.466 (1.552–84.723) |
| ≥2 | 30 | 0.982 (0.936–0.998) | 0.127 (0.096–0.164) | 0.236 (0.226–0.239) | 0.962 (0.877–0.990) | 1.125 (1.075–1.177) | 0.143 (0.035–0.578) | 1.937 (0.231–16.247) |
| ≥6 | 5 | 0.217 (0.177–0.262) | 0.911 (0.847–0.955) | 0.844 (0.812923) | 0.272 (0.255–0.283) | 2.447 (1.350–4.436) | 0.859 (0.796–0.927) | 6.062 (3.340–11.002) |
| ≥6 | 15 | 0.262 (0.206–0.325) | 0.874 (0.830–0.910) | 0.621 (0.531–0.705) | 0.601 (0.580–0.620) | 2.083 (1.430–3.034) | 0.844 (0.772–0.923) | 2.518 (1.562–4.061) |
| ≥6 | 30 | 0.324 (0.239–0.420) | 0.853 (0.814–0.886) | 0.379 (0.298–0.463) | 0.820 (0.802–0.839) | 2.204 (1.542–3.151) | 0.792 (0.692–0.907) | 2.848 (1.465–5.535) |

OPS = overnight polysomnography; AHI = apnea-hypopnea index (events/h); P-SAP = perioperative sleep apnea prediction. The effect of incremental risk factor presence on sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive (LR+) and negative (LR−) likelihood ratios (LRs), and diagnostic odds ratio (DOR) with 95% confidence intervals (CIs) for P-SAP threshold scores of 2 and 6.

or 4, limited mouth opening, limited jaw protrusion, limited c-spine mobility, reduced TM distance, Type 2 diabetes mellitus, hypertension, BMI >30 kg/m², and age >43 years (P < 0.05, Table 1). Collinearity diagnostics did not reveal a condition index higher than 12.6. The maximal bivariate correlation was 0.419; therefore, no variables were removed from the model. Age and BMI were converted into categorical variables using a ROC curve, which demonstrated the optimal balance of sensitivity and specificity at 43 years for age and 30 kg/m² for BMI. A logistic regression full model fit was performed on 43,576 valid cases and revealed 9 independent perioperative predictors (P < 0.05): male gender, history of snoring, thick neck, modified Mallampati class 3 or 4, TM distance <6 cm, hypertension, Type 2 diabetes mellitus, BMI >30 kg/m², and age >43 years (Table 2). The model was evaluated using the omnibus tests for coefficients, which revealed a χ² of 4173.734, with 12 degrees of freedom, and a P value <0.001. The area under the ROC curve for the unweighted model was 0.79 ± 0.004, and 0.82 ± 0.004 for the weighted model. Hazard ratios for each independent risk factor were also developed (Fig. 1). These predictors were then used to create a clinical scale. Each patient was assigned 1 point for each of the 9 risk factors they possessed. The characteristics of the clinical scale are shown in Table 3. Weighting the model based on a previously described formula for assigning weighted points20 resulted in a highly complex formula (69 points) with minimal benefit (0.03 improvement in area under the ROC curve) gained in terms of better accuracy (Table 2). In the interest of simplicity, the unweighted scale was thus accepted as the P-SAP score, and the summary measures of accuracy were derived for the unweighted score. Choosing a diagnostic threshold of 2 risk factors produced a test sensitivity of 0.939 (false-negative [FN] rate 0.06) and specificity of 0.323, with positive and negative predictive values of 0.1 and 0.99, respectively. Increasing the P-SAP threshold to 6 increased the specificity to 0.974, at the expense of sensitivity (0.239), with positive and negative predictive values of 0.33 and 0.97, respectively. Maximal combined sensitivity (0.667) and specificity (0.773) with positive and negative predictive values of 0.19 and 0.97, respectively, were observed at a P-SAP score threshold of 4.
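Unlike sensitivity and specificity, the predictive values quoted above depend on the prevalence of OSA in the cohort. As a rough check, the sketch below recomputes them with Bayes' theorem from the reported sensitivity (0.939), specificity (0.323), and prevalence (7.17%) at a threshold of 2. The function name `predictive_values` is illustrative, and the calculation assumes that the reported prevalence applies to the whole derivation cohort.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values reported in the text for a P-SAP threshold of 2 in the GSP cohort.
ppv, npv = predictive_values(sensitivity=0.939, specificity=0.323, prevalence=0.0717)
print(round(ppv, 2), round(npv, 2))
```

To two decimal places this gives a positive predictive value of 0.10 and a negative predictive value of 0.99, in line with the reported figures. The low positive predictive value reflects the low prevalence of diagnosed OSA in a general surgical population rather than a weakness specific to the score.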

OPS Group

The demographic features of patients in the OPS cohort are described in Table 1. The prevalence of OSA in this population was 75.7%, similar to the prevalence of OSA in the validation cohorts of other studies. There is a close similarity between the frequency of studied variables of the patients with diagnosis of OSA or treatment for OSA in the GSP-OSA and OPS-OSA groups. When comparing the GSP-control group with the OPS-control group, significantly more men were seen in the GSP-control group, whereas the other independent variables were all less common in the GSP-control group, with significant differences seen for snoring, thick neck, reduced TM distance, BMI >30 kg/m², and age >43 years. Thus, GSP-control and OPS-control were not similar in variables of interest for this study. The summary measures of accuracy for the OPS set are presented in Table 4. For an AHI threshold of 5 events/h, sensitivity of threshold P-SAP score ≥2 was 0.946, specificity 0.258, positive predictive value 0.799, negative predictive value 0.604, and diagnostic odds ratio 7.361. Increasing the threshold P-SAP score to 6 resulted in an increase in specificity to 0.911, with an attendant decrease in sensitivity to 0.217. At the P-SAP threshold of 4, the specificity and sensitivity were comparable with the GSP group values, with sensitivity 0.635 (0.577–0.656), specificity 0.653 (0.577–0.723), positive predictive value 0.859 (0.829–0.888), negative predictive value 0.349 (0.309–0.387), and diagnostic odds ratio of 3.281. Finally, the prevalence of the validated risk factors in the GSP and OPS populations was analyzed. There is a significant difference in the frequency of risk factors in the OPS population compared with the GSP population at P-SAP score ≥2 (89.4% vs 69.1%), P-SAP score ≥4 (54.0% vs 26.2%), and P-SAP score ≥6 (18.4% vs 5.5%), with all comparisons achieving P < 0.05 on Pearson χ² analysis.
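The between-cohort comparisons above rely on the Pearson χ² test for a 2 × 2 table (cohort by whether the score reaches a threshold). A minimal sketch of that test is given below. The counts are hypothetical, the function name `pearson_chi2_2x2` is illustrative, and the omission of a continuity correction is an assumption, because the text does not state which variant was used.

```python
import math
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are two cohorts, columns are
# "score at or above threshold" and "score below threshold".
chi2, p = pearson_chi2_2x2(45, 5, 30, 20)
print(chi2, p)
```

With these toy counts (90% vs 60% at or above the threshold) the statistic is 12.0, which is significant at the P < 0.05 level used in the study.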


DISCUSSION

Our data demonstrate a frequency of perioperative diagnosis of OSA of 7.17% in the surgical population studied. In addition, we have derived a clinical prediction tool, the P-SAP score, in a typical surgical cohort and validated the P-SAP score in a subgroup of patients who presented for surgery within 6 months of having a formal sleep study. The P-SAP score provides a screening method that is easy to perform because it incorporates routine preanesthetic assessment variables (including measures of upper airway morphology). In this surgical population, the P-SAP score of 2 has high sensitivity at the expense of specificity, whereas a score of 6 has high specificity at the expense of sensitivity. Recent ASA guidelines stress the importance of perioperative diagnosis and management of patients with OSA.8 The gold standard of diagnosis of OSA and severity grading has remained the overnight sleep study, but because of constraints of time, personnel, and cost, it cannot be considered as a primary screening mechanism for OSA. Common wait times for polysomnography can vary from 2 to 10 months in the United States.21 Previously, a variety of clinical prediction models and algorithms have used several of the P-SAP variables to aid risk assessment and screening before polysomnography.9,10,12,14 The full description and critique of the various models in use are beyond the scope of this discussion, but the key considerations in assessing a screening test for OSA are discussed below. First, a common problem with all previously described OSA screening tests, including the STOP questionnaire, is spectrum bias or high underlying prevalence of OSA in the derivation cohort.22 This is a very different clinical scenario compared with the GSP, where the prevalence of OSA is much lower and the clinical distinction between normal patients and those with OSA is possibly more apparent. In essence, these can be considered to be 2 completely different study populations. Prediction models that are derived in high prevalence populations report higher sensitivity than is seen when the test is used in a lower risk population. The advantage of deriving the screening tests in a representative clinical population is that this is exactly how the tests will be used in practice.13 Second, there is a tradeoff between sensitivity and specificity with most clinical models for OSA.
The most clinically important summary statistic is possibly the FN rate (defined as 1 − sensitivity), which gives the proportion of patients with OSA who were screened as normal by the diagnostic test. Because sensitivity is considered prevalence independent, it follows that FN rates are also independent of prevalence of OSA. The FN rate is typically expressed as a conditional probability or a percentage of patients with OSA that are missed by the screening test. The Berlin questionnaire, which is now commonly used in several hospitals, has FN rates of 14.5% to 38.2%,14 clearly making it undependable to robustly exclude OSA preoperatively. Similar FN rates were observed with the ASA model,14 STOP questionnaire,12 and STOP-BANG model12 at AHI thresholds of 5 (37.9% vs 34.4% vs 16.4%, respectively) and 30 (12.3% vs 20.5% vs 0.0%, respectively). The best of these 3 screening tools, the STOP-BANG model, was derived as a post hoc estimate and is yet to be validated in a truly representative surgical population. Therefore, it is of interest that 6 of the 8 elements of the STOP-BANG model have been shown to be independent predictors of OSA in the P-SAP derivation study, lending validity to their clinical importance. The P-SAP score has been shown to have a reproducibly higher sensitivity (with FN rates <10% across all AHI validation thresholds) compared with the STOP questionnaire and the ASA model. As with all clinical screening tests, the effect of false positives on resource utilization and FNs on adverse outcome are important considerations. Perhaps one critical issue is the fact that we


still do not know whether there is a subgroup of patients with OSA at high risk of postoperative complications. Further studies into identifying the subgroup at greatest risk of postoperative sleep apnea and hypoxemia are imperative at this point. One limitation of the study is that it focuses purely on the ability to screen robustly for a diagnosis of OSA and not on the prediction of postoperative morbidity. The incidence of postoperative morbid events was not analyzed as part of this study. There are several reasons for the lack of postoperative data: first, postoperative monitoring data were not routinely collected in an electronic format, reducing the usefulness of retrospective data gathering. Furthermore, none of the institutional electronic datasets had complete data on this particular cohort of patients for analysis of outcomes in this study. Second, previous studies have demonstrated that postoperative central and obstructive apneas occur irrespective of severity of OSA. Indeed, Chung et al.12,14 showed that patients with OSA had a significantly higher incidence of postoperative complications compared with controls (22.6% vs 12.3%), primarily related to desaturation (20.6% vs 9.2%) and prolonged oxygen therapy (14.3% vs 4.7%). Finally, Kaw et al.23 estimated that a prospective study of 2000 patients would be required to demonstrate a doubling of complications related to purely OSA in coronary artery bypass graft patients. Recent research into the effects of remifentanil in patients with OSA suggests that the at-risk population may in fact not have the typical markers of severe disease.24 Until this population is identified, it is important for anesthesiologists to exclude OSA as efficiently as possible preoperatively. Hence, this study focuses purely on the ability to screen robustly for a diagnosis of OSA. 
If we identify patients with OSA before they have surgery, the number of patients potentially receiving additional monitoring care in the postanesthesia care unit for signs of desaturation will no doubt increase, and discharge times from the postanesthesia care unit could be delayed further if desaturation episodes do indeed occur. However, these additional costs will offer more safety as per ASA practice guidelines.8 The P-SAP score with its ease of use (linear scale, no need for additional investigations) and high sensitivity across disease severity may offer a useful method of screening. At a threshold P-SAP score ≥2, it is possible that many patients who are not at high postoperative risk may be included. This lack of specificity at the P-SAP score ≥2 is a drawback, and future research into identifying techniques or measurements to increase the specificity of screening at the P-SAP score ≥2 is essential to make it a cost-effective addition to the anesthesiologist's perioperative assessment. However, using a threshold P-SAP score ≥4 ensures a significant improvement in specificity (0.773), with a decrease in sensitivity (0.667), which is comparable with other screening tools currently available.12,14,22 Our retrospective study, similar to others based on large prospectively collected clinical databases,18 has certain generic limitations. Despite general standardization of perioperative evaluation at our institution, we cannot guarantee that controlled and uniform conditions were applied across all the assessments. Furthermore, although all the

perioperative variables have excellent data entry rates, both the possible predictors and outcomes were recorded by providers as part of their clinical documentation responsibilities. As a result, the data reflect the electronic medical record, and no additional detail is available. There were no rigorous processes to validate the entry of data. Although the format and specificity of some elements were prospectively altered to provide more detailed data for analysis, we did not use a distinct data collection form with diagrams and extensive definitions to assist providers in accurate selection as recommended in other studies.25 Therefore, no specific measurements were undertaken of neck circumference, TM distance, mouth opening, and c-spine mobility. Instead, the judgment of the clinician was used for these variables. However, there is no reason to believe that one particular group suffered exclusively as a result of observation bias through this large dataset. Additionally, the large numbers of patients included in the study help reduce the effect of the observation error. Finally, the size of our analysis precluded performing screening polysomnography to confirm or reject the diagnosis of OSA for the GSP group. In the GSP group, we accepted a diagnosis of OSA, whether or not it was diagnosed in our institution, as long as the diagnosis was supported by an explicit treatment plan such as CPAP therapy or surgery. It is wholly possible that the patients in the GSP-OSA group had greater severity of the disease because they were treated with surgery or CPAP, and this may or may not have had an effect on the clinical predictors studied. Because of the retrospective nature and large number of studies in the validation (OPS) group, no formal assessment was made of the degree of agreement between sleep laboratory technicians and physicians for the diagnosis of OSA. Polysomnography is routinely scored by a core group of technicians at the university hospital. 
The standard methodology of scoring was used. The scoring of sleep studies is subject to a robust quality-assurance process to maintain high quality of reporting. We did not present the other variables of the sleep study, because they did not affect the variables studied directly, unlike the STOP study that analyzed measures of tiredness and sleepiness. Finally, patients in the validation study were investigated for sleep-related disorders and had incidental surgery within 6 months of the sleep study. Although this in itself does not influence the validity of the sleep study, it is important to note that the prevalence of OSA in the validation (OPS) group was similar to the STOP study. Another possible criticism of our approach for the derivation sample is the influence of undiagnosed OSA on the GSP-control population variables. This critique is based on the often-quoted statistic that 90% of OSA is undiagnosed.1 The prevalence of OSA in specific surgical populations is thought to be high: 71% to 95% in patients who are morbidly obese undergoing bariatric surgery26 and 23% in patients with traumatic brain injury.27 The typical university population in this study consists mainly of nonobese (30%–35% obesity prevalence) and female patients (45%–50% male prevalence), and it is unlikely that the quoted prevalence statistics for high-risk groups apply across the entire GSP. Previous population studies identify a 2% to 9% prevalence of OSA in middle-aged women and 4% to 24% prevalence in middle-aged men.1



Therefore, it seems reasonable to assume that a typical university hospital surgical population has an OSA prevalence that ranges between these figures, 2% and 24%. The prevalence of moderate-severe OSA in the study by Young et al.1 was 9% in men and 4% in women, and this typically represents the proportion of patients who need treatment for OSA, either surgical or CPAP. The prevalence of diagnosis of treated OSA in our GSP cohort was 7.17%, similar to the prevalence of moderate-severe OSA in the study by Young et al. Despite these differences, the frequency of independent predictors was comparable in the GSP-OSA group and OPS-OSA group. In other words, patients with OSA across the GSP and OPS groups had a similar distribution of study variables. However, the GSP-control group had a lower frequency of the independent predictors than the OPS-control group, therefore supporting the view that the majority of patients in the GSP-control group did not have markers of OSA. Otherwise, there would have been a tendency to higher frequency of these variables in the GSP-controls. As further support of the validity of the P-SAP elements, all the variables on the P-SAP score have previously been identified as independent or univariate predictors of OSA in various populations. The nature of logistic regression full-fit analysis removes dependent variables from the prediction model. Thus, the 9 variables that form the P-SAP score were independent of each other in this particular cohort of patients. The central mechanism of these various manifestations of OSA relates to abnormal fat deposition in the upper airway. The occurrence of obesity and other markers of metabolic syndrome in association with OSA underlines the importance of the involved metabolic pathways. Thus, many of these variables are ultimately caused by a single but complex process and are interrelated but not necessarily dependent on each other. 
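Because the argument above turns on prevalence, it may help to see how prevalence interacts with the reported test characteristics. The following is a minimal sketch, not part of the study's analysis, that applies Bayes' rule to the sensitivity and specificity values quoted in this article. It treats the 7.17% prevalence of diagnosed OSA as the pretest probability, which is an assumption: as discussed above, the true prevalence in a surgical population may be higher. The function name `predictive_values` and the `THRESHOLDS` table are illustrative and do not come from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# (sensitivity, specificity) quoted in the article for each P-SAP threshold.
THRESHOLDS = {
    2: (0.939, 0.323),
    4: (0.667, 0.773),
    6: (0.239, 0.911),
}

# Assumption: diagnosed-OSA prevalence in the GSP cohort used as pretest probability.
PREVALENCE = 0.0717

for threshold, (sens, spec) in THRESHOLDS.items():
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    print(f"P-SAP >= {threshold}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Repeating the calculation with a higher assumed prevalence (for example, the 24% upper bound cited above) shows how strongly the predictive values depend on the population being screened.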
One further criticism of our study could be that other validated clinical predictors of OSA such as daytime somnolence, observed apnea, and tiredness were not included in the P-SAP score. At the time of this study, these variables were not part of a typical perioperative assessment at our university hospital, and complete data on these elements were sparse and consequently not analyzed. These variables are not universally used in clinical screening tests for OSA, as shown in a recent meta-analysis.22 The STOP-BANG model12 relies partly on daytime somnolence and observed apneas, both of which have advantages and disadvantages. Observed apneas are highly specific for OSA but require the presence of a nighttime observer to identify. Daytime somnolence is seen in the general population primarily due to depression and obesity,28 possibly affecting its accuracy as a predictor of OSA. Despite these limitations, this study is notable for a few important reasons. First, at a diagnostic threshold P-SAP score ≥2, the reproducibly high sensitivity across all severities of OSA is a useful alternative to other perioperative clinical screening tests for OSA. Further work is essential to identify methods or additional tests to improve the specificity of this threshold score because it is currently seen in a large proportion of patients presenting for surgery (69.1%). Second, unlike prediction models with higher specificity for OSA, such as morphometry and cephalometry, the P-SAP score does not incur an additional burden of data

collection and appraisal, i.e., the data elements are all part of a standard perioperative evaluation. Finally, an additional consideration for anesthesiologists is their role as perioperative physicians in the long-term medical management of patients screened to be at high risk of OSA. Using the information from this study, we recommend that those patients who screen positive for OSA with P-SAP scores ≥6 should undergo expedited polysomnography because this score is associated with an extremely high specificity. These patients are likely to benefit from CPAP therapy. This approach could facilitate long-term management of these patients and ensure that perioperative referrals for sleep studies have an appropriately low false-positive rate. Although this may not be feasible preoperatively in all patients, clinical processes to facilitate postoperative testing after an appropriate postoperative recovery period should be considered. Further work needs to be done to ascertain the effect of such therapy on morbidity and mortality after surgery in these high-risk patients. Admittedly, this approach will miss a significant proportion of patients with OSA (because of the sensitivity of 0.239), but the purpose at this level is to ensure that a proportion of patients at high risk of long-term complications related to OSA are identified and appropriate secondary preventative treatment is offered as indicated by sleep study results. It should be noted that the purpose of this study was to identify clinical risk factors for OSA and not the outcomes. As such, recommendations for clinical processes need rigorous cost-effectiveness analyses to ensure that the optimal balance is achieved between missed cases and false positives. The P-SAP score validates 6 of the 8 elements of the STOP-BANG model12 but differs from it in 2 important ways. First, the P-SAP score uses upper airway elements such as high modified Mallampati class and reduced TM distance. 
Modified Mallampati class has been validated previously as a marker of diagnosis and severity of OSA.16 Reduced mandibular length is an important bony factor associated with OSA in nonobese patients17 and, therefore, was included to provide an additional screening measure for a wider spectrum of patients. Second, the inclusion of Type 2 diabetes as an element in the P-SAP score is important because diabetes has been linked not only to diagnosis of OSA but also to severity of the disease.15 Perhaps as a result of these additional elements, the validated false-negative (FN) rate for a P-SAP score ≥2 was 10% across all severities of OSA (the 2% FN rate for severe OSA is similar to that of the STOP-BANG model), which is a significant improvement on the accuracy of previously reported clinical models including the STOP-BANG model. Thus, the P-SAP score and the STOP-BANG model use several similar elements but have important differences that may have a significant impact on performance across different patient spectrums, and this question should ideally be answered by a prospective head-to-head analysis. In summary, we report a new P-SAP score for OSA with excellent test characteristics. In contrast to current prediction models for OSA, this model was developed for use in a GSP with low prevalence of diagnosis of OSA. The P-SAP score uses commonly collected perioperative variables, thereby allowing its seamless addition to the current anesthesia evaluation, with reproducible accuracy across the

April 2010 Volume 110 Number 4


Prediction of Obstructive Sleep Apnea in a Surgical Population

entire spectrum of OSA severity. This screening tool could help with perioperative risk stratification, allocation of postoperative monitoring resources, and identification of people who might benefit from formal polysomnography and treatment.
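As an illustration of how such a score might be tallied at the bedside or in an electronic record, the following minimal sketch counts the 9 elements named in this article and reports which of the diagnostic thresholds discussed above (2, 4, and 6) a patient reaches. It assumes one point per element, which is consistent with the linear scale described in the text but is an assumption here, as is the age cutoff encoded in the field name `age_over_43`. The class and function names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, fields


@dataclass
class PSAPElements:
    """The 9 P-SAP elements, each recorded as present (True) or absent (False)."""
    # Demographic variables
    age_over_43: bool
    male: bool
    obese: bool
    # History variables
    snoring: bool
    type2_diabetes: bool
    hypertension: bool
    # Airway measures
    thick_neck: bool
    mallampati_3_or_4: bool
    reduced_tm_distance: bool


def psap_score(elements: PSAPElements) -> int:
    """Sum the elements, assuming each contributes exactly one point."""
    return sum(bool(getattr(elements, f.name)) for f in fields(elements))


def thresholds_met(score: int) -> list[int]:
    """Return the diagnostic thresholds discussed in the article that the score reaches."""
    return [t for t in (2, 4, 6) if score >= t]


# Hypothetical patient, for illustration only.
patient = PSAPElements(
    age_over_43=True, male=True, obese=False,
    snoring=True, type2_diabetes=False, hypertension=True,
    thick_neck=False, mallampati_3_or_4=False, reduced_tm_distance=False,
)
score = psap_score(patient)
print(score, thresholds_met(score))
```

Keeping the elements as named booleans, rather than a bare list, mirrors the pick-list style of documentation described in the article and makes a missing element an explicit error instead of a silent zero.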

Preoperative predictor pick list choices

History of snoring: Yes/no
Diabetes mellitus necessitating oral hypoglycemic therapy without insulin therapy: Type 2 diabetes; Oral hypoglycemic therapy
Diabetes mellitus necessitating insulin therapy with or without oral hypoglycemic therapy: Type 1 diabetes; Type 2 diabetes with insulin therapy
Hypertension: Yes/no
Sleep apnea: Treated by continuous positive airway pressure; Treated by surgery

Airway physical examination

Cervical spine: Limited extension; Limited flexion; Known unstable; Possible unstable
Neck anatomy: Limited laryngeal mobility; Mass; Radiation changes; Thick/obese; Thyroid cartilage not visible; Tracheal deviation
Thyroid cartilage to mentum distance: Normal (estimated to be ≥6 cm); Reduced (estimated to be <6 cm)
Mouth opening interincisor or intergingival distance: Normal (estimated to be ≥3 cm); Reduced (estimated to be <3 cm)
Mandibular protrusion test: Normal (lower incisors can be protruded anterior to upper incisors); Limited (lower incisors can be advanced to only meet upper incisors); Severely limited (lower incisors cannot be advanced to meet upper incisors)
Modified Mallampati class: I, II, III, or IV as modified by Samsoon and Young, performed with patient sitting with head in neutral flexion/extension position, tongue out, without phonation
Body mass index: Weight in kilograms/(height in meters)²

REFERENCES
1. Young T, Palta M, Dempsey J, Skatrud J, Weber S, Badr S. The occurrence of sleep-disordered breathing among middle-aged adults. N Engl J Med 1993;328:1230–5
2. Ahmad S, Nagle A, McCarthy RJ, Fitzgerald PC, Sullivan JT, Prystowsky J. Postoperative hypoxemia in morbidly obese patients with and without obstructive sleep apnea undergoing laparoscopic bariatric surgery. Anesth Analg 2008;107:138–43
3. Blake DW, Chia PH, Donnan G, Williams DL. Preoperative assessment for obstructive sleep apnoea and the prediction of postoperative respiratory obstruction and hypoxaemia. Anaesth Intensive Care 2008;36:379–84
4. Hung J, Whitford EG, Parsons RW, Hillman DR. Association of sleep apnoea with myocardial infarction in men. Lancet 1990;336:261–4
5. Shepard JW Jr, Garrison MW, Grither DA, Dolan GF. Relationship of ventricular ectopy to oxyhemoglobin desaturation in patients with obstructive sleep apnea. Chest 1985;88:335–40
6. Arzt M, Young T, Finn L, Skatrud JB, Bradley TD. Association of sleep-disordered breathing and the occurrence of stroke. Am J Respir Crit Care Med 2005;172:1447–51
7. Gami AS, Howard DE, Olson EJ, Somers VK. Day-night pattern of sudden death in obstructive sleep apnea. N Engl J Med 2005;352:1206–14
8. Gross JB, Bachenberg KL, Benumof JL, Caplan RA, Connis RT, Cote CJ, Nickinovich DG, Prachand V, Ward DS, Weaver EM, Ydens L, Yu S. Practice guidelines for the perioperative management of patients with obstructive sleep apnea: a report by the American Society of Anesthesiologists Task Force on Perioperative Management of Patients with Obstructive Sleep Apnea. Anesthesiology 2006;104:1081–93
9. Kirby SD, Engl P, Danter W, George CF, Francovic T, Ruby RR, Ferguson KA. Neural network prediction of obstructive sleep apnea from clinical criteria. Chest 1999;116:409–15
10. Kushida CA, Efron B, Guilleminault C. A predictive morphometric model for the obstructive sleep apnea syndrome. Ann Intern Med 1997;127:581–7
11. Battagel JM, L'Estrange PR. The cephalometric morphology of patients with obstructive sleep apnoea (OSA). Eur J Orthod 1996;18:557–69
12. Chung F, Yegneswaran B, Liao P, Chung SA, Vairavanathan S, Islam S, Khajehdehi A, Shapiro CM. STOP questionnaire: a tool to screen patients for obstructive sleep apnea. Anesthesiology 2008;108:812–21
13. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189–202
14. Chung F, Yegneswaran B, Liao P, Chung SA, Vairavanathan S, Islam S, Khajehdehi A, Shapiro CM. Validation of the Berlin questionnaire and American Society of Anesthesiologists checklist as screening tools for obstructive sleep apnea in surgical patients. Anesthesiology 2008;108:822–30
15. Reichmuth KJ, Austin D, Skatrud JB, Young T. Association of sleep apnea and type II diabetes: a population-based study. Am J Respir Crit Care Med 2005;172:1590–5
16. Liistro G, Rombaux P, Belge C, Dury M, Aubert G, Rodenstein DO. High Mallampati score and nasal obstruction are associated risk factors for obstructive sleep apnoea. Eur Respir J 2003;21:248–52
17. Sakakibara H, Tong M, Matsushita K, Hirata M, Konishi Y, Suetsugu S. Cephalometric abnormalities in non-obese and obese patients with obstructive sleep apnoea. Eur Respir J 1999;13:403–10
18. Kheterpal S, Han R, Tremper KK, Shanks A, Tait AR, O'Reilly M, Ludwig TA. Incidence and predictors of difficult and impossible mask ventilation. Anesthesiology 2006;105:885–91
19. Rechtschaffen A, Kales A. A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects. Los Angeles, CA: Brain Information Services, University of California, 1968
20. Rassi A Jr, Rassi A, Little WC, Xavier SS, Rassi SG, Rassi AG, Rassi GG, Hasslocher-Moreno A, Sousa AS, Scanavacca MI. Development and validation of a risk score for predicting death in Chagas heart disease. N Engl J Med 2006;355:799–808
21. Flemons WW, Douglas NJ, Kuna ST, Rodenstein DO, Wheatley J. Access to diagnosis and treatment of patients with suspected sleep apnea. Am J Respir Crit Care Med 2004;169:668–72
22. Ramachandran SK, Josephs LA. A meta-analysis of clinical screening tests for obstructive sleep apnea. Anesthesiology 2009;110:928–39
23. Kaw R, Michota F, Jaffer A, Ghamande S, Auckley D, Golish J. Unrecognized sleep apnea in the surgical patient: implications for the perioperative setting. Chest 2006;129:198–205
24. Bernards CM, Knowlton SL, Schmidt DF, DePaso WJ, Lee MK, McDonald SB, Bains OS. Respiratory and sleep effects of remifentanil in volunteers with moderate obstructive sleep apnea. Anesthesiology 2009;110:41–9
25. Rosenstock C, Gillesberg I, Gatke MR, Levin D, Kristensen MS, Rasmussen LS. Inter-observer agreement of tests used for prediction of difficult laryngoscopy/tracheal intubation. Acta Anaesthesiol Scand 2005;49:1057–62
26. Lopez PP, Stefan B, Schulman CI, Byers PM. Prevalence of sleep apnea in morbidly obese patients who presented for weight loss surgery evaluation: more evidence for routine screening for obstructive sleep apnea before weight loss surgery. Am Surg 2008;74:834–8
27. Castriotta RJ, Wilde MC, Lai JM, Atanasov S, Masel BE, Kuna ST. Prevalence and consequences of sleep disorders in traumatic brain injury. J Clin Sleep Med 2007;3:349–56
28. Bixler EO, Vgontzas AN, Lin HM, Calhoun SL, Vela-Bueno A, Kales A. Excessive daytime sleepiness in a general population sample: the role of sleep apnea, age, obesity, diabetes, and depression. J Clin Endocrinol Metab 2005;90:4510–5
