Background: The Glasgow Coma Scale (GCS) has served as an assessment tool in head trauma and as a measure of physiologic derangement in outcome models (e.g., TRISS and Acute Physiology and Chronic Health Evaluation), but it has not been rigorously examined as a predictor of outcome.
Methods: Using a large trauma data set (National Trauma Data Bank, N = 204,181), we compared the predictive power (pseudo R², receiver operating characteristic [ROC]) and calibration of the GCS to its components.
Results: The GCS is actually a collection of 120 different combinations of its 3 predictors grouped into 13 different scores by simple addition (motor [m] + verbal [v] + eye [e] = GCS score). Problematically, different combinations summing to a single GCS score may actually have very different mortalities. For example, the GCS score of 4 can represent any of three mve combinations: 2/1/1 (survival = 0.52), 1/2/1 (survival = 0.73), or 1/1/2 (survival = 0.81). In addition, the relationship between GCS score and survival is not linear, and furthermore, a logistic model based on GCS score is poorly calibrated even after fractional polynomial transformation. The m component of the GCS, by contrast, is not only linearly related to survival, but preserves almost all the predictive power of the GCS (ROC_GCS = 0.89, ROC_m = 0.87; pseudo R²_GCS = 0.42, pseudo R²_m = 0.40) and has a better calibrated logistic model.
Conclusion: Because the motor component of the GCS contains virtually all the information of the GCS itself, can be measured in intubated patients, and is much better behaved statistically than the GCS, we believe that the motor component of the GCS should replace the GCS in outcome prediction models. Because the m component is nonlinear in the log odds of survival, however, it should be mathematically transformed before its inclusion in broader outcome prediction models.
Key Words: Glasgow Coma Scale, Predictive power, Outcome.
J Trauma. 2003;54:671–680.
Submitted for publication October 9, 2002.
Accepted for publication January 6, 2003.
Copyright © 2003 by Lippincott Williams & Wilkins, Inc.
From the Department of Surgery, University of Vermont, College of Medicine (C.H., T.M.O., F.B.R., M.A.H., S.R.S.), Burlington, Vermont, Department of Anesthesia, University of Rochester (L.G.G., P.D.K.), Rochester, New York, and Department of Surgery, Wake Forest University School of Medicine (J.W.M.), Winston-Salem, North Carolina.
Presented at the 61st Annual Meeting of the American Association for the Surgery of Trauma, September 26–28, 2002, Orlando, Florida.
Address for reprints: Turner M. Osler, MD, FACS, Department of Surgery, University of Vermont, 111 Colchester Avenue, FL 466, Burlington, VT 05401; email: turner.osler@vtmednet.org.
DOI: 10.1097/01.TA.0000058130.30490.5D
The Glasgow Coma Scale (GCS)1 was introduced a quarter of a century ago and is now a part of the bedrock of outcome prediction after head injury. Created to be reliably used even by workers without specialized training, the GCS seems straightforward: it is simply the sum of three coded values that describe a patient’s motor (1–6), verbal (1–5), and eye (1–4) level of response to speech or pain. Since its creation, the GCS has been widely used. Not only is it used to describe individual trauma patients in the ambulance, the emergency room, and the intensive care unit, but it is also used as a component of several other outcome prediction scores: the Revised Trauma Score;2 Acute Physiology and Chronic Health Evaluation;3 TRISS;4 Circulation, Respiration, Abdomen, Motor, Speech Scale; and A Severity Characterization of Trauma5 all use the GCS as a predictor.
Given the fundamental importance of the GCS, it may seem remarkable that this score has never been subjected to careful statistical evaluation. However, the GCS was accepted as a useful description of consciousness and powerful predictor of outcome long before large databases required for rigorous statistical analysis were available. Although many authors (its creators among them) have noted shortcomings in the GCS, including the inability to calculate the GCS score for many patients6,7 and poor statistical performance,8 we believe a reappraisal is in order. The data available in the National Trauma Data Bank (NTDB) provide a powerful tool for this analysis. We hypothesized that most of the power of the GCS resides in the motor component and that the addition of the verbal and eye components adds little to the predictive power of the GCS. Moreover, we thought it likely that the addition of the verbal and eye subscores would undermine useful mathematical characteristics of the motor-only model of consciousness.
PATIENTS AND METHODS
The American College of Surgeons established the NTDB as a national repository of trauma data. It contains information supplied by 89 hospitals from around the country representing Level I, II, and III trauma centers. Although the NTDB includes a wide variety of information for each case, for this study only GCS subscores (motor [m], values coded 1–6 or blank; verbal [v], values coded 1–5 or blank; and eye [e], values coded 1–4 or blank) and outcome (survival to
Fig. 2. The 120 possible combinations of GCS subscores with their survival rates (and 95% confidence interval) grouped by GCS score.
Although the GCS is commonly thought of as 13 possible scores ranging from 3 to 15, it is actually 120 different possible combinations of
its component subscores that are grouped into 13 individual “scores” by the simple expedient of addition. Unfortunately, different
combinations of subscores that sum to the same GCS score often have very different survival rates.
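
The counting argument in the Fig. 2 caption can be checked directly. The following is a minimal sketch, assuming only the subscore ranges stated in the text (motor 1–6, verbal 1–5, eye 1–4); the function name `combinations_by_score` is introduced here for illustration and does not come from the article.

```python
from collections import defaultdict
from itertools import product

# Subscore ranges as stated in the text: motor 1-6, verbal 1-5, eye 1-4.
MOTOR = range(1, 7)
VERBAL = range(1, 6)
EYE = range(1, 5)


def combinations_by_score():
    """Group every (m, v, e) triple by its summed GCS score."""
    groups = defaultdict(list)
    for m, v, e in product(MOTOR, VERBAL, EYE):
        groups[m + v + e].append((m, v, e))
    return dict(groups)


groups = combinations_by_score()
total = sum(len(combos) for combos in groups.values())
print(total, "combinations across", len(groups), "scores")
for score in sorted(groups):
    # Number of distinct subscore combinations hiding behind each GCS score.
    print(score, len(groups[score]))
```

Only the extreme scores (3 and 15) correspond to a single combination; every score in between pools several clinically distinct patterns, which is the situation the caption describes.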
The ability of the GCS to predict survival was evaluated and compared with models based on its individual component subscores (m, v, and e) and the sum of its motor and verbal scores (m + v) to determine where the predictive power of GCS arises. Two further survival prediction models, one based on GCS (“fracpoly GCS model”) and one based on the motor component of GCS alone (“fracpoly m model”) were also created using the technique of fractional polynomial analysis to improve the fit of these single predictors. Finally, a model containing all GCS subscores and their interaction terms appropriately transformed using the technique of multiple fractional polynomials combined into a single predictor (“multiple fracpoly GCS model”) was created and evaluated. This final model represents the state-of-the-art approach to the three predictors available in the GCS. The performance of these eight models is presented in Table 1. We note first that
Fig. 3. Survival as a function of the eye, verbal, and motor scores, and their sum, the GCS score. Note that both the eye and verbal scores
are distinctly nonlinear and that this is reflected in the GCS score. The motor score, by contrast, is very linear.
removing the eye component from the GCS results in a model (m + v) indistinguishable from the GCS (m + v + e): not only are the ROC and pseudo R² values the same for these two models, but the smaller, more parsimonious m + v model is actually slightly better calibrated. The further elimination of the verbal subscore from the GCS leaves the “motor-only score” model. This further simplification results in a small but statistically significant decrease in performance: ROC falls (ROC_GCS = 0.891 vs. ROC_m = 0.873, p < 0.001), as does pseudo R² (pseudo R²_GCS = 0.416 vs. pseudo R²_m = 0.403), and misclassifications increase (GCS = 4.9% vs. m = 5.1%).
Elimination of all patients with a GCS score of 15 from the data set resulted in worse performance of all models, but the relative rankings of models were largely unchanged. However, the difference in ROC values for the m and GCS scores was reduced by 50%. This suggests that the improvement in prediction resulting from literally adding v and e scores to the m score is largely attributable to the superior discrimination of the GCS in patients with a normal level of consciousness (data not shown).
All models are poorly calibrated as assessed by the Pearson χ² statistic, but calibration is improved by mathematically transforming predictors using the technique of fractional polynomials before creating a logistic model. For example, the logistic model based on m alone can be transformed by including the inverse square root of m and the third power of m in the logistic model with considerable
Fig. 4. Neither the GCS score nor its motor component is linear in the log odds of survival. However, the motor score is far less irregular.
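
For readers less familiar with the quantity plotted in Fig. 4, the log odds of survival is log(p / (1 − p)) for a survival probability p. The sketch below applies that definition to the three survival rates quoted in the abstract for GCS score 4; the helper name `log_odds` is an illustrative choice, not something taken from the article.

```python
import math


def log_odds(p):
    """Return the log odds (logit) of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Survival rates quoted in the abstract for the three m/v/e patterns
# that all sum to a GCS score of 4.
gcs4_survival = {"2/1/1": 0.52, "1/2/1": 0.73, "1/1/2": 0.81}
for pattern, p in gcs4_survival.items():
    print(pattern, round(log_odds(p), 2))
```

A logistic model assumes this quantity changes by a constant amount per unit of the predictor, which is why departures from a straight line in Fig. 4 matter.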
improvement in calibration. Although this transformed model is still not “well calibrated” (i.e., Pearson χ² value of 30 on 3 degrees of freedom results in a value of p < 0.001), to casual examination calibration is quite good (Fig. 5). Transformation of the GCS score using the inverse square of the GCS score and the third power of the GCS score is much less successful because of the nonmonotonic nature of the GCS score (Fig. 6).
DISCUSSION
The GCS was developed in 1974 by Teasdale and Jennett as a practical way to measure the “depth and duration of impaired consciousness” in a variety of conditions including head trauma. Simplicity was the overriding design concern, with the goal of interrater reliability even by staff without special training. As originally proposed, the GCS score was reported as three independent subscores (motor, verbal, and eye). The further simplification of recording only the sum of the three components as a single score was adopted by Teasdale and Jennett in 1977.
Interestingly, the creators of the GCS foresaw possible shortcomings in their score. In their original article, they noted that some components of the score might be impossible to assess. Of perhaps greater significance, in their 1977 discussion they observed that “[the] validity of the assumption that each of the three parts of the scale should count equally and that each step should differ equally from that next to it has still to be tested.”
Teasdale and Jennett were not able to evaluate these concerns, perhaps because they did not have large patient databases available to them. Their concerns were well founded, however. Numerous authors have since observed that trauma patients who are inebriated, intubated, or pharmacologically paralyzed cannot have their GCS score assessed. This is a particularly troublesome problem because it is precisely these patients who are at the highest risk of dying. This problem is compounded by the wide variety of ways it has been “solved” at different trauma centers, such as scoring components as the minimum possible value, the maximum possible value, as a “T,” or simply as “unknown.”16,17 Attempts at a unified solution, such as imputation of missing values using a linear regression model,18,19 have not been adopted, perhaps because the formula is complicated and may not apply equally well to all case mixes. Other authors have
Fig. 5. The motor-only logistic model (dashed line) is poorly calibrated, but mathematical transformation of the motor score greatly improves
calibration (dotted line). The solid line represents perfect calibration (predicted mortality ⫽ actual mortality).
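
The transformation behind the dotted line in Fig. 5 is described in the text as entering the inverse square root of m and the third power of m into the logistic model. A minimal sketch of that functional form follows. The coefficients `b0`, `b1`, and `b2` are hypothetical parameters supplied by the caller, since this excerpt does not report fitted values, and both function names are introduced here for illustration only.

```python
import math


def fp_terms(m):
    """Return the two transformed motor terms named in the text: m**-0.5 and m**3."""
    if m not in range(1, 7):
        raise ValueError("motor subscore must be an integer from 1 to 6")
    return m ** -0.5, float(m ** 3)


def predicted_survival(m, b0, b1, b2):
    """Logistic prediction built on the transformed motor terms.

    b0, b1, b2 are hypothetical coefficients; they are NOT values from the
    article and must come from fitting a model to real data.
    """
    inv_sqrt, cube = fp_terms(m)
    linear_predictor = b0 + b1 * inv_sqrt + b2 * cube
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

Keeping the transformation in its own function makes it easy to swap in a different pair of powers if a refit on other data selects them.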
Fig. 6. The GCS logistic model (dashed line) is poorly calibrated, but mathematical transformation of the GCS score somewhat improves
calibration (dotted line). The solid line represents perfect calibration (predicted mortality ⫽ actual mortality).
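
Calibration plots such as Figs. 5 and 6 compare predicted with actual mortality at each value of the score. The sketch below shows one way the plotted points could be tabulated; it is an assumption about the bookkeeping, not the authors' code, and the names `calibration_points`, `records`, and `predict` are hypothetical.

```python
from collections import defaultdict


def calibration_points(records, predict):
    """Tabulate predicted versus observed mortality for each score value.

    records: iterable of (score, died) pairs, with died a bool.
    predict: function mapping a score to a predicted mortality in [0, 1].
    Returns a list of (score, predicted, observed, n) tuples sorted by score.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [deaths, patients]
    for score, died in records:
        counts[score][0] += 1 if died else 0
        counts[score][1] += 1
    return [
        (score, predict(score), deaths / n, n)
        for score, (deaths, n) in sorted(counts.items())
    ]
```

Plotting `predicted` against `observed` for each tuple gives the model's curve; the closer the points lie to the diagonal, the better the calibration.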
advocated for the simple replacement of the GCS score with the motor score alone20 or in place of the GCS score in the Revised Trauma Score.21 These last studies are persuasive but were based on small data sets. Still another approach is the use of the Reaction Level Scale,22 which has eight values and resembles an enhanced motor subscore.
We believe that the eye subscore should certainly be removed from the GCS because it adds nothing to the predictive power of the model and is occasionally impossible to obtain. The choice to remove the verbal subscore is more difficult, because the presence of the verbal subscore does improve the model of impaired consciousness at a statistically significant level. Nevertheless, we believe that the verbal subscore should also be removed because its contribution is not great and it is occasionally impossible to assess (e.g., in intubated or inebriated patients). Thus, we advocate the motor subscore as the best choice for a level-of-consciousness indicator. Although this model preserves most of the power of the GCS, it avoids the problems inherent in collecting the verbal and eye components of the GCS. Moreover, its linearity with respect to survival is far more intuitive and easily remembered than the complex survival graph of the GCS.
There are two circumstances in which the motor-only model is unreliable: in patients with pharmacologic (therapeutic) paralysis and in patients with traumatic paralysis (e.g., high spinal cord injuries). In these cases, the motor score is simply not a measure of consciousness and cannot be used as one. In the case of pharmacologic paralysis, it is a simple matter to allow the drug to wear off before assessing the motor subscore. The case of quadriplegia is more difficult to deal with, but it is possible that a standardized group of facial responses to voice and noxious stimuli can be developed with only slight loss of accuracy overall.
Although the motor subscore is a powerful predictor of mortality, we do not expect it to be used in isolation, because other predictors (e.g., injury severity, age, comorbidities) are also known to influence outcome. Rather, the motor subscore will be used as a component of a larger, more comprehensive survival model. When the motor subscore is incorporated into such a model, however, it will be important that it first be mathematically transformed to be linear in the log odds of survival, because this is a condition of the logistic model.
This study has limitations. Most importantly, it is based on a cohort of trauma patients in whom all GCS subscores were recorded in the NTDB, and therefore its applicability to patients in whom not all subscores were available is not certain. Moreover, in patients in whom all subscores were assigned, the procedures for assigning scores are likely to have been subject to local conventions that were almost certainly not uniform. These two problems may have biased the results of this study in unpredictable ways. Nevertheless, this study represents by far the largest and most comprehensive examination of the GCS available to date.
CONCLUSION
In summary, the GCS is composed of three subscores that contain redundant information. The simple addition of these subscores to create the GCS, although convenient, results in a nonlinear relationship between the GCS score and mortality. We found that the motor component of the GCS score is a powerful predictor of outcome and contains most of the predictive power of the score. The addition of the verbal subscore adds slightly to the predictive power, but the further addition of the eye subscore (resulting in the familiar GCS score) adds nothing to predictive power. We believe that a motor subscore-only model of level of consciousness is the most practical because the verbal score may be impossible to
obtain in seriously injured patients. Adding to the appeal of the motor-only model is its near linearity with respect to mortality. Quadriplegia is a potentially problematic injury because it naturally confounds the motor-only score. Six levels of the motor score based on physical examination of the face will need to be defined to make the motor-only score universally calculable. Finally, the motor-only score will require mathematical transformation to ensure linearity in the log odds of survival before incorporation into more comprehensive logistic models of survival.
APPENDIX
Discrimination and Calibration of Survival Models
Survival predictions are based on mathematical models that take the values of one or more predictors and allow the calculation of the outcome of interest (typically, death in trauma outcome models). There are many possible prediction models, and so the business of selecting the “best” model is of obvious interest. Unfortunately, measuring how well a predictive model performs is not straightforward. Two broad measures of models are discrimination and goodness-of-fit.
Discrimination is the degree to which a model separates survivors from nonsurvivors. This is usually quantified as the area under the ROC curve. The area under the ROC curve ranges from 0.5 (separation of survivors from nonsurvivors is no better than chance alone) to 1.0 (perfect separation of survivors from nonsurvivors). Although the actual calculation of the area under the ROC curve is not straightforward, in principle it could be calculated by repeatedly randomly selecting a survivor and a nonsurvivor and recording whether the model under consideration correctly predicts the survivor. The percentage of randomly selected pairs correctly predicted by the model is its ROC curve statistic. Although the ROC curve is conceptually simple, in practice its use requires some care. For example, the actual value of the ROC statistic for a model depends on the distribution of cases in the data set. Thus, in order for two different models to be compared with respect to the ROC statistic, the identical data set must be evaluated by each model. As an additional complexity, the results of such model comparisons may actually change with different data sets, again depending on case mix.
Goodness-of-fit tests attempt to capture how well a model predicts outcome throughout the range of predictions. That is, for a model to have acceptable goodness-of-fit, we wish the differences between predicted and actual outcomes to be small and, furthermore, that errors be unsystematically distributed. Unfortunately, there are many goodness-of-fit tests and no agreement on which is “best.” The oldest such test is the Pearson χ². Because the Pearson χ² tests whether a model’s calibration is indistinguishable from “perfectly calibrated,” the very large data sets used in trauma outcome research almost always have sufficient power to reject the “perfectly calibrated” hypothesis, and thus the judgment “not perfectly calibrated” is not very informative. The Hosmer-Lemeshow test is another goodness-of-fit test that has been used extensively in trauma research. Unfortunately, because this test requires grouping the data into at least 6 (and preferably 10) groups based on outcome score, for scores such as the m, v, and e subscores of the GCS it is simply not applicable. Moreover, when most patients’ predictors fall into a single covariate pattern, it can be impossible to partition the data to allow the calculation of the Hosmer-Lemeshow statistic.
For the purposes of this article, we chose to examine three measures of goodness-of-fit: overall misclassification rate, McFadden’s R², and Akaike’s information criterion. We selected overall misclassification because it is unambiguous in its interpretation and easily calculated: one simply finds a cutpoint in the score that minimizes misclassifications and reports this rate. We chose to report McFadden’s (pseudo) R² because of its analogy to R² in linear regression, where R² represents the percentage of variability explained by a model. Although this interpretation of the pseudo R² is not strictly correct for logistic models, because the pseudo R² values reported here are all calculated using the same data set, it is appropriate to compare this statistic between models. Finally, we chose to examine Akaike’s information criterion, which examines the amount of information contained in a score based on the likelihood of the data observed given the model under consideration, corrected for the number of predictors in the model. Although the absolute level of Akaike’s information criterion is uninformative, lower values imply more information conveyed by a model.
We believe that the most informative and convenient approach to goodness-of-fit is a simple graph of predicted survival versus actual survival for each covariate pattern. Not only does this single graph display the performance of a model throughout its range, but it is also immediately interpretable: the closer such a line lies to the diagonal, the more reliable the prediction model. Another attractive feature of this graphical approach to goodness-of-fit is that it is not affected by case mix.
Fractional Polynomial Analysis
A fractional polynomial (FP) is a polynomial whose powers are integers or fractions that may be positive, zero, or negative. Introduced by Royston and Altman in 1994,23 FPs are extremely useful in regression models because they offer greater flexibility than ordinary polynomials. In multivariate regression models, FPs allow us to preserve the continuous nature of predictor variables even when such variables are originally nonlinear and thus allow the creation of better calibrated models. Alternatively, if a model based on FP transformations fails to improve on untransformed predictors, we can be assured that the regression is in fact linear in its predictors.9
As implemented in STATA, a back-fitting algorithm is used that finds a fractional polynomial transformation for each predictor in turn, while holding the functional forms of the other predictors temporarily fixed. The algorithm con-
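
The pairwise description of the ROC statistic given in the appendix (pick a survivor and a nonsurvivor, check whether the model ranks the survivor higher) can be sketched as follows. This version enumerates every pair rather than sampling at random, and it counts tied pairs as half correct; the tie rule is a common convention assumed here, not something stated in the article, and the function name `concordance` is illustrative.

```python
from itertools import product


def concordance(survivor_scores, nonsurvivor_scores):
    """Fraction of survivor/nonsurvivor pairs in which the survivor scores higher.

    Scores are any numbers where higher means "more likely to survive".
    Tied pairs earn half credit (an assumed convention).
    """
    if not survivor_scores or not nonsurvivor_scores:
        raise ValueError("need at least one survivor and one nonsurvivor")
    credit = 0.0
    for s, n in product(survivor_scores, nonsurvivor_scores):
        if s > n:
            credit += 1.0
        elif s == n:
            credit += 0.5
    return credit / (len(survivor_scores) * len(nonsurvivor_scores))


# Toy motor scores for illustration only; these are not data from the study.
print(concordance([6, 6, 5, 4], [1, 2, 4, 6]))
```

Because the result depends on which survivors and nonsurvivors are in the sample, the same model can yield different values on data sets with different case mix, as the appendix cautions.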
cause the major value of the GCS is in the area of traumatic brain injury, any recommendations to change its collection or interpretation must be based on investigations into the traumatic brain injury population, per se, using long-term functional outcome.
This article makes a very strong statement that the motor subscale of the GCS can and probably should replace the whole GCS in modeling studies of short-term mortality in the general trauma population. I do not, however, believe that it has any implications regarding the clinical use of the GCS at this time or in prognostic modeling specific to the traumatic brain injury population. I hope that further efforts from this excellent group will allow us to take these steps.
Dr. Pascal Udekwu (Raleigh, North Carolina): Having had the opportunity to look at the North Carolina Trauma Registry data set on head injury patients based on International Classification of Diseases, Ninth Revision codes, I have to say that our conclusions really support those presented here today. I think that our results are also strengthened by the fact that a subset of patients had Functional Independence Measure scores that provide some support in terms of functional outcome as opposed to mortality alone. So, I would like to say, wonderful article. Thank you very much. I look forward to its publication.
Dr. Howard R. Champion (Annapolis, Maryland): I would like to, first of all, state that I agree entirely with your conclusions, as far as they go. I need to take issue with the first reviewer in terms of the way in which it may in any way reflect on Scotch whiskey. The GCS was devised in a pub, and is thus more a reflection of Glasgow beer than the fine Scotch whiskey that is also found in that country.
From a historical point of view, the Glasgow Coma Scale was developed to measure coma in head-injured patients 24 hours or so after injury. When we applied it in prehospital care in this country, to get around terms such as “lethargy” and “stupor” as the descriptors for head injury, Graham Teasdale was actually very upset with that concept.
He said, “It wasn’t designed for that” and “should not be applied in that area.” The questions I have are two: first, when you take the measure of coma down from the 15-integer Glasgow Coma Scale to best motor response, although it makes an awful lot of sense, you’re taking the measurement of coma to five intervals. Most neurosurgeons would say that is insufficient to characterize head injury in any way in relationship to any outcome.
The other thing is that the Glasgow Coma Scale relationship to mortality differs considerably between blunt and penetrating injury to the head. How do you account for this?
How are you going to address these two issues if you’re just going to distill it and simplify it? Your proposal has certain advantages for certain users but big disadvantages for others.
Dr. K. Dean Gubler (Portland, Oregon): I appreciate this article, the presentation, and the amount of work to contemplate. When you combine measures and then try to simplify it into a single measure, you lose information. The Glasgow Coma Scale score is a measure of brain injury. My concern is that the way the Glasgow Coma Scale is scored across this country, it’s more a global assessment of neurologic injury.
Did you consider taking out patients with diagnoses of spinal cord injury and seeing how the day-to-day measurement of the Glasgow Coma Scale is affected? Could you then get best motor score related to brain injury and get rid of the noise from including spinal cord injuries? Are your findings more a reflection of coding based on spinal cord injury and not on head injury?
Dr. Turner M. Osler (closing): Dr. Chestnut, thank you for those insightful thoughts. I, too, am troubled by the National Trauma Data Bank because it’s too good to be true: You send some email, and you get 200,000 cases to work on. The problem with not having collected the data yourself or participating in the data collection is that you can never know what’s going on there.
I am extremely troubled by the fact that in the NTDB only 2,000 cases lacked a component of the Glasgow Coma Scale. That would be 1%. That’s inconceivable. So, we know that this data set must have people filling in blanks, perhaps according to local protocols that they just assign a 1 or they just assign some of the highest possible number, the lowest possible number. We don’t know what these conventions were. They could, of course, bias the data. We have no way of knowing in what direction they would have biased the data. However, given the data set, it’s not a problem we can solve except by having a better data set. So, this study obviously needs to be repeated.
To comment on the traumatic brain injury population, it is true that Teasdale and Jennett conceived the Glasgow Coma Scale to deal with brain surgery and brain injury. They weren’t really thinking about the trauma population. What has been discovered, however, is that in severely traumatized patients who are in a shock-like state, their Glasgow Coma Scale score will fall off.
As a measure of overall illness, it turns out that the Glasgow Coma Scale is quite powerful. Thus, it has been incorporated into our trauma models. That’s not to say that what we are actually measuring is brain injury. Many patients with a depressed Glasgow Coma Scale score have no brain injury at all; they are merely in shock. The GCS, however, is a powerful predictor, and it may be that the GCS score is just a surrogate for shock.
I agree that for the day-to-day work of a brain surgeon, having all three components of the GCS may be helpful. However, looking at a broader trauma population, which is what our mandate was, I think that it’s clear that the motor score contains virtually all of the power. I hope that covers your questions.
Dr. Champion, thank you. Those are both excellent points. I think that what I’m doing is not for neurosurgeons.
What I’m doing is for trauma demography and outcome prediction. I think it’s perfectly appropriate for neurosurgeons to continue to use whatever scale they want, and I think they are aware that a GCS score of 7 equals a GCS score of 8 equals a GCS score of 9 equals a GCS score of 10 equals a GCS score of 11. They’re aware of that foible, and our mathematical models need to be aware of that foible as well.
I think the blunt-penetrating distinction is an important one, but most of our trauma registries don’t contain very much penetrating head trauma. Therefore, it doesn’t really affect the conclusions of an article that’s looking at such a large number of patients.
Thank you, Dr. Gubler; those are good thoughts. We, too, are concerned about the spinal cord injury problem, because in the case of a quadriplegic, the motor score goes to 1, and of course, this doesn’t reflect anything about the brain. We propose that if we’re going to use the motor score alone, there needs to be a special score for quadriplegics based on just their facial response to noxious stimuli or voice that would fill in the gap. However, quadriplegics don’t represent a large category in anybody’s data set, so that hasn’t affected our overall results.
I’d like to underscore, again, that the Glasgow Coma Scale, although it was conceived to measure brain injury, isn’t measuring brain injury; it’s measuring brain function, and brain function can be affected by many things, including shock. So, the reason the Glasgow Coma Scale works so well as the predictor in general trauma populations is because it’s measuring a lot of different things, not just brain injury.
Thank you very much.