Neuropsychological assessment of hepatic encephalopathy: ISHEN
practice guidelines
Christopher Randolph1, Robin Hilsabeck2, Akinobu Kato3, Parampreet Kharbanda4, Yu-Yuan Li5, Daniela Mapelli6,
Lisa D. Ravdin7, Manuel Romero-Gomez8, Andrea Stracciari9 and Karin Weissenborn10
1 Department of Neurology, Loyola University Medical Center, Maywood, IL, USA
2 Department of Psychiatry, University of Texas Health Science Center and South Texas Veterans Health Care System, San Antonio, TX, USA
3 Department of Internal Medicine, Iwate Medical University, Iwate, Japan
4 Department of Neurology, Postgraduate Institute of Medical Education and Research, Chandigarh, India
5 Department of Gastroenterology, First Municipal People’s Hospital of Guangzhou, Guangzhou, China
6 Department of General Psychology, University of Padova, Padova, Italy
7 Department of Neurology, Weill Cornell Medical Center, New York, NY, USA
8 UGC Enfermedades Digestivas, Hospital Universitario de Valme, Sevilla, Spain
9 Department of Neurology, S. Orsola-Malpighi Hospital, Bologna, Italy
10 Department of Neurology, Medizinische Hochschule Hannover, Hannover, Germany

Keywords
assessment – encephalopathy – hepatic –
neuropsychological – outcome

Correspondence
Dr Christopher Randolph, 1 East Erie, Suite 355,
Chicago, IL 60611, USA
Tel: +1 708 216 3539
Fax: +1 708 216 4629
Received 11 December 2008
Accepted 24 January 2009

Low-grade or minimal hepatic encephalopathy (MHE) is characterised by relatively
mild neurocognitive impairments, and occurs in a substantial percentage of patients
with liver disease. The presence of MHE is associated with a significant compromise of
quality of life, is predictive of the onset of overt hepatic encephalopathy and is
associated with a poorer prognosis for outcome. Early identification and treatment of
MHE can improve quality of life and may prevent the onset of overt encephalopathy,
but to date, there has been little agreement regarding the optimum method for
detecting MHE. The International Society on Hepatic Encephalopathy and Nitrogen
Metabolism convened a group of experts for the purpose of reviewing available data
and making recommendations for a standardised approach for neuropsychological
assessment of patients with liver disease who are at risk of MHE. Specific recommendations are presented, along with a proposed methodology for further refining these
assessment procedures through prospective research.


Hepatic encephalopathy (HE) is a condition that is relatively
common in patients with liver disease (1, 2), results in significant compromise of quality of life (3, 4), requires a high burden
of care (2), and is associated with poor prognostic outcomes,
including an elevated risk of death (5). Overt HE involves
clinically obvious compromise of consciousness/arousal, behavior, and motor functions. This is typically classified along a
gradient of severity ranging from mild confusion to coma (6, 7).
Low-grade or minimal hepatic encephalopathy (MHE) is
characterised predominantly by a subtle impairment of neurocognitive status, and is not readily detectable via standard
mental status testing or neurological examination. This has also
been termed ‘subclinical’ HE in the past, because of the lack of
overt clinical symptomatology, but MHE is now the preferred
nosology (8). There is currently a lack of consensus regarding
how best to detect MHE. Early detection of MHE is nevertheless perceived to
be clinically important, because MHE can impair quality of
life (3, 4, 9–11), is predictive of the onset of overt HE (12–15),
and may have some prognostic value in the outcome for
patients with end-stage liver disease (16). Treatment of MHE
can improve quality of life (17), and may theoretically avert
more severe HE.

Liver International (2009)
© 2009 John Wiley & Sons A/S

A variety of approaches have been used in efforts to detect
MHE, including neuropsychological testing, EEG/evoked
potentials (18–22) and critical flicker frequency (23–25), which
is a psychophysiological measure. It is not yet clear which
approach is superior in terms of sensitivity and/or clinical
validity. Neuropsychological tests have more face validity in
this context, as they directly measure cognitive functions (e.g.
memory, attention and visuospatial skills) that are directly
relevant to activities of daily living. Biological or psychophysiological markers, on the other hand, may be less affected by
variables such as age, education and language, which are known
to impact performance on neuropsychological tests, and may
therefore complicate interpretation/classification.
Although the neuropathophysiology of MHE is not definitively established, elevated levels of ammonia have been implicated, and a variety of structural and functional imaging studies
have suggested that the primary manifestations of MHE may be
mediated by subcortical systems, including the basal ganglia
(26–31). This hypothesis is consistent with neuropsychological
investigations that have attempted to ‘profile’ the patterns of
impairment in MHE. These typically report a pattern of impairment characterised by prominent deficits in the domains of
attention, visuospatial abilities and fine motor skills (32, 33).
Although these domains (attention, visuospatial abilities and
fine motor skill) are most commonly implicated in MHE,
impairments of memory have also been reported (34–38).


These appear to be primarily characterised by diminished immediate memory performance as a consequence of slowed or inefficient cognitive processing (33, 37, 39, 40). This is distinct from a primary impairment of anterograde memory produced by damage to limbic memory systems, as in Alzheimer’s disease. Language is typically reported to be intact, with the exception of slowed verbal fluency. The consistent finding of impaired verbal fluency performance is also likely secondary to overall slowing of cognitive processing speed, rather than to an intrinsic disruption of language.
The attentional impairments in MHE are observed on a variety of measures. These include measures of cognitive processing speed involving psychomotor responding, such as the Number Connection Test or Trailmaking Test and the Digit Symbol subtest from the Wechsler Intelligence Scales or the Symbol Digit Modalities Test (1, 17, 37, 41–46). Impairments on measures of cognitive processing speed and response inhibition that do not require a motor response have also been reported (e.g. with verbal fluency tasks and measures such as the Stroop test) (33, 42, 43, 47, 50). Impairments have also been reported on various measures of attention span/working memory, in both verbal and visual domains (33, 47–49). In addition, the use of continuous response-type measures of sustained attention and freedom from distractibility (e.g. inhibitory control test) have been reported to be sensitive to impairments in patients with liver disease (51).
Visuospatial impairments have been primarily reported on block design tasks (17, 52) (which also include a motor/practic component), but also on more pure measures of visuospatial perception, such as line orientation or the Hooper test (48, 49, 53). Fine motor skill impairments have been noted on measures such as the grooved pegboard task (33, 54), and on line tracing tasks (55, 56) (the latter also involve visuospatial abilities).
The International Society for Hepatic Encephalopathy and Nitrogen Metabolism (ISHEN) formed a commission to review the available data on the role of neuropsychological testing in this context, and to make recommendations regarding the routine neuropsychological assessment of patients with liver disease. The goal of this process was to identify the characteristics of a ‘gold standard’ battery via which researchers could pool results, compare findings across studies, make clinical decisions regarding their own patients and ultimately compare with other methodologies for diagnosis and treatment planning in MHE. This process is similar to consensus batteries that have been developed for the purpose of evaluating patients with other diseases where neurocognitive outcome has emerged as a potential target of treatment, such as neuropsychiatric systemic lupus erythematosus (57) and schizophrenia (58).

Methodology
The commission chairs (C. R. and K. W.) recruited a panel of experts in this field, and drafted a survey designed to reach some consensus on the nature of a suitable battery for this purpose. Commission members were informed of the overall purpose of the survey, and each commission member then independently responded to a series of questions about the features of a putative ‘gold standard’ battery for the assessment of MHE. Responses were recorded on a seven-point Likert-type scale, with a score range from 1 reflecting ‘not important’ to 7 reflecting ‘very important’. The final question asked was whether or not each member would recommend an existing battery for this purpose.

Results
General features
There were several questions regarding the general nature of a ‘gold standard’ battery. Four of these resulted in near-universal agreement, with the modal response for each being a 7, and the mean response also being 7 (means were rounded to whole numbers). The general characteristics of such a battery that were seen as strongly desirable were as follows:
• A specific battery should be identified for this purpose, and that it should serve as a benchmark against which to compare newer, or experimental approaches.
• The battery should measure multiple cognitive domains, to improve reliability, increase power and ease interpretation.
• The battery should have age-based norms.
• The battery should be easily translatable and applicable cross-culturally.
General features of the battery that received moderately strong support included applicability for patients who are illiterate (modal response 7, mean = 6), as was the desirability of a global score. Most felt that a computerised battery would be more cumbersome (i.e. less portable), require greater expense and might not be as useful as pencil-and-paper testing in this context, and that it was not important that the battery be computerised (modal response = 1, mean = 3). Several members spontaneously pointed out the need for alternate forms of the battery, to eliminate or reduce practice effects. The need for appropriate training in order to correctly administer and score the battery was also pointed out.
As far as the length of the battery was concerned, the modal response was that it needed to be < 60 min. The mean suggested that the maximum time for completion was 40 min. Generally, commission members felt that the shorter the battery, the better, but they also recognised that obtaining a reliable measurement of neurocognitive status was likely to require a minimum of 20–40 min of testing.

Specific components
Commission members were also queried regarding the desirability of specific test paradigms as components of a gold standard battery. The paradigms are listed in Table 1, in the order of perceived desirability. In their comments, several members pointed out that language per se was felt to be unaffected, but that verbal fluency measures were useful as a processing speed or ‘executive’ component. It was also noted that, while MHE has not been reported to produce a true impairment of anterograde memory (i.e. rapid forgetting), slowed processing impacts upon memory performance and it was felt that this was a clinically useful measure that might have ecological significance (i.e. in terms of affecting daily functioning). Most did not feel that measures of reaction time were feasible without the use of a computer, which was discouraged. There was some discussion regarding the inclusion of executive or self-regulatory measures, but it was also noted that these are typically not amenable to the creation of equivalent multiple forms, and that there is limited agreement on what types of executive tests might be useful in this context (apart from measures of verbal fluency). The use of motor measures, despite the demonstrated sensitivity of some of these to MHE, was discouraged by several members who felt that performance on these measures could be confounded by other variables not directly related to MHE. It was noted that motoric dysfunction
could potentially impact upon any neuropsychological measure that involved a motor response (e.g. drawing, coding and block design), but that this was to some extent an unavoidable confound.

Table 1. ISHEN committee ratings

Neurocognitive domain          Modal response   Mean response
Processing speed               7                7
Working memory                 7                6
Verbal memory (anterograde)    7                5
Visuospatial ability           6                6
Visual memory (anterograde)    6                5
Language                       5                4
Reaction time                  5                4
Motor functions                4.5              4

Neurocognitive domains were rated on their importance for inclusion in a battery designed to detect impairments associated with minimal hepatic encephalopathy (MHE), using a Likert scale ranging from 1 (not important) to 7 (very important).

Existing candidate batteries
Commission members were also asked to nominate any existing batteries that could potentially serve as a gold standard for assessment of MHE. Most of the members did not choose a specific battery for this purpose. The only two specific batteries that were recommended were the PSE-Syndrom-Test (59) and the Repeatable Battery for the Assessment of Neuropsychological Status™ (RBANS) (60).

PSE-Syndrom-Test
The PSE-Syndrom-Test is a battery consisting of five paper-and-pencil tasks, including Number Connection Tests A and B, a coding test (Digit Symbol Test) similar to the Digit Symbol subtest of the Wechsler scales, the Serial Dotting Test and the Line Drawing Test. The Serial Dotting Test consists of 10 rows of 10 circles, and the subject is timed on how quickly he or she can place a dot in the center of each circle. The Line Drawing Test requires the subject to draw a continuous line between two parallel (winding) lines, and scores include completion time and errors. For the purpose of scaling an individual’s test performance, scores on each subtest are assigned a value ranging from +1 to −3, based on age-related norms (+1 for scores better than 1 SD above the normal mean to −3 for scores more than 3 SDs below the normal mean). Because the Line Drawing Test generates two scores, there are a total of six measures that contribute to the total score, which can therefore range from +6 to −18. There are four alternate forms of the PSE-Syndrom-Test (only the Serial Dotting Test is unchanged across forms), and the battery requires 15–20 min to complete. This battery is only commercially available in the German form presently. On principle, given the components, the battery is presumably relatively culture-free, and the instructions are easily translatable.
The PSE-Syndrom-Test was specifically developed to measure the effects of MHE, and has been shown to be sensitive to impairment in patients with cirrhosis (24, 31, 32, 40, 61). It has also been shown to correlate with functional neuroimaging results in these patients (63–65). A consensus paper by the World Congress of Gastroenterology in 1998 recommended the PSE-Syndrom-Test for the evaluation of patients at risk of MHE (7).
Normative data were initially collected in Germany (32, 62), and normative data have also been collected in Spain and Great Britain. The analysis of the single test results showed that they were normally distributed only after logarithmic transformation. After such a transformation, all data showed a linear dependence on age with normally distributed residuals of homogenous variance as determined by linear regression analysis and Kolmogorov–Smirnov test. The effect of education and occupation were negligible compared with the age effect. Thus, the regression lines together with parallel lines of ±1, ±2 and ±3 standard deviations were calculated, yielding the known normal quantiles, including the 95% range around the midpoint for each single test. The regression lines and the standard deviation lines were finally transformed into the original scales (32). While the British data are not yet available, the Spanish data can be accessed by the interested clinician via the internet (http://www.redeh.…). Additional normative data from Italy have been published (66). Unfortunately, the calculation of the normative data has been performed differently in different countries, the Italian and the Spanish groups did not include line-tracing errors in their scoring (changing the score range from +6 to −18 to +5 to −15), and there would appear to be a need to perform a direct comparison of the raw data from the four European countries, standardise the scoring and determine the extent to which local norms are required for these individual countries.

Repeatable Battery for the Assessment of Neuropsychological Status™
The RBANS was designed with two basic goals: to serve as a ‘core’ battery for the assessment of dementia, and to efficiently screen/track neurocognitive impairment in other disorders. It contains measures of verbal and visual anterograde memory, working memory, cognitive processing speed, language (including semantic fluency) and visuospatial function (line orientation and figure copy). It is a portable pencil-and-paper test that requires a folding stimulus booklet and paper record form to administer. Administration time is approximately 20–25 min. There are four alternate forms (A, B, C and D), and there are no practice effects with repeated testing using alternate forms. The RBANS underwent a US population-based standardisation for ages 20–89, and generates age-scaled index scores with a normal mean of 100 and SD = 15 for five domains (immediate memory, visuospatial/constructional, language, attention and delayed memory), as well as a total scale score. The RBANS has undergone extensive clinical and psychometric validation for a variety of disorders, including various forms of dementia, schizophrenia, stroke, traumatic brain injury, multiple sclerosis and bipolar disorder, including studies completed in North America, Europe and Asia. It is currently being used in multiple clinical trials for Alzheimer’s disease and schizophrenia, and as of August 2008, there were approximately 20 different official translations available for clinical and research purposes.
The use of the RBANS in evaluating patients with liver disease has been largely restricted to the USA to date. In a sample of 300 consecutive outpatients presenting for liver transplantation, RBANS scores were strongly correlated with liver disease as measured by the model for end-stage liver disease staging (38). Scores on the RBANS also predicted disability independently of liver disease severity in this study. A similar finding was reported in a separate study of 148 liver
transplant candidates (36), and in a recent report on 66 patients with end-stage liver disease (67). These initial findings are encouraging, and the broad clinical utility of this test (in conjunction with its use in ongoing multinational clinical trials in other diseases) is likely to stimulate local norming projects, many of which are already underway.

Candidate battery characteristics compared with commission recommendations
Table 2 lists the parameters of the ideal ‘gold standard’ battery for the detection of MHE as recommended by the commission, together with an indication of the degree to which the PSE-Syndrom-Test and the RBANS meet these requirements.

Table 2. Existing tests compared with commission recommendations

Measurement of multiple cognitive domains
  PSE-Syndrom-Test: Cognitive processing speed (psychomotor) and visuospatial demands (psychomotor)
  RBANS: Verbal memory, visual memory, working memory, language (including fluency), visuospatial perception and construction, and cognitive processing speed (psychomotor)
Portable, pencil-and-paper
  PSE-Syndrom-Test: Yes
  RBANS: Yes
Ease of translation/cross-cultural application
  PSE-Syndrom-Test: Demonstrated
  RBANS: Demonstrated
Use with illiterate patients
  PSE-Syndrom-Test: Yes
  RBANS: Yes
Availability of age-based norms
  PSE-Syndrom-Test: German, Spanish, Italian and British (yet unpublished)
  RBANS: US population-based, Italian
Alternate forms
  PSE-Syndrom-Test: Four forms
  RBANS: Four forms
Global score generated
  PSE-Syndrom-Test: Yes – sum of six categorical scores based on normal SDs – range +6 to −18
  RBANS: Yes, index score (mean of 100, SD = 15) – normally distributed
Time for administration (min)
  PSE-Syndrom-Test: 15–20
  RBANS: 25
Retest reliability, minimal practice effects
  PSE-Syndrom-Test: Retest reliability for total score 0.81 in normals, no practice effects
  RBANS: Retest reliability for total score 0.86 in normals, no practice effects
Number of cognitive domains measured with a modal ranking of 5 or higher in importance by the commission
  PSE-Syndrom-Test: 2/7
  RBANS: 6/7

RBANS, Repeatable Battery for the Assessment of Neuropsychological Status™.

Discussion and recommendations
The commission members were remarkably consistent in their recommendations for a gold standard neuropsychological battery for the detection of MHE. A clear need was seen for a consistent approach across centers for the purposes of both diagnosis and measuring the effects of treatment. The commission members were in agreement that a suitable battery should be a portable, pencil-and-paper battery taking < 40 min to complete, that it should be easily translated, should have alternate forms to minimise or eliminate practice effects, have age-based norms and that it should measure multiple cognitive domains but generate a single global score with adequate retest reliability for detecting change. The suggested component neurocognitive domains to be tested by the battery were also fairly consistently agreed upon.
The only existing batteries that were recommended for this purpose (each by a minority of the commission members) were the PSE-Syndrom-Test and the RBANS. Each of these has at least a few peer-reviewed publications, suggesting sensitivity to the effects of MHE, but there has been no direct comparison of the two tests to date. In addition, the PSE-Syndrom-Test has been shown to correlate with functional brain imaging results in cirrhotic patients, and the RBANS has been demonstrated to be predictive of disability in these patients.
The RBANS has the advantage of having a large body of clinical validity data in other disorders, demonstrating both the sensitivity of the test to various forms of cerebral dysfunction and the predictive value of the total scale score with respect to various measures of functional independence in disorders such as stroke (68), schizophrenia (61, 69) and Alzheimer’s disease (70). Therefore, the application of the RBANS in the context of liver disease may benefit from the more widespread use of this scale. It is somewhat closer in nature to the battery recommended by the commission, as it includes measures of verbal and visual memory, visuospatial perception, visuospatial construction, and verbal fluency that are not contained in the PSE-Syndrom-Test. The RBANS also underwent a rigorous population-based standardisation and norming in the USA, and the basic psychometrics of the test appear to be satisfactory.
On the other hand, the PSE-Syndrom-Test was developed specifically for the purpose of detecting MHE, utilising measures extracted from a much larger battery for their sensitivity to this syndrome. It has been utilised in the measurement of MHE over a longer period of time than the RBANS, and cross-cultural sensitivity in that context has been demonstrated. Substantial normative data have been collected in Germany, Spain, Italy and Great Britain, and all of the tasks are essentially non-verbal, requiring translation of only the instructions for administration. It requires somewhat less time to administer than the RBANS.
The commission recommends that, until one of these batteries is demonstrated to have superior clinical validity in
the detection and monitoring of MHE, researchers choose one or the other for the routine assessment of patients at risk for MHE. This choice should be driven on the basis of available local test translations and normative reference data. If a local translation of the RBANS is not available, the use of the PSE-Syndrom-Test is recommended, as only the instructions for this test require translation, and this is a much simpler process than translation of the RBANS. In the absence of local norms for the test being used, the commission recommends relying upon existing US or German norms while pursuing the collection of local norms. The commission further recommends ongoing research to establish local norms for these tests, preferably using a stratified sampling approach to calibrate local norms to published norms in a consistent fashion. Local norms can be calibrated to existing test norms on the basis of relatively small sample sizes, if the samples are appropriately stratified.

Statements
• Neuropsychological testing is an established methodology for quantifying cognitive impairment due to various forms of encephalopathy, including low-grade or minimal hepatic encephalopathy. (A)
• Neuropsychological test batteries that measure multiple domains of cognitive function are generally more reliable than single tests, and tend to be more strongly correlated with functional status. (A)
• Both the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) and PSE-Syndrom-Test have met psychometric and clinical validity criteria for use in assessment of patients at risk for minimal hepatic encephalopathy. (B)
Recommendations
• Use of either the RBANS or the PSE-Syndrom-Test is recommended for diagnosing and monitoring minimal hepatic encephalopathy. The choice of which battery to use should be based upon the availability of local translations and normative data (Table 3). (2)

Table 3. Grading of evidence and recommendations

Grading of Evidence
  High-quality evidence (Symbol A): Further research is very unlikely to change our confidence in the estimate of effect.
  Moderate-quality evidence (Symbol B): Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
  Low- or very low-quality evidence (Symbol C): Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Any estimate of effect is uncertain.
Grading of Recommendation
  Strong recommendation warranted (Symbol 1): Factors influencing the strength of the recommendation included the quality of the evidence, presumed patient-important outcomes, and cost.
  Weaker recommendation (Symbol 2): Variability in preferences and values, or more uncertainty: more likely a weak recommendation is warranted. Recommendation is made with less certainty, higher cost or resource consumption.

European Association for the Study of the Liver. EASL Clinical Practice Guidelines: Management of chronic hepatitis B. J Hepatol 50 (2009), doi: 10.1016/j.jhep.2008.10.001.

We recognise that there may be a more efficient or sensitive approach to the detection of MHE in patients with liver disease than either the RBANS or the PSE-Syndrom-Test. The commission also recommends the following general approach for the future: comparison of the RBANS and PSE-Syndrom-Test, as well as the comparison of any candidate replacement batteries. Any candidate battery or other (e.g. biological) measure should, however, have an established scoring methodology that provides a single global score for classification purposes, in order to directly compare sensitivity/specificity and reliability to the RBANS or the PSE-Syndrom-Test. Candidate measures should be compared with index measures using appropriate sample sizes as follows:
(1) A common normative reference group, matching a local cirrhotic sample on demographic variables (age, gender and education), should be administered both batteries (in counterbalanced sequence) on two occasions at least 1 week apart (the retest intervals should be the same for both batteries). The first administration of each test will be used for analyses of sensitivity/clinical validity. The second will be used to establish test–retest reliability. The latter measure is important to establish the capacity of the test to identify group and individual changes in response to treatment. Normal subject retest reliability is preferred to facilitate comparison with various existing standardised tests, as patient groups may vary in terms of the stability of their neurocognitive status as a function of disease severity, etc.
(2) Both tests should also be given (in counterbalanced sequence) on a single occasion to a sample of cirrhotic patients, along with the collection of as much additional clinical data (disease variables, measurements of quality of life, independent activities of daily living, performance on driving simulators, measurements of outcome, etc.) as possible.
(3) Using the common normative reference sample, the global scores from each scale can be used to determine optimal cut-off points via ROC analyses, and sensitivity/specificity of the scales can be directly compared. Diagnostic cut-offs can then be set with known false-positive rates for the population of interest. The relative predictive value of each scale with respect to other clinical data of interest can also be directly compared.
(4) The statistical properties of each scale can be weighed against the practical parameters of the scale (time to administer, effort, cost, available norms, etc.) to determine as to which scale will be of greater utility in this context.
The commission believes that sufficient data exist to recommend the use of the RBANS or the PSE-Syndrom-Test as standard assessments for patients at risk of MHE at this point. This will allow a more systematic approach to the clinical management of these patients, and facilitate communication and comparison of results across studies. Finally, the commission recommends a systematic approach for future studies to compare approaches to diagnosing and monitoring MHE.
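
The PSE-Syndrom-Test scoring rule described earlier (six measures, each scored from +1 to −3 against age-related norms and summed to a total between +6 and −18) can be illustrated with a minimal sketch. Only the two end bands are stated in the text; the intermediate bands (0, −1, −2) and the exact boundary handling are assumptions made here for illustration. The function names and the regression parameters `intercept`, `slope` and `resid_sd` are hypothetical placeholders, not published normative values.

```python
import math


def age_adjusted_z(raw, age, intercept, slope, resid_sd, lower_is_better=True):
    """Return a z-score for one measure against an age-regressed norm.

    The norm is modelled on the log scale: log(raw) = intercept + slope * age.
    intercept, slope and resid_sd are hypothetical placeholders.
    For timed tasks (lower_is_better) the sign is flipped, so that a
    positive z always means better than expected for age.
    """
    if raw <= 0 or resid_sd <= 0:
        raise ValueError("raw and resid_sd must be positive")
    z = (math.log(raw) - (intercept + slope * age)) / resid_sd
    return -z if lower_is_better else z


def band_score(z):
    """Map a z-score to a categorical score between +1 and -3.

    +1 (better than 1 SD above the mean) and -3 (more than 3 SDs below)
    follow the text; the bands in between are assumed.
    """
    if z > 1.0:
        return 1
    if z >= -1.0:
        return 0
    if z >= -2.0:
        return -1
    if z >= -3.0:
        return -2
    return -3


def total_score(z_scores):
    """Sum the categorical scores of the six measures (range +6 to -18)."""
    if len(z_scores) != 6:
        raise ValueError("expected z-scores for exactly six measures")
    return sum(band_score(z) for z in z_scores)
```

Called as `total_score([0.3, -1.4, -2.2, 0.0, 1.2, -3.5])`, the sketch sums the band scores 0, −1, −2, 0, +1 and −3. Groups that omit line-tracing errors would sum five measures instead, which is why their totals span +5 to −15.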

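
Step (3) of the proposed comparison, deriving cut-offs from global scores, might look like the following sketch. It assumes that lower global scores indicate greater impairment and uses Youden's J as the optimality criterion. The text names ROC analysis but no specific criterion, so that choice is an assumption, as are the function names.

```python
def roc_points(controls, patients):
    """Return (cutoff, sensitivity, specificity) for every observed score.

    A score at or below the cut-off is classified as impaired.
    controls: global scores from the normative reference group.
    patients: global scores from the cirrhotic sample.
    """
    points = []
    for cutoff in sorted(set(controls) | set(patients)):
        sensitivity = sum(s <= cutoff for s in patients) / len(patients)
        specificity = sum(s > cutoff for s in controls) / len(controls)
        points.append((cutoff, sensitivity, specificity))
    return points


def youden_cutoff(controls, patients):
    """Cut-off maximising sensitivity + specificity - 1 (assumed criterion)."""
    best = max(roc_points(controls, patients), key=lambda p: p[1] + p[2] - 1)
    return best[0]


def cutoff_for_false_positive_rate(controls, max_fpr):
    """Highest cut-off whose false-positive rate in controls is <= max_fpr.

    Returns None if even the lowest observed score exceeds the target rate.
    """
    chosen = None
    for cutoff in sorted(set(controls)):
        fpr = sum(s <= cutoff for s in controls) / len(controls)
        if fpr <= max_fpr:
            chosen = cutoff
    return chosen
```

The second helper corresponds to setting a diagnostic cut-off with a known false-positive rate in the normative sample. Running both functions on each battery's global scores from the same participants would allow the sensitivity and specificity of the two scales to be compared directly.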
