PSYCHOMETRICS
ABSTRACT
Background: The Hypogonadism Impact of Symptoms Questionnaire Short Form (HIS-Q-SF) is a patient-
reported outcome measurement designed to evaluate the symptoms of hypogonadism. The HIS-Q-SF is an
abbreviated version including 17 items from the original 28-item HIS-Q.
Aim: To conduct item analyses and reduction, evaluate the psychometric properties of the HIS-Q-SF, and
provide guidance on score interpretation.
Methods: A 12-week observational longitudinal study of hypogonadal men was conducted as part of the original
HIS-Q psychometric evaluation. Participants completed the original HIS-Q every 2 weeks. Blood samples were
collected to evaluate testosterone levels. Participants completed the Aging Male’s Symptoms Scale, the Inter-
national Index of Erectile Function, the Short Form-12, and the PROMIS Sexual Activity, Satisfaction with Sex
Life, Sleep Disturbance, and Applied Cognition Scales (baseline and weeks 6 and 12). Clinicians completed the
Clinical Global Impression of Severity and Change scales and a clinical form.
Main Outcome Measures: Item performance was evaluated using descriptive statistics and Rasch analyses.
Reliability (internal consistency and test-retest), validity (concurrent and known groups), and responsiveness were assessed.
Results: One hundred seventy-seven men participated (mean age = 54.1 years, range = 23–83). Similar to the
full HIS-Q, the final abbreviated HIS-Q-SF instrument includes five domains (sexual, energy, sleep, cognition,
and mood) with two sexual subdomains (libido and sexual function). For key domains, test-retest reliability was
very good, and construct validity was good for all domains. Known-groups validity was demonstrated for all
domain scores, subdomain scores, and total score based on the Clinical Global Impression–Severity. All domains
and subdomains were responsive to change based on patient-rated anchor questions.
Clinical Implications: The HIS-Q-SF could be a useful tool in clinical practice, epidemiologic studies, and other
academic research settings.
Strengths and Limitations: Careful consideration was given to the selection of the final HIS-Q-SF items based
on quantitative data and clinical expert feedback. Overall, the reduced set of items demonstrated strong psy-
chometric properties. Testosterone levels for the participating men were not as low as anticipated, which could
have limited the ability to examine the relations between the HIS-Q-SF and testosterone levels. Further, the
analyses used data collected through administration of the full HIS-Q, and future studies should administer the
standalone HIS-Q-SF to replicate the psychometric analyses reported in the present study.
Conclusion: Similar to the original HIS-Q, the HIS-Q-SF has evidence supporting reliability, validity, and
responsiveness. The short form includes a smaller set of items that might be more suitable for use in clinical
practice or academic research settings. Gelhorn HL, Roberts LJ, Khandelwal N, et al. Psychometric Evaluation
of the Hypogonadism Impact of Symptoms Questionnaire Short Form (HIS-Q-SF). J Sex Med
2017;14:1046–1058.
Copyright 2017, International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Received January 10, 2017. Accepted May 26, 2017.
1 Evidera, Bethesda, MD, USA;
2 AbbVie, North Chicago, IL, USA;
3 Maryland Center for Sexual Health, Lutherville, MD, USA;
4 Johns Hopkins University, Baltimore, MD, USA
http://dx.doi.org/10.1016/j.jsxm.2017.05.013
trained using a standardized training protocol on the purpose and procedures for the study. Participants completed a total of three in-person study visits at baseline, week 2, and week 12. Participants also completed assessments at home on an electronic PRO device. The first set of assessments was completed during their first in-clinic visit at baseline and then at home every 2 weeks from week 2 to week 12. A summary of site and patient-completed study events is presented in Table 1, which includes brief descriptions of each instrument.[12–17] Testosterone levels were assessed through blood draws from all participants at baseline and from participants who were beginning treatment (treatment naïve) or switching to a different treatment (switchers) at weeks 2 and 12.

Statistical Analyses
Sociodemographic and clinical variables were used to characterize the patient sample using descriptive statistics. Item-level descriptive statistics (mean, SD, median, range, frequency) were used to evaluate the performance of individual HIS-Q-SF items. Confirmatory factor analysis with specification of a six-factor model (consistent with the original HIS-Q) was used to confirm the factor structure of the instrument. Model fit was assessed with the comparative fit index (CFI) and the root mean squared error of approximation (RMSEA). In general, a model is considered to have good fit and explain the data well if the CFI is at least 0.90.[18,19] The RMSEA is a measurement of fit assessing the discrepancy between the predicted and observed data per degree of freedom; values of 0.01, 0.05, and 0.08 suggest excellent, good, and mediocre fit, respectively.[20] Rasch analyses[21] were used to evaluate individual item and subscale properties. The confirmatory factor and Rasch analyses were conducted using Mplus[22] and RUMM 2030,[23] respectively.

After item evaluation of the individual HIS-Q-SF items and development of the scoring algorithm, the psychometric properties of the HIS-Q-SF were evaluated. Internal consistency reliability of the HIS-Q-SF domains was evaluated using the Cronbach α coefficient[24] at baseline, with reliability values of at least 0.70 indicating a more reliable (precise) instrument.[25]

Test-retest reliability was examined for all HIS-Q-SF domain scores to examine the stability of the HIS-Q-SF over time within a stable population. Stable subjects were defined as those with “no change” in patient-rated anchor questions for each HIS-Q-SF domain and the total score from baseline to week 2. Test-retest reliability was assessed using intraclass correlation coefficients (ICCs) and paired-sample t-tests among stable patients only. ICCs range from 0 to 1.0, with higher scores indicating a more stable instrument. The hypothesis was that there would be no significant differences in scale scores when there was no change in disease status. ICCs should be statistically significant and high (>0.70). An ICC of at least 0.7 indicates good test-retest reliability, 0.4 to 0.7 indicates moderate test-retest reliability, and lower than 0.4 indicates low test-retest reliability.[25–27]

Convergent and divergent validity of the HIS-Q-SF was assessed. Pearson product-moment and Spearman rank correlation coefficients were used to estimate the relation at baseline between all HIS-Q-SF domain scores and (i) the Aging Male’s Symptoms Scale (AMS) sexual, somato-vegetative, and psychological domain scores; (ii) the International Index of Erectile Function (IIEF) erectile dysfunction, orgasmic function, sexual desire, intercourse satisfaction, and overall satisfaction scores; (iii) PROMIS Sexual Activity score; (iv) PROMIS Global Satisfaction with Sex Life score; (v) PROMIS Sleep Disturbance score; (vi) PROMIS Applied Cognition score; (vii) the Short Form-12 Health Survey (SF-12) vitality and physical component summary score; and (viii) Clinical Global Impression–Severity (CGI-S) scores. Correlations between the HIS-Q-SF domain scores and testosterone (free and total) also were assessed for the total sample at baseline. The HIS-Q-SF scores were expected to be moderately (r = 0.30–0.50) to highly (r > 0.50) correlated with conceptually corresponding measurements and domains, demonstrating convergent validity[28] (eg, HIS-Q sexual domains and PROMIS and AMS sexual scales, HIS-Q energy domain and AMS somato-vegetative scale and SF-12 vitality item, HIS-Q sleep domain and PROMIS sleep, HIS-Q cognition domain and PROMIS cognition, and HIS-Q mood domain and AMS psychological scale). Divergent validity was assessed by examining correlations among domains that were hypothesized to be unrelated (eg, PROMIS sleep disturbance and HIS-Q-SF sexual domain and subdomain scores), and small correlations (r < 0.30) were expected.[28]

To evaluate known-groups validity, domain scores on the HIS-Q-SF were analyzed by disease severity based on the CGI-S. Mean scores on the HIS-Q-SF domains were compared for each of the CGI-S severity levels (no symptoms or very mild, mild, moderate, and severe) using analysis of covariance at baseline and week 12 controlling for age, sex, and race. In addition, the known-groups validity of the HIS-Q-SF was examined using baseline measurements of total and free testosterone.

To evaluate responsiveness, the extent to which the instrument can detect true change in participants known to have changed in clinical status, patient-rated anchor questions were used to characterize change from baseline to week 6 and from baseline to week 12. Responsiveness analyses also were conducted using the CGI-S score changes from baseline to week 12.

Responder definitions were identified for the HIS-Q-SF domains, subdomains, and total score using anchor-based and distribution-based methods. Mean scores for each HIS-Q-SF domain and subdomain for participants who improved by one point on each concept-specific anchor question were used to establish anchor-based responder definitions. Fractions of the baseline SD (0.2, 0.3, and 0.5) and the standard error of measurement[29,30] were used to establish distribution-based responder definitions. Then, the anchor- and distribution-based definitions were triangulated to derive final responder definitions of clinically meaningful change for each of the HIS-Q-SF domain, subdomain, and total scores.
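The distribution-based estimates described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the study's analysis code: the function name, the example scores, and the reliability value are hypothetical, and the standard error of measurement (SEM) is computed with the conventional formula SEM = SD × sqrt(1 − reliability), which the text does not spell out and is assumed here.

```python
import math
import statistics


def distribution_based_estimates(baseline_scores, reliability):
    """Return 0.2, 0.3, and 0.5 SD estimates plus the SEM for one score.

    baseline_scores: baseline domain scores on the 0-100 scale (hypothetical data).
    reliability: a reliability coefficient between 0 and 1 (assumption: the
        source does not say which coefficient was used for the SEM).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sd = statistics.stdev(baseline_scores)  # sample SD at baseline
    estimates = {f"{fraction} SD": fraction * sd for fraction in (0.2, 0.3, 0.5)}
    # Conventional SEM formula (assumed, not quoted from the paper).
    estimates["SEM"] = sd * math.sqrt(1.0 - reliability)
    return estimates


if __name__ == "__main__":
    example_scores = [20, 30, 40, 50, 60]  # made-up baseline scores
    for label, value in distribution_based_estimates(example_scores, 0.84).items():
        print(f"{label}: {value:.2f}")
```

Estimates of this kind would then be set beside the anchor-based means when triangulating a final responder definition.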
A direct comparison between the original HIS-Q and the HIS-Q-SF total and domain scores was conducted. Pearson correlation coefficients were calculated for the total, domain, and subdomain scores of the two measurements. It was expected that the same domains across measurements would be very highly correlated (>0.80).

RESULTS
The participants were recruited from 20 clinical sites across the United States (number of patients per site: mean = 9.9, SD = 4.6). The final analysis sample included data from 177 men, including 89 men who were on the same TRT throughout the study (maintenance patients), 41 men who were switching from one form of TRT to another at the start of the study (switchers), and 47 men who were initiating TRT for the first time (treatment-naïve patients).

Demographics and Clinical Characteristics
The men participating in the study had a mean age of 54.1 years (range = 23–83), and most were white (74.0%). Most reported being involved in an intimate relationship (82.5%; Table 2). The average duration of hypogonadism diagnosis was 2.2 years (SD = 3.2). The mean baseline testosterone level of participants was 507.6 ng/dL (Table 3). The men participating in the study had moderate to severe levels of impairment in sexual functioning as measured by the AMS (overall mean = 11.7, SD = 4.3), and clinician ratings of hypogonadism severity for the participants indicated mild to moderate symptom severity at baseline.

Table 2. Sociodemographic characteristics
Characteristic | Total sample (N = 177)
Age |
  Mean (SD) | 54.1 (11.4)
  Median (range) | 55.0 (23.0–83.0)
  Missing, n (%) | 1 (0.6)
Race, n (%)* |
  Black or African American | 32 (18.1)
  White | 131 (74.0)
  Other† | 7 (4.0)
  Missing | 7 (4.0)
Ethnicity, n (%) |
  Hispanic or Latino | 9 (5.1)
  Not Hispanic or Latino | 166 (93.8)
  Missing | 2 (1.1)
Employment status, n (%) |
  Employed full-time | 105 (59.3)
  Employed part-time | 12 (6.8)
  Student | 2 (1.1)
  Unemployed, disabled, retired | 52 (29.4)
  Other‡ | 5 (2.8)
  Missing | 1 (0.6)
Education, n (%) |
  Secondary, high school, some college, trade school | 88 (49.7)
  College degree | 58 (32.8)
  Postgraduate degree | 30 (16.9)
  Missing | 1 (0.6)
Currently in an intimate relationship, n (%) |
  Yes | 146 (82.5)
  No | 30 (16.9)
  Missing | 1 (0.6)
*Categories are not mutually exclusive.
†Other race: Asian (n = 3), Native Hawaiian or Pacific Islander (n = 1), Hispanic (n = 1), Haitian (n = 1), and Jamaican (n = 1).
‡Other employment: self-employed (n = 4), and sales (n = 1).

HIS-Q-SF Item Evaluation, Factor Structure, and Scoring of HIS-Q-SF
Overall, the individual item-level analyses demonstrated acceptable distribution of the HIS-Q-SF item responses across the response categories and good distributional characteristics.

After item evaluation, factor analysis was completed on the 17-item HIS-Q-SF. Because the original HIS-Q had a six-factor solution (including the libido subdomain, sexual function subdomain, and energy, sleep, cognition, and mood domains), this model also was examined for the HIS-Q-SF. This six-factor model showed acceptable factor loadings (0.500–1.004) and demonstrated acceptable fit to the data (comparative fit index = 0.993, root mean squared error of approximation = 0.086; eTable 1).

Rasch analyses were conducted using baseline data on each of the domains and subdomains identified through the factor analyses. Item performance was very good, with all items demonstrating fit to the Rasch model (P > .05 for all comparisons) and good distributions of item thresholds (β range: libido = 4.8 to 3.6, sexual function = 2.8 to 2.9, energy = 5.6 to 5.6, sleep = 1.3 to 1.8, cognition = 2.9 to 1.4, mood = 3.4 to 2.1). The item thresholds were well matched to the distributions of individuals within each domain. There were a few minor issues with disordered thresholds (ie, some item response categories did not precisely distinguish between participants with different levels of hypogonadism severity) for three items (“difficulty achieving erections,” “difficulty ejaculating,” and “feeling sad”).

The final HIS-Q-SF scoring includes each of the 14 ordinal response scale items and yields five domain scores (sexual, energy, sleep, cognition, and mood) and two sexual subdomain scores (libido and sexual function); a total score also can be calculated. Scores are scaled from 0 to 100, where higher scores indicate greater levels of dysfunction. The open-ended items (items 1–3) representing numerical response data were not included in the final scoring algorithm but could provide useful information on the frequency of sexual activity.

Psychometric Evaluation of HIS-Q-SF
Reliability
The internal consistency reliability of the HIS-Q-SF instrument was evaluated for each of the HIS-Q-SF scores at baseline.
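As an illustration of the internal consistency statistic named above, the following Python sketch computes Cronbach's α from a respondents-by-items matrix using the standard variance-based formula, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The function name and the toy responses are hypothetical; this is not the study's analysis code, and it assumes complete data with no missing responses.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes every respondent answered all k items (no missing data) and that
    the summed scores are not all identical.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must have the same number of items")
    # Sample variance of each item (column) across respondents.
    item_variances = [statistics.variance(column) for column in zip(*responses)]
    # Sample variance of the summed score across respondents.
    total_variance = statistics.variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    toy_responses = [[1, 1], [2, 3], [3, 2], [4, 4]]  # made-up two-item domain
    print(round(cronbach_alpha(toy_responses), 2))
```

A value computed this way would be compared against the 0.70 target stated in the Statistical Analyses section.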
Table 3. Baseline participant clinical characteristics: site reported
Characteristic | Total sample (N = 177)
Time since initial diagnosis of hypogonadism (y) |
  Mean (SD) [range] | 2.2 (3.2) [0.0–20.6]
  Unknown, n (%) | 1 (0.6)
Provider-reported hypogonadism etiology, n (%) |
  Primary congenital | 23 (13.0)
  Primary acquired | 81 (45.8)
  Secondary congenital | 0 (0.0)
  Secondary acquired | 28 (15.8)
  Combined | 7 (4.0)
  Unknown | 38 (21.5)
Specific suspected etiology or diagnosis, n (%) |
  Pituitary adenoma or disorder | 2 (1.1)
  Testicular trauma or disorder | 2 (1.1)
  Other* | 19 (10.7)
  Unknown | 154 (87.0)
Chief complaint or presenting symptom, n (%) |
  Erectile dysfunction | 46 (26.0)
  Low libido | 45 (25.4)
  Tiredness | 29 (16.4)
  Fatigue | 48 (27.1)
  Other† | 5 (2.8)
  Unknown | 4 (2.3)
History of testosterone replacement medications, n (%) |
  No history of testosterone replacement medications | 52 (29.4)
  Buccal | 2 (1.1)
  Topical | 69 (39.0)
  Patch | 6 (3.4)
  Subcutaneous pellet | 20 (11.3)
  Injection | 69 (39.0)
  Missing | 1 (0.6)
BMI‡ (calculated), mean (SD) [range] | 30.2 (5.2) [21.5–53.2]
Baseline serum total testosterone concentration (n = 172)§ |
  Concentration (ng/dL), mean (SD) [range] | 507.6 (495.4) [19.7–4,160.0]
  Missing, n (%) | 5 (2.8)
Baseline free testosterone concentration (n = 149) |
  Concentration (ng/dL), mean (SD) [range] | 15.0 (15.9) [0.5–126.0]
  Missing, n (%) | 28 (15.8)
BMI = body mass index.
*Other suspected etiology: senescent (n = 13), obesity (n = 2), aging (n = 3), and testicular failure (n = 1).
†Other chief complaint: no symptom (n = 1), poor concentration (n = 1), low energy (n = 1), weakness (n = 1), and weight gain (n = 1).
‡BMI = (weight in pounds × 703)/(height in inches)².
§Baseline serum total testosterone concentration lower than 300 ng/dL in 68 patients.

Internal consistency reliability was acceptable for the sexual domain (0.82), energy domain (0.88), libido subdomain (0.78), sexual function subdomain (0.91), and total score (0.84). The mood domain demonstrated internal consistency that was slightly lower than the accepted threshold (Cronbach α = 0.64). The internal consistency reliability estimates of the sleep and cognition domains were lower at 0.39 and 0.44, respectively, and did not reach the acceptable target of a Cronbach α of at least 0.70.

Test-retest reliability was assessed for stable patients from baseline to week 2. Using the patient-completed anchor questions to identify stable patients, test-retest reliability for the sexual domain, libido subdomain, sexual function subdomain, energy domain, mood domain, and HIS-Q-SF total score was very good (ie, ICCs > 0.70). The sleep and cognition domains had moderate test-retest reliability (ICC = 0.67 and 0.61, respectively).

Validity
Overall, good convergent and divergent validity was demonstrated for all HIS-Q-SF domains and subdomains, as reflected by a pattern of statistically significant moderate to large correlations (r > 0.30) between each HIS-Q-SF domain score or subdomain score and the corresponding PRO or clinician rating at baseline. As expected, acceptable correlations (r > 0.30) were generally observed between HIS-Q-SF domains and PRO or clinical subscales measuring similar concepts, and smaller correlations were demonstrated between items that were less conceptually related (r < 0.30). For example, the sexual domain and libido and sexual function subdomains were strongly correlated with the AMS sexual scale (r = 0.59, 0.39, 0.55; P < .0001 for all comparisons), the IIEF sexual desire domain (r = 0.49, 0.71, 0.35; P < .0001 for all comparisons), and the PROMIS Sexual Activity score (r = 0.57, 0.76, 0.40; P < .0001 for all comparisons). The sexual domain and sexual function subdomain also were strongly correlated with the PROMIS Global Satisfaction with Sex Life score (r = 0.59, 0.57; P < .0001). The energy domain was strongly correlated with the AMS somato-vegetative scale (r = 0.66; P < .0001) and the SF-12 vitality item (r = 0.62; P < .0001). The sleep domain was strongly correlated with the PROMIS Sleep Disturbance score (r = 0.66; P < .0001), the cognition domain was strongly correlated with the PROMIS Applied Cognition score (r = 0.65; P < .0001), and the mood domain was strongly correlated with the AMS psychological scale (r = 0.78; P < .0001).

Divergent validity was demonstrated through low correlations among conceptually unrelated scales. In the total sample, the HIS-Q-SF sexual, energy, sleep, and cognition domains, the libido and sexual function subdomains, and the total score demonstrated notable (but small to moderate) correlations with testosterone levels at baseline (r = 0.16 to 0.38; P < .05 for all comparisons).

The known-groups validity was good for all HIS-Q-SF domains and subdomains based on clinician-rated impression
Table 4A. Known-groups validity: HIS-Q-SF scores by CGI-S severity level
CGI-S severity groups, left to right: No symptoms or very mild; Mild; Moderate; Severe
HIS-Q-SF score | n | LS mean (SE) | n | LS mean (SE) | n | LS mean (SE) | n | LS mean (SE) | Overall F-test*: F | P value | Pairwise comparison† (P value)
Sexual symptoms domain | 34 | 31.3 (3.9) | 36 | 41.5 (3.8) | 73 | 49.8 (2.7) | 30 | 61.8 (4.2) | 10.53 | <.0001 | 2‡, 3‖, 5‡
Libido subdomain | 34 | 34.2 (3.6) | 36 | 43.1 (3.5) | 73 | 42.8 (2.4) | 30 | 48.8 (3.8) | 2.72 | .0459 |
Sexual function subdomain | 35 | 31.0 (5.5) | 36 | 42.4 (5.4) | 72 | 55.1 (3.8) | 30 | 70.6 (5.9) | 9.37 | <.0001 | 2‡, 3‖, 5‡
Energy symptoms domain | 35 | 34.6 (4.2) | 37 | 44.3 (4.1) | 73 | 51.7 (2.9) | 30 | 59.6 (4.5) | 6.45 | .0004 | 2‡, 3‡
Sleep symptoms domain | 35 | 27.5 (3.5) | 37 | 28.7 (3.4) | 71 | 34.9 (2.5) | 30 | 45.8 (3.8) | 5.28 | .0017 | 3‡, 5‡
Cognition symptoms domain | 35 | 24.3 (2.9) | 36 | 29.9 (2.8) | 73 | 36.1 (2.0) | 30 | 45.0 (3.1) | 9.26 | <.0001 | 2‡, 3‖, 5‡
Mood symptoms domain | 35 | 20.2 (2.9) | 37 | 27.5 (2.8) | 72 | 31.1 (2.0) | 30 | 37.5 (3.1) | 6.01 | .0006 | 2‡, 3‡
HIS-Q-SF total score (14 items) | 34 | 27.6 (2.3) | 34 | 35.6 (2.3) | 70 | 41.7 (1.6) | 30 | 51.6 (2.5) | 18.06 | <.0001 | 2‖, 3‖, 5§, 6‡
CGI-S = Clinical Global Impression–Severity; HIS-Q-SF = Hypogonadism Impact of Symptoms Questionnaire Short Form; LS = least squares; SE = standard error.
*General linear model (PROC GLM).
†Pairwise comparisons between LS means were performed using the Scheffe test adjusting for multiple comparisons: 1 = no symptoms or very mild vs mild; 2 = no symptoms or very mild vs moderate; 3 = no symptoms or very mild vs severe; 4 = mild vs moderate; 5 = mild vs severe; 6 = moderate vs severe.
‡P < .05; §P < .001; ‖P < .0001.
of severity of symptoms (P < .05 for all comparisons; Table 4A). All domains also discriminated between categories of total testosterone levels (P < .05; Table 4B), except the libido subdomain (P = .0536). The sexual symptoms domain, sexual function subdomain, energy domain, and sleep domain and the total score discriminated between free testosterone levels (P < .05 for all comparisons; eTable 2).

Responsiveness
Responsiveness of the instrument was very good for each of the domains, subdomains, and total score at each time point, in particular when assessed using patients’ reports of their condition. Changes in each of the patient-reported anchor questions were reflected by significant changes in the expected direction for all HIS-Q-SF scales from baseline to week 6 and from baseline to week 12 (P < .05 for all comparisons; Table 5). Although lower, responsiveness also was demonstrated using changes based on the CGI-S; all domains, except cognition and libido, showed significant changes in the expected direction from baseline to week 12 (P < .05 for all comparisons; Table 6).

Responder Definitions
Responder definitions were derived using anchor- and distribution-based methods. To obtain anchor-based estimates, the mean scores for participants who improved by one point on the anchor questions are reported in Table 5 for each domain. The anchor-based and distribution-based responder estimates are
Table 4B. Known-groups validity: HIS-Q-SF scores by baseline total testosterone level
Total testosterone groups, left to right: <300 ng/dL; 300–500 ng/dL; >500 ng/dL
HIS-Q-SF score | n | LS mean (SE) | n | LS mean (SE) | n | LS mean (SE) | Overall F-test*: F | P value | Pairwise comparison† (P value)
Sexual symptoms domain | 68 | 54.3 (2.9) | 55 | 46.1 (3.2) | 46 | 37.4 (3.5) | 6.91 | .0013 | 2‡
Libido subdomain | 68 | 46.9 (2.5) | 56 | 42.2 (2.8) | 45 | 37.2 (3.1) | 2.98 | .0536 |
Sexual function subdomain | 68 | 59.2 (4.1) | 54 | 50.2 (4.6) | 47 | 39.0 (4.9) | 4.98 | .0079 | 2‡
Energy symptoms domain | 68 | 53.9 (3.1) | 56 | 46.7 (3.4) | 47 | 40.7 (3.7) | 3.90 | .0222 | 2‡
Sleep symptoms domain | 67 | 38.1 (2.6) | 55 | 35.7 (2.9) | 47 | 26.6 (3.1) | 4.33 | .0147 | 2‡
Cognition symptoms domain | 68 | 39.2 (2.1) | 55 | 32.3 (2.4) | 47 | 28.5 (2.6) | 5.55 | .0046 | 2‡
Mood symptoms domain | 68 | 33.0 (2.1) | 55 | 28.5 (2.4) | 47 | 24.5 (2.6) | 3.28 | .0400 | 2‡
HIS-Q-SF total score (14 items) | 67 | 45.2 (1.8) | 52 | 38.2 (2.0) | 45 | 32.6 (2.2) | 10.12 | <.0001 | 1‡, 2‖
HIS-Q-SF = Hypogonadism Impact of Symptoms Questionnaire Short Form; LS = least squares; SE = standard error.
*General linear model (PROC GLM).
†Pairwise comparisons between LS means were performed using the Scheffe test adjusting for multiple comparisons: 1 = <300 vs 300–500 ng/dL; 2 = <300 vs >500 ng/dL; 3 = 300–500 vs >500 ng/dL.
‡P < .05; §P < .001; ‖P < .0001.
Table 5. Responsiveness: HIS-Q-SF score change by patient-rated anchor change
Anchor change groups, left to right: Decline (1); Stable (0); Improvement (1); Improvement (2)
HIS-Q-SF score change | n | LS mean (SE) | n | LS mean (SE) | n | LS mean (SE) | n | LS mean (SE) | Overall F-test*: F | P value | Pairwise comparison† (P value)
Sexual domain by sexual activity anchor
  Baseline to week 6 | 30 | 10.5 (3.3) | 65 | 2.6 (2.2) | 42 | 13.7 (2.8) | 18 | 43.9 (4.2) | 38.31 | <.0001 | 1‡, 2‖, 3‖, 4‡, 5‖, 6‖
  Baseline to week 12 | 25 | 6.6 (3.7) | 62 | 3.3 (2.4) | 41 | 11.7 (2.9) | 20 | 34.5 (4.2) | 20.55 | <.0001 | 2‡, 3‖, 5‖, 6§
Sexual domain by overall sexual function anchor
  Baseline to week 6 | 31 | 11.9 (3.5) | 54 | 5.6 (2.7) | 41 | 10.7 (3.1) | 29 | 29.3 (3.7) | 22.36 | <.0001 | 1‡, 2‖, 3‖, 5‖, 6‡
  Baseline to week 12 | 24 | 5.2 (3.9) | 63 | 1.3 (2.4) | 27 | 14.8 (3.7) | 34 | 25.2 (3.3) | 16.63 | <.0001 | 2‡, 3‖, 4‡, 5‖
Libido subdomain by libido anchor
  Baseline to week 6 | 39 | 9.9 (2.3) | 64 | 4.7 (1.8) | 42 | 12.8 (2.2) | 9 | 27.8 (4.7) | 26.79 | <.0001 | 1‖, 2‖, 3‖, 4‡, 5§, 6‡
  Baseline to week 12 | 34 | 7.4 (3.0) | 68 | 1.1 (2.1) | 33 | 9.1 (3.0) | 13 | 26.9 (4.8) | 14.17 | <.0001 | 2‡, 3‖, 5‖, 6‡
Sexual function subdomain by erectile function anchor
  Baseline to week 6 | 31 | 3.0 (6.0) | 79 | 5.7 (3.7) | 26 | 20.5 (6.5) | 19 | 27.6 (7.6) | 3.55 | .0160 |
  Baseline to week 12 | 35 | 1.0 (5.0) | 66 | 8.3 (3.7) | 26 | 18.9 (5.9) | 21 | 31.4 (6.5) | 5.34 | .0016 | 2‡, 3‡
Sexual function subdomain by overall sexual function anchor
  Baseline to week 6 | 31 | 17.2 (5.3) | 54 | 8.3 (4.0) | 41 | 14.0 (4.6) | 29 | 38.2 (5.5) | 17.90 | <.0001 | 1‡, 2§, 3‖, 5§, 6‡
  Baseline to week 12 | 24 | 5.9 (5.6) | 63 | 1.2 (3.4) | 27 | 23.2 (5.2) | 34 | 34.6 (4.7) | 16.07 | <.0001 | 2‡, 3‖, 4‡, 5‖
Energy symptom domain by tiredness anchor
  Baseline to week 6 | 22 | 8.5 (4.4) | 70 | 2.7 (2.5) | 52 | 19.7 (2.9) | 11 | 42.1 (6.2) | 21.49 | <.0001 | 1‡, 2‖, 3‖, 4§, 5‖
  Baseline to week 12 | 26 | 10.6 (3.9) | 56 | 6.3 (2.7) | 49 | 21.7 (2.9) | 16 | 49.2 (5.0) | 34.38 | <.0001 | 1§, 2‖, 3‖, 4‡, 5‖, 6‡
Mood symptom domain by mood anchor
  Baseline to week 6 | 46 | 12.3 (2.2) | 64 | 2.0 (1.9) | 33 | 9.3 (2.6) | 12 | 22.9 (4.3) | 24.83 | <.0001 | 1‡, 2‖, 3‖, 4‡, 5‖
  Baseline to week 12 | 37 | 9.0 (2.5) | 59 | 1.3 (2.0) | 36 | 6.3 (2.5) | 16 | 25.5 (3.8) | 21.35 | <.0001 | 2§, 3‖, 5‖, 6§
Cognition symptom domain by cognition anchor
  Baseline to week 6 | 43 | 7.0 (2.5) | 69 | 0.0 (2.0) | 29 | 8.6 (3.1) | 14 | 21.4 (4.4) | 12.55 | <.0001 | 2‡, 3‖, 5§
  Baseline to week 12 | 30 | 7.5 (3.2) | 75 | 0.2 (2.0) | 29 | 10.8 (3.3) | 14 | 18.8 (4.7) | 9.99 | <.0001 | 2‡, 3§, 4‡, 5‡
Sleep symptom domain by sleep anchor
  Baseline to week 6 | 51 | 8.8 (2.3) | 58 | 3.7 (2.1) | 39 | 10.9 (2.6) | 6 | 35.4 (6.7) | 19.85 | <.0001 | 1‡, 2‖, 3‖, 5§, 6‡
  Baseline to week 12 | 34 | 8.8 (3.1) | 54 | 4.9 (2.5) | 48 | 9.6 (2.6) | 10 | 35.0 (5.7) | 16.90 | <.0001 | 1‡, 2§, 3‖, 5‖, 6‡
HIS-Q-SF total score by overall hypogonadism anchor
  Baseline to week 6 | 29 | 6.5 (2.2) | 54 | 0.9 (1.6) | 40 | 7.6 (1.9) | 28 | 20.4 (2.3) | 27.18 | <.0001 | 2‖, 3‖, 5‖, 6§
  Baseline to week 12 | 22 | 5.1 (2.7) | 62 | 1.8 (1.6) | 25 | 10.6 (2.5) | 34 | 19.1 (2.2) | 21.11 | <.0001 | 2§, 3‖, 4‡, 5‖
HIS-Q-SF = Hypogonadism Impact of Symptoms Questionnaire Short Form; LS = least squares; SE = standard error.
*General linear model (PROC GLM).
†Pairwise comparisons between LS means were performed using the Scheffe test adjusting for multiple comparisons: 1 = decline vs stable group; 2 = decline vs improvement (1) group; 3 = decline vs improvement (>1) group; 4 = stable vs improvement (1) group; 5 = stable vs improvement (>1) group; 6 = improvement (1) vs improvement (>1) group.
‡P < .05; §P < .001; ‖P < .0001.
Table 6. Responsiveness: HIS-Q-SF score change by CGI-S score change from baseline to week 12
Each n and LS mean (SE) pair is one CGI-S score change group.
HIS-Q-SF score change | n | LS mean (SE) | n | LS mean (SE) | n | LS mean (SE) | Overall F-test*: F | P value | Pairwise comparison† (P value)
Sexual domain | 13 | 6.9 (5.9) | 58 | 0.7 (2.8) | 73 | 15.9 (2.5) | 11.70 | <.0001 | 1§, 2‡
Libido subdomain | 14 | 3.6 (5.0) | 57 | 0.7 (2.5) | 73 | 7.7 (2.2) | 3.21 | .0433 | 1*
Sexual subdomain | 14 | 7.7 (7.8) | 58 | 1.6 (3.8) | 72 | 22.9 (3.4) | 11.87 | <.0001 | 1§, 2‡
Energy domain | 16 | 0.0 (6.6) | 57 | 9.6 (3.5) | 73 | 17.6 (3.1) | 3.49 | .0332 |
Sleep domain | 16 | 0.0 (5.2) | 57 | 1.3 (2.8) | 72 | 10.2 (2.5) | 3.56 | .0311 |
Cognition domain | 16 | 7.0 (4.7) | 57 | 0.9 (2.5) | 73 | 4.8 (2.2) | 2.79 | .0646 |
Mood domain | 16 | 6.8 (4.4) | 58 | 0.7 (2.3) | 72 | 4.9 (2.1) | 3.49 | .0332 |
Total score | 13 | 2.8 (3.9) | 54 | 1.8 (1.9) | 71 | 11.9 (1.7) | 11.03 | <.0001 | 1§, 2‡
CGI-S = Clinical Global Impression–Severity; HIS-Q-SF = Hypogonadism Impact of Symptoms Questionnaire Short Form; LS = least squares; SE = standard error.
*General linear model (PROC GLM).
†Pairwise comparisons between LS means were performed using the Scheffe test adjusting for multiple comparisons: 1 = improvement vs stable; 2 = improvement vs decline; 3 = stable vs decline.
‡P < .05; §P < .001; ‖P < .0001.
presented in Figure 1, as are the final responder definitions that were determined by triangulating across all estimates (responder definitions: sexual = 10.0; libido = 12.5; sexual function = 16.6; energy = 12.5; sleep = 12.5; cognition = 12.5; mood = 8.3; total = 7.1; Figure 1).

Figure 1. Summary of anchor- and distribution-based estimates of clinically meaningful change for the Hypogonadism Impact of Symptoms Questionnaire Short Form. (Axis label: Clinically Meaningful Change Estimates.)

Comparison of Original HIS-Q and HIS-Q-SF
Correlational Analysis
Pearson correlation coefficients were calculated for the domain, subdomain, and total scores from the HIS-Q-SF and the original HIS-Q, a published instrument with demonstrated
reliability and validity.[7] Same-domain, same-subdomain, and total score correlations were very high (r = 0.91–0.97). In particular, the sexual domain and total scores across the two versions of the HIS-Q were very highly correlated at 0.97 and 0.96, respectively.

DISCUSSION
The 17-item HIS-Q-SF is an abbreviated version of the original 28-item HIS-Q PRO instrument, which was designed to assess changes in hypogonadal symptoms in response to TRT. The present analysis provides strong support for the psychometric properties (reliability, validity, and responsiveness) of the reduced set of items in the HIS-Q-SF. When comparing the psychometric properties between the original and short forms, in general the measurements performed in a similar manner (for a comparison, see eTable 3), which would be expected because the HIS-Q-SF includes a subset of items from the original HIS-Q (Appendix B). However, there were a few notable differences. For internal consistency reliability measured at baseline, the HIS-Q-SF had higher Cronbach α values for the sexual domains and subdomains, but lower values for the sleep (0.39 vs 0.58), cognition (0.44 vs 0.65), and mood (0.64 vs 0.85) domains. This is not surprising because scales with larger numbers of items tend to have higher internal consistency reliability. The test-retest reliability of the two versions was very good and fairly consistent across domains.

The convergent validity was very consistent across the HIS-Q and HIS-Q-SF versions. The libido subdomain of the HIS-Q-SF did not distinguish between patients with different categories of total testosterone levels, which differs from the findings for the original HIS-Q. Other measurements of known-groups validity were generally very good and comparable between measurements. The responsiveness of the HIS-Q-SF and the HIS-Q also were highly similar. Overall, the HIS-Q-SF and original HIS-Q had highly similar psychometric characteristics.

All domains from the 28-item original HIS-Q were retained to closely align the HIS-Q and HIS-Q-SF measurements. Careful consideration was given to the selection of the final HIS-Q-SF items based on a combination of direct patient input, quantitative data, and clinical expert feedback. The authors acknowledge that “morning erections,” an item included in the original HIS-Q, is an important clinical symptom. However, based on the qualitative research to confirm the items in the HIS-Q-SF, including concept elicitation, patients did not bring up the symptom as being one of the most relevant or important

The final 17 items of the HIS-Q-SF fit on a single page and can easily be administered in clinical settings in less than 2 minutes. The original HIS-Q form is expected to be used in future clinical trials evaluating TRTs. Therefore, the alignment of items and domains will allow for meaningful comparisons to be made by individual clinicians. Although the results suggest that the correlations among the domains, subdomains, and total score for the HIS-Q and HIS-Q-SF were very high, to some extent these high values are expected because the items overlap and response data were drawn from the same dataset. Further research is necessary to estimate these correlations among separate administrations of the HIS-Q and HIS-Q-SF. Future research also could be conducted to establish a common score metric between the original HIS-Q and the HIS-Q-SF.

As evidenced by the decreased responsiveness of the HIS-Q-SF to clinician ratings, the results of this study also highlighted that changes in particular symptoms, such as libido, cognition, and sleep, are often not accurately detected or rated by a clinician; therefore, a brief PRO tool that can be used in clinical practice might be useful. This highlights the potential utility of the HIS-Q-SF because this measurement was developed and designed specifically for tracking the change of hypogonadal symptoms after the initiation of TRT. Unlike many other existing measurements that are often used in studies of these patients, the HIS-Q-SF provides a comprehensive and content-valid means of assessing the broad range of symptoms that affect this population specifically. In addition, the HIS-Q-SF provides numerical feedback on frequency of sexual activity and Likert-type feedback on symptoms from all relevant domains; this distinguishes the measurement from other available tools. The HIS-Q-SF also was developed in accord with the FDA guidelines for PROs, including multiple rounds of direct patient input.[8] The FDA was involved in the review of the various iterations of the HIS-Q. The developers believe that using the HIS-Q-SF will allow clinicians to monitor patients’ hypogonadal symptoms in response to treatment and over time.

There were several limitations that should be noted. The testosterone levels for the participating men were not as low as anticipated at baseline owing to the three categories of participants sought by the original research. This could have limited the authors’ ability to examine the relations between the HIS-Q-SF and testosterone levels. The participants in the present study were recruited through clinical sites, and the utility of the HIS-Q-SF among other samples of men, for example, those from the general population or men who have not sought treatment for symptoms of hypogonadism, is unknown and could be a focus of future research. The correlations that have been reported be-
symptoms, which would support inclusion in a short form. For tween scores on the full HIS-Q and the HIS-Q-SF are based on
those investigators specifically interested in the item on morning data from a single administration of the HIS-Q; future research
erections, the complete three-item libido scale from the longer should evaluate correlations between the measurements when
HIS-Q could be included in the HIS-Q-SF with no substantive each is administered as a standalone instrument in the same
change in psychometric qualities, content coverage, or respon- sample. Further, the analyses were conducted using data
dent burden (18 vs 17 items). collected through administration of the full HIS-Q; future
studies should administer the standalone HIS-Q-SF (ie, only the 17-item form) and replicate the psychometric analyses reported in the present study.

CONCLUSIONS
The 17-item HIS-Q-SF is a brief assessment tool for the evaluation of hypogonadal symptoms. Like the original HIS-Q, the HIS-Q-SF demonstrated good reliability, validity, and responsiveness. The measurement could be a useful tool for application in clinical practice, epidemiologic studies, and other academic research settings.

Corresponding Author: Heather L. Gelhorn, PhD, Senior Research Scientist, Evidera, 7101 Wisconsin Avenue, Suite 1400, Bethesda, MD 20814, USA. Tel: 1-970-363-7333; Fax: 1-301-654-9864; E-mail: heather.gelhorn@evidera.com
Conflicts of Interest: This work was conducted by Evidera, an independent research organization. Dr Gelhorn, Ms Roberts, and Dr Revicki are employees of Evidera; Evidera received research study support from AbbVie. Mr Miller, Dr Khandelwal, and Mr Hepp are employees and stockholders of AbbVie. Dr Dobs and Dr DeRogatis have no conflicts of interest to declare.
Funding: None.

STATEMENT OF AUTHORSHIP
Category 1
(a) Conception and Design
Heather L. Gelhorn; Laurie J. Roberts; Dennis A. Revicki; Michael G. Miller
(b) Acquisition of Data
Heather L. Gelhorn; Dennis A. Revicki; Michael G. Miller
(c) Analysis and Interpretation of Data
Heather L. Gelhorn; Laurie J. Roberts; Dennis A. Revicki; Leonard R. DeRogatis; Adrian Dobs; Zsolt Hepp; Michael G. Miller
Category 2
(a) Drafting the Article
Heather L. Gelhorn; Laurie J. Roberts
(b) Revising It for Intellectual Content
Heather L. Gelhorn; Laurie J. Roberts; Nikhil Khandelwal; Dennis A. Revicki; Leonard R. DeRogatis; Adrian Dobs; Zsolt Hepp; Michael G. Miller
Category 3
(a) Final Approval of the Completed Article
Heather L. Gelhorn; Laurie J. Roberts; Nikhil Khandelwal; Dennis A. Revicki; Leonard R. DeRogatis; Adrian Dobs; Zsolt Hepp; Michael G. Miller

REFERENCES
1. Sato Y, Tanda H, Kato S, et al. Prevalence of major depressive disorder in self-referred patients in a late onset hypogonadism clinic. Int J Impot Res 2007;19:407-410.
2. Zitzmann M, Faber S, Nieschlag E. Association of specific symptoms and metabolic risks with serum testosterone in older men. J Clin Endocrinol Metab 2006;91:4335-4343.
3. Gooren LJ. Endocrine aspects of ageing in the male. Mol Cell Endocrinol 1998;145:153-159.
4. Morley JE. Testosterone replacement and the physiologic aspects of aging in men. Mayo Clin Proc 2000;75(Suppl):S83-S87.
5. Gelhorn HL, Vernon MK, Stewart KD, et al. Content validity of the Hypogonadism Impact of Symptoms Questionnaire (HIS-Q): a patient-reported outcome measure to evaluate symptoms of hypogonadism. Patient 2016;9:181-190.
6. Jensen RE, Snyder CF, Basch E, et al. All together now: findings from a PCORI workshop to align patient-reported outcomes in the electronic health record. J Comp Eff Res 2016;5:561-567.
7. Gelhorn H, Dashiell-Aje E, Miller M, et al. Psychometric evaluation of the Hypogonadism Impact of Symptoms Questionnaire (HIS-Q™). J Sex Med 2016;13:1737-1749.
8. Food and Drug Administration. Guidance for industry on patient-reported outcome measures: use in medical product development to support labeling claims. Fed Regist 2009;74:65132-65133.
9. Gelhorn H, Bodhani A, Wahala L, et al. A qualitative study to inform the development of the Hypogonadism Impact of Symptoms Questionnaire Short Form (HIS-Q SF). In press.
10. Hogarty KY, Hines CV, Kromrey JD, et al. The quality of factor solutions in exploratory factor analysis: the influence of sample size, communality, and overdetermination. Educ Psychol Meas 2005;65:202-226.
11. MacCallum RC, Widaman KF, Zhang S, et al. Sample size in factor analysis. Psychol Methods 1999;4:84-99.
12. Heinemann LAJ, Zimmerman T, Vermeulen A, et al. A new ‘Aging Males’ Symptoms’ (AMS) rating scale. Aging Male 1992;2:105-114.
13. Rosen RC, Riley A, Wagner G, et al. The International Index of Erectile Function (IIEF): a multidimensional scale for assessment of erectile dysfunction. Urology 1997;49:822-830.
14. Flynn KE, Jeffery DD, Keefe FJ, et al. Sexual functioning along the cancer continuum: focus group results from the Patient-Reported Outcomes Measurement Information System (PROMIS®). Psychooncology 2011;20:378-386.
15. Yu L, Buysse DJ, Germain A, et al. Development of short forms from the PROMIS sleep disturbance and sleep-related impairment item banks. Behav Sleep Med 2011;10:6-24.
16. Becker H, Stuifbergen A, Morrison J. Promising new approaches to assess cognitive functioning in people with multiple sclerosis. Int J MS Care 2012;14:71-76.
17. Ware JE, Kosinski M, Turner-Bowker DM, et al. How to score version 2 of the SF-12 Health Survey. Lincoln, RI: Quality Metrics; 2002.
18. Hu L-T, Bentler PM. Evaluating model fit. In: Hoyle RH, ed. Structural equation modelling: concepts, issues and applications. Thousand Oaks, CA: Sage; 1995. p. 77-99.
19. MacCallum RC, Browne MW, Sugawara HM. Power analysis and determination of sample size for covariance structure modeling. Psychol Methods 1996;1:130-149.
20. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, eds. Testing structural equation models. Beverly Hills, CA: Sage; 1993. p. 136-162.
21. Jones PW, Chen WH, Wilcox TK, et al; EXACT-PRO Study Group. Characterizing and quantifying the symptomatic features of COPD exacerbations. Chest 2011;139:1388-1394.
22. Muthén LK, Muthén B. Mplus user’s guide. 3rd ed. Los Angeles, CA: Muthén & Muthén; 1998-2004.
23. RUMM 2030: Rasch unidimensional measurement models. Duncraig, Australia: RUMM Laboratory Pty Ltd; 2010.
24. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297-334.
25. Hays RD, Revicki DA. Reliability and validity, including responsiveness. In: Fayers PM, Hays RD, eds. Assessing quality of life in clinical trials. 2nd ed. New York: Oxford University Press; 2005. p. 25-39.
26. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
27. Leidy NK, Revicki DA, Geneste B. Recommendations for evaluating the validity of quality of life claims for labeling and promotion. Value Health 1999;2:113-127.
28. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
29. Wyrwich KW, Nienaber NA, Tierney WM, et al. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care 1999;37:469-478.
30. Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 1999;52:861-873.

SUPPLEMENTARY DATA
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jsxm.2017.05.013.