Context: Clinicians, researchers, and global health advocates often include pubertal
development in outcomes. However, assessments of pubertal stage can be challenging because
of the sensitive nature and limited feasibility of clinical examinations, especially in large-scale settings.
Objective: To determine the accuracy of self-assessed Tanner staging when compared with
physically assessed Tanner stages by a clinician.
Data Sources: MEDLINE, PubMed, Embase, Web of Science, Scopus, the Cochrane Library, CINAHL.
Data Extraction: We extracted data to generate complete 5 × 5 tables for each study, including
any subgroup eligible for the analysis, such as overweight/obese youth.
Data Synthesis: After screening, 22 studies representing 21,801 participants met our inclusion
criteria for the meta-analysis. Overall agreement was moderate or substantial between the 2
assessments, with breast stage 1, female pubic hair 1, male pubic hair 1, and male pubic hair 5
having the highest agreement. When stages were collapsed into pre- (Tanner stage 1), in (stages
2-3), and completing (stages 4-5) puberty, levels of agreement improved, especially for pre- and
completing pubertal development. Most included studies comprised Caucasian youth. More
studies are needed that include a broader range of geographic and socioeconomic settings, as
well as a greater diversity of racial/ethnic groups.
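The agreement measures named in this abstract can be illustrated with a short Python sketch. It assumes a 5 × 5 table whose rows are HCP-assessed Tanner stages and whose columns are self-assessed stages, computes unweighted Cohen's κ and linearly weighted κ, and collapses the table into pre (stage 1), in (stages 2-3), and completing (stages 4-5) puberty. The counts in HYPOTHETICAL_TABLE and all function names are invented for illustration; they are not data or code from the included studies.

```python
# Tanner stages 1-5 are indices 0-4; map each to a puberty phase:
# 0 = pre (stage 1), 1 = in (stages 2-3), 2 = completing (stages 4-5).
PHASE_OF_STAGE = [0, 1, 1, 2, 2]

# Hypothetical counts: rows = HCP-assessed stage, columns = self-assessed stage.
HYPOTHETICAL_TABLE = [
    [20, 3, 0, 0, 0],
    [4, 15, 5, 0, 0],
    [0, 6, 18, 6, 0],
    [0, 0, 5, 20, 4],
    [0, 0, 0, 3, 25],
]


def observed_agreement(table):
    """Proportion of participants on the diagonal (exact agreement)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohen_kappa(table, weights=None):
    """Cohen's kappa for a square table; weights=None (unweighted) or "linear"."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed_dis = 0.0  # weighted observed disagreement
    expected_dis = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            if weights is None:
                w = 0.0 if i == j else 1.0
            elif weights == "linear":
                w = abs(i - j) / (k - 1)  # penalty grows with stage distance
            else:
                raise ValueError("weights must be None or 'linear'")
            observed_dis += w * table[i][j] / n
            expected_dis += w * row_totals[i] * col_totals[j] / n**2
    return 1.0 - observed_dis / expected_dis


def collapse_to_phases(table):
    """Sum a 5 x 5 stage table into a 3 x 3 puberty-phase table."""
    phases = [[0] * 3 for _ in range(3)]
    for a in range(5):
        for b in range(5):
            phases[PHASE_OF_STAGE[a]][PHASE_OF_STAGE[b]] += table[a][b]
    return phases


if __name__ == "__main__":
    phases = collapse_to_phases(HYPOTHETICAL_TABLE)
    print("stage kappa:", round(cohen_kappa(HYPOTHETICAL_TABLE), 3))
    print("stage weighted kappa:", round(cohen_kappa(HYPOTHETICAL_TABLE, "linear"), 3))
    print("phase kappa:", round(cohen_kappa(phases), 3))
```

Because every diagonal cell of the stage table maps to a diagonal cell of the phase table, collapsing can only keep or raise raw agreement; κ usually rises too, but not necessarily, since chance agreement also increases with fewer categories.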
Monitoring a youth’s growth and pubertal development is an important part of health surveillance of an individual as well as a population. Abnormalities of pubertal timing often represent the extreme ends of the normal spectrum but can also be a manifestation of an underlying disorder or poor nutritional status (1, 2). Moreover, for those who have a known systemic illness, assessments of growth and puberty are key indicators of effective management. Knowledge of pubertal status is also important in the assessment of some aspects of physiology; for example, assessments of metabolic

this meta-analysis are reported, including subanalyses of agreement for obese youth and self-assessment tools.

Methods
Using a 3-step search strategy, studies reporting the agreement between Tanner staging conducted by an HCP (hereafter referred to as HCP assessment) and self-assessed Tanner staging by adolescents were identified. An initial limited search of PubMed was undertaken followed by an analysis of the text words contained in the title and abstract, as well as the index terms used to describe the articles. A second search using all
to a strict definition of Tanner staging (7, 8) that, for example, does not include the assessment of testicular volume by Prader orchidometer. The index test was self-assessed puberty staging using Tanner criteria. Several self-assessment tools were used in the included studies. All tools included an image, either a photograph or a line drawing, with or without an accompanying description. Most studies developed their self-assessment tools based on the Morris and Udry (9) or Tanner (7, 8) drawings; however, some designed their own tools.

Study selection and data collection
Screening was conducted in duplicate (J.M. and S.C.) for titles, abstracts, and full-text articles. All references were man-

estimates and corresponding 95% CIs were obtained by meta-analysis using a random-effects model.
To determine the impact of different self-assessment tools on agreement, a stratified analysis was conducted with the tool as the variable, using 5 × 5 tables for overall Tanner stages (1-5) of female breast (B), male genital (G), male pubic hair (mPH), and female pubic hair (fPH) development. The tool variable was categorized into 4 categories: (a) line drawings without descriptions, (b) line drawings with descriptions, (c) photographs without descriptions, or (d) photographs with descriptions. Pooled κ and linearly weighted κ (κw) statistics with corresponding 95% CIs for each tool category were calculated (13).
Because a body mass index (BMI) > 85th percentile can
Physical assessment was conducted before self-assessment in 5 studies. One study stated that self-assessment was conducted either before or after the HCP assessment. Twenty studies recruited participants by convenience sampling (eg, from endocrinology clinics), 1 study used consecutive recruitment, and 1 study used random sampling. A single study of participants with anorexia nervosa was excluded. In addition, a single study examined the reliability of the parent assessment as well as that of the child assessment in comparison with HCP assessment; in this case, only the child data were used for the meta-analysis, and the parent data were disregarded. Line drawings with a description were used to guide the self-assessment in 12 (55%) studies; line drawings alone in 5 (23%); photographs alone in 2 (9%); photographs with a description in 2 (9%); and photographs and line drawings plus a description in 1 (5%) (Table 1). Fifteen studies (68%) were from high-income countries and 7 (32%) from middle-income countries according to World Bank Income Groups. Using World Health Organization regions, 12 studies (54.5%) were from the Region of the Americas; 3 (13.6%) from the European Region; 3 (13.6%) from the Western Pacific Region; 2 (9.1%) from the South-East Asia Region; 1 (4.5%) from the African Region; and 1 (4.5%) from the Eastern Mediterranean Region.
The overall quality assessments of included studies were high for applicability (patient selection, index test, and reference standard) and for the risk of bias (index test, reference standard, and flow and timing). For the risk of bias pertaining to patient selection, 13 (59%) studies were rated as low risk and 9 (41%) as “unclear risk,” in most cases because participant selection was not random. Due to the nature of the topic, these studies reported convenience samples, such as school settings or cross-sections of cohort studies (6). No studies were excluded based on the risk-of-bias assessment.
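The study counts and percentages above can be re-derived from Table 1. The Python sketch below transcribes each study's country and self-assessment tool from that table and tallies them by WHO region and by tool type. The country-to-region mapping (WHO_REGION) follows WHO's standard regional assignments and is supplied here as an assumption rather than taken from the paper; the names TABLE_1 and tally are illustrative.

```python
from collections import Counter

# (first author, country, self-assessment tool) for each study, transcribed from Table 1.
TABLE_1 = [
    ("Boas", "USA", "photos + description"),
    ("Bonat", "USA", "line drawings + description"),
    ("Brooks-Gunn", "USA", "line drawings"),
    ("Chan", "China", "line drawings + description"),
    ("Chavarro", "Mexico", "line drawings + description"),
    ("Desmangles", "USA", "photos"),
    ("Ernst", "Denmark", "line drawings + description"),
    ("Hergenroeder", "USA", "line drawings + description"),
    ("Jaruratansirikul", "Thailand", "line drawings + description"),
    ("Lamb", "USA", "line drawings"),
    ("Lee", "USA", "line drawings"),
    ("Leone", "Canada", "line drawings + description"),
    ("Morris", "USA", "line drawings"),
    ("Norris", "South Africa", "photos + line drawings + description"),
    ("Peng", "China", "photos"),
    ("Rabbani", "Iran", "line drawings + description"),
    ("Rasmussen", "Denmark", "line drawings + description"),
    ("Rollof", "Sweden", "line drawings + description"),
    ("Schall", "USA", "line drawings + description"),
    ("Stephen", "USA", "line drawings"),
    ("Sun", "China", "line drawings + description"),
    ("Wacharasindhu", "Thailand", "photos + description"),
]

# Assumed mapping of each country to its WHO region.
WHO_REGION = {
    "USA": "Americas", "Mexico": "Americas", "Canada": "Americas",
    "Denmark": "European", "Sweden": "European",
    "China": "Western Pacific",
    "Thailand": "South-East Asia",
    "South Africa": "African",
    "Iran": "Eastern Mediterranean",
}


def tally(labels):
    """Return {label: (count, percent of all studies)}, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100 * n / total) for label, n in counts.most_common()}


regions = tally(WHO_REGION[country] for _, country, _ in TABLE_1)
tools = tally(tool for _, _, tool in TABLE_1)

for label, (n, pct) in regions.items():
    print(f"{label}: {n} ({pct:.1f}%)")
for label, (n, pct) in tools.items():
    print(f"{label}: {n} ({pct:.0f}%)")
```

Run as is, it reproduces the region counts (12, 3, 3, 2, 1, 1) and tool counts (12, 5, 2, 2, 1) reported in the text.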
Table 1. Characteristics of Included Studies
Author | Publication Year | Country | Sample Size | Age Range (years) | Health Professional Performing the Physical Assessment | Self-Assessment Tool
Boas et al (16) | 1995 | USA | 61 | 12.0–18.9 | Physician and nurse | Photos (7, 8) + description
Bonat et al (17) | 2002 | USA | 244 | 12.9–15.5 | Physician and nurse | Line drawings + description
Brooks-Gunn et al (18) | 1987 | USA | 85 | 11.0–13.0 | Physician and nurse | Line drawings
Chan et al (19) | 2008 | China | 354 | 8.0–18.0 | Other clinician | Line drawings (9) + description
Chavarro et al (20) | 2017 | Mexico | 245 | 8.0–13.0 | Physician | Line drawings + description
Desmangles et al (21) | 2006 | USA | 240 | 6.0–16.0 | Physician | Photos (7, 8)
Ernst et al (22) | 2018 | Denmark | 197 | 13.7–16.6 | Physician | Line drawings + description
Hergenroeder et al (23) | 1999 | USA | 107 | 8.0–17.0 | Physician | Line drawings + description
Jaruratansirikul et al (24) | 2015 | Thailand | 1934 | 8.0–18.0 (boys); 7.0–16.0 (girls) | Physician | Line drawings + description
Lamb et al (25) | 2011 | USA | 232 | 8.0–18.0 | Physician | Line drawings
Lee et al (26) | 2006 | USA | 67 | 8.0–18.0 | Physician and nurse | Line drawings (9)
Leone et al (27) | 2007 | Canada | 47 | 12.0–17.0 | Physician | Line drawings + description
Morris et al (9) | 1980 | USA | 47 | 12.0–16.0 | Physician | Line drawings (9)
Norris et al (28) | 2005 | South Africa | 182 | 10.0–18.0 | Nurse | Photographs (7, 8) + line drawings (9) + description
Peng et al (29) | 2018 | China | 174 | 8.0–18.0 | Physician | Photos (computer generated)
Rabbani et al (30) | 2013 | Iran | 190 | 9.0–16.0 | Physician | Line drawings + description
Rasmussen et al (31) | 2015 | Denmark | 868 | 7.4–14.9 | Physician | Line drawings + description
Rollof et al (32) | 2012 | Sweden | 100 | 10.0–16.0 | Nurse | Line drawings + description
Schall et al (33) | 2012 | USA | 100 | 8.0–18.0 | Physician | Line drawings + description
Stephen et al (34) | 2008 | USA | 87 | 8.0–16.0 | Physician | Line drawings
Sun et al (35) | 2012 | China | 16 046 | 8.0–18.9 | Physician | Line drawings + description
Wacharasindhu et al (36) | 2002 | Thailand | 194 | 7.0–15.0 | Physician | Photos (7, 8) + description

Campisi et al Reliability of Self-assessed Puberty Stages
J Clin Endocrinol Metab, August 2020, 105(8):2846–2856
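As a quick consistency check, the sample sizes in Table 1 can be summed and compared with the 21,801 participants reported in the abstract. The Python sketch below does this using only the numbers printed in Table 1, and also reports the median study size and the share contributed by the largest study; SAMPLE_SIZES is an illustrative name.

```python
import statistics

# Sample size of each included study (first author: n), transcribed from Table 1.
SAMPLE_SIZES = {
    "Boas": 61, "Bonat": 244, "Brooks-Gunn": 85, "Chan": 354,
    "Chavarro": 245, "Desmangles": 240, "Ernst": 197, "Hergenroeder": 107,
    "Jaruratansirikul": 1934, "Lamb": 232, "Lee": 67, "Leone": 47,
    "Morris": 47, "Norris": 182, "Peng": 174, "Rabbani": 190,
    "Rasmussen": 868, "Rollof": 100, "Schall": 100, "Stephen": 87,
    "Sun": 16046, "Wacharasindhu": 194,
}

total = sum(SAMPLE_SIZES.values())
largest = max(SAMPLE_SIZES, key=SAMPLE_SIZES.get)
share = 100 * SAMPLE_SIZES[largest] / total
median_n = statistics.median(SAMPLE_SIZES.values())

print(f"{len(SAMPLE_SIZES)} studies, {total:,} participants")  # 22 studies, 21,801 participants
print(f"Largest: {largest}, {SAMPLE_SIZES[largest]:,} ({share:.1f}% of total)")  # Largest: Sun, 16,046 (73.6% of total)
print(f"Median sample size: {median_n:.0f}")  # Median sample size: 186
```

The total matches the abstract; the arithmetic also shows that one study contributes roughly three quarters of all participants, while the typical study enrolled fewer than 200.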
For females, we observed substantial agreement (κ value: 61-80) for B 1 and fPH 1. Moderate agreement (κ value: 41-60) was observed for B 2, B 4, B 5, fPH 2, fPH 3, and fPH 5. Fair agreement (κ value: 21-40) was observed for B 3 and fPH 4. When the puberty criteria were broadened by using puberty phases (pre [Tanner stage 1], early/in [Tanner stages 2-3], and late/completing [Tanner stages 4-5]), substantial agreement was observed for prepuberty and completing puberty, and moderate agreement for in puberty, for both B and fPH development (Fig. 2A).

observed for G 1, G 4, mPH 2, mPH 3, and mPH 4, whereas fair agreement was observed for G 2 and G 3. Again, we observed greater agreement when broadening the puberty criteria by using puberty phases. For boys, no significant differences were observed between studies of overweight/obese and nonobese youth for most genital and mPH Tanner stages or puberty phases. Notable exceptions are the significantly lower agreement for G 2, G 5, mPH 2, and mPH 3, as well as for the pre- and in puberty phases, in overweight/obese compared with non-overweight/nonobese males. Substantial agreement was
Figure 2. Pooled Kappa estimates (κ) and 95% confidence intervals for self-assessed Tanner stages and puberty phases by sex. [Plots of κ (95% CI) with reference bands for fair, moderate, and substantial agreement.]
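The agreement bands used in the text and in Figure 2 (fair, 21-40; moderate, 41-60; substantial, 61-80) correspond to the widely used Landis and Koch cut-points, with κ apparently expressed on a 0-100 scale. A minimal Python helper for labeling a pooled κ or its CI bounds, assuming those cut-points (the function names are hypothetical):

```python
# Landis and Koch cut-points on the 0-1 kappa scale (inclusive upper bounds).
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def agreement_band(kappa, scale=1.0):
    """Label kappa; pass scale=100 for values reported on a 0-100 scale (eg, 61-80)."""
    k = kappa / scale
    if k < 0:
        return "poor"
    for upper, label in BANDS:
        if k <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


def band_of_interval(lower, upper, scale=1.0):
    """Bands at both ends of a 95% CI; a wide CI can straddle two bands."""
    return agreement_band(lower, scale), agreement_band(upper, scale)


if __name__ == "__main__":
    print(agreement_band(0.72))            # substantial
    print(agreement_band(35, scale=100))   # fair
    print(band_of_interval(0.55, 0.68))    # ('moderate', 'substantial')
```

Reporting the band of each CI bound, not only of the point estimate, makes it visible when a pooled estimate sits near a cut-point.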
[Figure: pooled κ (95% CI) by self-assessment tool (line drawing, line drawing with description, photographs, photographs with description), in panels for breast development and pubic hair, with reference bands for fair, moderate, and substantial agreement.]
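The Methods state only that pooled estimates and 95% CIs were obtained with a random-effects model, and that pooled κ was calculated within each tool category (13). One common random-effects estimator is DerSimonian and Laird's; the Python sketch below assumes that estimator, pools κ on its untransformed scale, and takes each study's κ variance as given. The study values in HYPOTHETICAL_STUDIES are invented, the function names are hypothetical, and the method actually used in the paper may differ.

```python
import math
from collections import defaultdict


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate with a 95% CI (DerSimonian-Laird tau^2).

    Returns (pooled, ci_lower, ci_upper, tau2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


def pool_by_tool(studies):
    """Stratify (tool_category, kappa, variance) records by tool and pool each stratum."""
    groups = defaultdict(lambda: ([], []))
    for tool, kappa, var in studies:
        groups[tool][0].append(kappa)
        groups[tool][1].append(var)
    return {tool: dersimonian_laird(ests, vars_) for tool, (ests, vars_) in groups.items()}


# Hypothetical study-level kappas and variances, NOT values from the included studies.
HYPOTHETICAL_STUDIES = [
    ("line drawings", 0.52, 0.004),
    ("line drawings", 0.61, 0.006),
    ("line drawings + description", 0.66, 0.003),
    ("line drawings + description", 0.58, 0.005),
    ("line drawings + description", 0.71, 0.004),
    ("photographs", 0.49, 0.008),
]

if __name__ == "__main__":
    for tool, (kappa, lo, hi, tau2) in pool_by_tool(HYPOTHETICAL_STUDIES).items():
        print(f"{tool}: kappa {kappa:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 {tau2:.4f}")
```

A stratum containing a single study, like "photographs" here, falls back to that study's own estimate and CI (τ² = 0), which is one reason small tool categories yield imprecise pooled values.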
with an accompanying description rather than an image alone will likely yield more accurate results, although it is recognized that at times societal, cultural, or religious factors may limit the use of images. Finally, one must also remember that pubic hair alone is not necessarily indicative of central puberty (hypothalamic-pituitary-gonadal axis maturation), as the hair could derive from adrenarche instead of gonadarche. The difference in agreement between self- and HCP assessments for breast development compared with male genital development suggests that girls may be more accurate than boys at identifying the onset/progression of true puberty.

Another important question is whether self-assessment can be used to evaluate pubertal development among overweight and obese youth. Our results indicate that overweight/obese children may be able to assess their pubertal development as effectively as children with normal BMI, with the exception of G 2, mPH 2, mPH 3, and in-puberty mPH in boys. Interestingly, although much concern surrounds the assessment of breast development in females and the ability to distinguish true glandular tissue from adipose tissue, these results did not indicate that overweight/obese girls’ self-assessments were less accurate than those made by girls of normal weight. Although encouraging, these results should be considered preliminary within this subpopulation, as they are limited to 1 or 2 studies. More evidence, including studies from different geographic locations and ethnicities, would increase the representativeness of these findings.

Our meta-analysis does have limitations. Although physical examination is the gold standard for assessment

its categories do not align 1:1 with the 5 Tanner stages. We also excluded studies reporting parent-assessed or parent-assisted assessments instead of self-assessments conducted by the adolescent independently, because our goal was to evaluate self-assessment. Inconsistent results regarding the correlation between parent reports and physical examination warrant further examination: a study by Rasmussen et al (31) reported that parents tended to underestimate pubertal development, whereas Dorn et al (48) reported a correlation ranging from r = 0.75 to 0.87. Regardless, self-assessment is unlikely
in a school-based survey. J Pediatr Endocrinol Metab. 2015;28(3-4):367–374.
25. Lamb MM, Beers L, Reed-Gillette D, McDowell MA. Feasibility of an Audio Computer-Assisted Self-Interview method to self-assess sexual maturation. J Adolesc Health. 2011;48(4):325–330.
26. Lee K, Valeria B, Kochman C, Lenders CM. Self-assessment of height, weight, and sexual maturation: validity in overweight children and adolescents. J Adolesc Health. 2006;39(3):346–352.
27. Leone M, Comtois AS. Validity and reliability of self-assessment of sexual maturity in elite adolescent athletes. J Sports Med Phys Fitness. 2007;47(3):361–365.
28. Norris SA, Richter LM. Usefulness and reliability of Tanner pubertal self-rating to urban Black adolescents in South Africa. J Res Adolesc. 2005;15(4):609–624.
37. Dorn LD, Dahl RE, Woodward HR, Biro F. Defining the boundaries of early adolescence: a user’s guide to assessing pubertal status and pubertal timing in research with adolescents. Appl Dev Sci. 2006;10(1):30–56.
38. Walker IV, Smith CR, Davies JH, Inskip HM, Baird J. Methods for determining pubertal status in research studies: literature review and opinions of experts and adolescents. J Dev Orig Health Dis. 2020;11(2):168–187.
39. Coleman L, Coleman J. The measurement of puberty: a review. J Adolesc. 2002;25(5):535–550.
40. Biro FM, Dorn LD. Issues in measurement of pubertal development. In: Handbook of Anthropometry. New York: Springer; 2012:237–251.
41. Dorn LD. Moving research on puberty forward: measures are the