Article info
Article history:
Paper received 22 October 2015
Accepted 27 January 2016
Available online 15 February 2016

Keywords:
Facial ageing
Score
Rejuvenation medicine
Rejuvenation surgery
Ageing
Facial ageing scale

Abstract
Most patients requesting aesthetic rejuvenation treatment expect to look healthier and younger. Several scales for ageing assessment have been proposed, but none focuses on predicting patient age. The aim of this study was to develop and validate a new facial rating scale assessing the severity of facial ageing signs.
One thousand Caucasian patients were included and assessed. The Rasch model was used as part of the validation process. A score was attributed to each patient, based on the scales we developed. The correlation between real age and the scores obtained, the inter-rater reliability and the test-retest reliability were analysed. The objective was to develop a tool enabling a patient to be assigned to a specific age range based on the calculated score.
All scales exceeded criteria for acceptability, reliability and validity. Real age strongly correlated with the total facial score in both sex groups. The test-retest reliability confirmed this strong correlation.
We developed a facial ageing scale which could be a useful tool to assess patients before and after rejuvenation treatment and an important new metric for facial rejuvenation and regenerative clinical research.
© 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights
reserved.
http://dx.doi.org/10.1016/j.jcms.2016.01.022
1010-5182/© 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
S. La Padula et al. / Journal of Cranio-Maxillo-Facial Surgery 44 (2016) 775-782
Table 1
Scale assessment of the facial age.

Upper face
Item | 0 | 1 | 2 | 3
Forehead lines at rest | No lines | Mild lines | Moderate lines | Severe lines
Forehead lines dynamics | No lines | Mild lines | Moderate lines | Severe lines
Brow positioning | Very high with arch | High with arch | Medium | Low and flat
Glabellar lines at rest | No glabellar lines | Mild glabellar lines | Moderate glabellar lines | Severe glabellar lines
Glabellar lines dynamic | No glabellar lines | Mild glabellar lines | Moderate glabellar lines | Severe glabellar lines
Crow's feet at rest | No wrinkles | Mild wrinkles | Moderate wrinkles | Severe wrinkles
Crow's feet dynamic | No wrinkles | Mild wrinkles | Moderate wrinkles | Severe wrinkles
Inferior eyelids dark circles and bags | No dark circles and bags | Mild dark circles and bags | Moderate dark circles and bags | Severe dark circles and bags
Superior eyelid skin elasticity | Eyelid fold well defined | Mild loss of skin elasticity | Moderate loss of skin elasticity | Severe skin redundancy

Mid face

Lower face

Hair | High hair density | Mild loss of hair density | Moderate loss of hair density | Severe loss of hair density or baldness (men)
Skin | Thick and elastic skin | Mild loss of thickness and elasticity | Moderate loss of thickness and elasticity | Severe loss of thickness and elasticity
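Each of the 20 items (the rows of Table 1 together with the mid- and lower-face items listed in Table 3) is graded from 0 to 3, and the total facial score is the sum of these grades, so it can range from 0 to 60. The following Python sketch shows that summation with basic input checks; the item identifiers are hypothetical labels derived from the item names, not identifiers used by the authors.

```python
from collections.abc import Mapping

# Hypothetical identifiers for the 20 items (names follow Tables 1 and 3).
ITEMS = (
    "forehead_lines_rest", "forehead_lines_dynamic", "brow_positioning",
    "glabellar_lines_rest", "glabellar_lines_dynamic",
    "crows_feet_rest", "crows_feet_dynamic",
    "inferior_eyelid_dark_circles_bags", "superior_eyelid_skin_elasticity",
    "infraorbital_hollow", "cheek_fullness", "nasolabial_folds",
    "marionette_lines", "lip_wrinkles_rest", "lip_wrinkles_dynamic",
    "oral_commissures", "jawline", "neck_folds",
    "hair", "skin",
)
VALID_GRADES = {0, 1, 2, 3}  # no, mild, moderate, severe


def total_facial_score(grades: Mapping[str, int]) -> int:
    """Sum the 0-3 grades of all 20 items; the result lies between 0 and 60."""
    missing = set(ITEMS).difference(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    for item in ITEMS:
        if grades[item] not in VALID_GRADES:
            raise ValueError(f"{item}: grade must be 0-3, got {grades[item]!r}")
    return sum(grades[item] for item in ITEMS)


if __name__ == "__main__":
    example = {item: 1 for item in ITEMS}  # every sign rated "mild"
    example["nasolabial_folds"] = 3        # one sign rated "severe"
    print(total_facial_score(example))     # prints 22
```

Validating the grades before summing keeps a mistyped rating from silently shifting a patient's total score.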
ageing-related facial changes, so that all scale severity grades could be represented.
One thousand Caucasian patients (500 men and 500 women) aged 18 to 75, with Fitzpatrick skin types I to IV, were included in the study. Exclusion criteria were previous rejuvenation surgery, botulinum toxin or filler treatment, antiretroviral-related facial lipoatrophy, temporary or permanent make-up, and any disease causing premature ageing (Lu et al., 2014; Seco-Cervera et al., 2014; Gordon et al., 2014). Two-dimensional photographs of each subject were taken with a high-resolution photography system by two independent plastic surgeons, under the same lighting conditions and in the same sitting position (patient facing the surgeon). A digital database containing the 1000 subject photographs was created. All subjects were informed about the purpose of the study and gave their consent for data analysis and publication. Fifteen raters (6 plastic surgeons, 3 dermatologists, 2 nurses, 2 psychologists and 2 hospital secretaries), who participated in the scale validation process and were not aware of how the subjects had been selected, were asked to rate the photographs. The scales were printed next to a separate field in which to enter the ratings for the 20 aesthetic areas of interest. Photographs were shown to the raters at the same time and in the same office, on 15 identical computers with the same image settings. The raters were asked to use four severity categories (no, mild, moderate, severe) and to record, in the field next to each scale, the severity of the ageing sign for each aesthetic area of interest in the 1000 subjects. Each rater made the assessments independently in the scale validation cycle, assigning a total score equal to the sum of the individual scale scores. At the end of the assessment, the real age was noted. The raters were instructed to assess patients independently and to return the printed scales with their ratings. The entire assessment process lasted two days and was repeated one month later to test intrarater reliability.
Some cropped images representing different aesthetic areas were chosen from the 1000 subjects to be coupled to the scales at the end of the validation process, based on the area of interest, image quality and clarity. Images from the photograph database were selected to represent varying degrees of severity of the ageing processes in the corresponding facial areas. Selected images were then associated with each numerical grade of the 20 scales (Fig. 1). A photograph was considered eligible to be associated with the scales if at least seven raters assigned the same rating to a given aesthetic area in that patient. The scales were built in strict accordance with recommended guidelines for developing a scientifically credible and clinically meaningful tool (Hays et al., 1993; Cano and Hobart, 2008; Klassen et al., 2010; Lasch et al., 2010; Mokkink et al., 2010). Using an inductive methodological approach, all scale ratings were combined to obtain the score sums of the different aesthetic areas (upper face, mid face, lower face, hair and skin), from which a total facial score was calculated and used to investigate holistically the ability of the scale to predict subject age. Descriptive statistics (arithmetic mean, standard deviation) were calculated for patient age and score. The Rasch model was used as part of the validation process.
The correlation between real age and the scores obtained, the inter-rater reliability and the test-retest reliability were analysed.

2.3. Rasch measurement theory
Rasch measurement theory (RUMM2030 software) was used to analyse the FACE-Objective Assessment Scale (Wright and Masters, 1982; Andrich et al., 1988a,b; Andrich, 2004). It examines differences between observed and predicted item responses to determine the extent to which data for a set of items match a mathematical model. When the data match the Rasch model, the measurement theory (i.e., that the scale measures a specific construct) is supported by the data. The analysis examines the difference (or match) between the observed scores (rater responses to items) and the expected values predicted by the Rasch model, using a range of statistical tests for each scale item (Rasch, 1960; Andrich et al., 1988a,b). This model allows the overall quality of the scale to be assessed. Our results were interpreted as follows:

2.3.1. Item response category
Each item of the FACE-Objective Assessment Scale has four severity categories (no, mild, moderate, severe), which reflect an ordered continuum of increasing severity of the construct of interest. A threshold is the location on this continuum at which the probability of responding in either of two adjacent response options is equal (50%) (Andrich et al., 1988a,b). When the categories operate as expected, the thresholds are ordered. "Disordered" thresholds imply that the response categories for that item are not operating as expected; they occur when responders have difficulty consistently distinguishing between the different response options (Zhu et al., 1997). When the response options operate as expected, the scale validity is confirmed (Andrich, 1982).

2.3.2. Item matching statistics
The items of a scale must match together as a conformable set, both clinically and statistically. When items do not match together (mismatch), it would be inappropriate to sum the individual item responses to obtain a total score. Matching statistics are usually interpreted together, in the context of their clinical usefulness as an item set, but as a guide the residual match should range between -2.5 and +2.5, and chi-square values should not be significant.

2.3.3. Item locations
The scale items define a continuum, and inspecting where items are located on the continuum shows how well the items map out a construct. Items should be evenly distributed within a reasonable range.

2.4. Internal consistency reliability
Internal consistency assesses the extent to which individual scale items are consistent with each other and reflect an underlying construct. The internal consistency of the FACE-Objective Assessment Scale and its dimensions was estimated using Cronbach's alpha coefficient, which ranges between 0 (no internal consistency) and 1 (high degree of internal consistency) (Cronbach, 1951). It is considered a measure of the scale reliability.

2.5. Person separation index
This reliability statistic is comparable to Cronbach's alpha coefficient and quantifies the error associated with subject measurements in a sample. Higher values indicate greater reliability.

2.6. Reliability and validity of the FACE-Objective Assessment Scale
Validity of the facial score was assessed by investigating the correlation between the score obtained for each patient and their age, using Pearson's correlation coefficient (r) (Alkrisat and Dee, 2014). Inter-rater reliability was analysed to assess the reliability of the aesthetic scales (Neumann et al., 2000).
The scores obtained by each rater, which followed a normal distribution, were compared using a paired t-test. Repeatability (test-retest reliability) was assessed to determine whether intrarater variability could be excluded (Rieu et al., 2015).
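The category thresholds described in Section 2.3.1 can be illustrated with the polytomous Rasch model in its partial credit form, one standard parameterisation for ordered categories; which form RUMM2030 applied to these data is not stated here, so treat this as an assumption. In this model the odds of responding in category x rather than x - 1 are exp(theta - delta - tau_x), so two adjacent categories are equally likely exactly when the person location theta equals the item location delta plus the threshold tau_x. The sketch below computes the four category probabilities for one item and checks whether its thresholds are ordered; the parameter values are invented and are not estimates from this study.

```python
import math
from typing import Sequence


def category_probabilities(theta: float, delta: float,
                           taus: Sequence[float]) -> list[float]:
    """P(X = x) for x = 0..len(taus) under a polytomous Rasch model.

    theta: person location, delta: item location,
    taus: thresholds tau_1..tau_m, expressed relative to delta.
    """
    # Log of the unnormalised probability: 0 for category 0, then a running
    # sum of (theta - delta - tau_k) for k = 1..x.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)  # subtract the maximum before exp() for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(taus: Sequence[float]) -> bool:
    """True when thresholds increase strictly, i.e. the categories are ordered."""
    return all(a < b for a, b in zip(taus, taus[1:]))


if __name__ == "__main__":
    delta, taus = 0.2, [-1.5, 0.0, 1.5]  # invented values for one item
    # At theta = delta + tau_1, "no" (0) and "mild" (1) are equally likely.
    p = category_probabilities(delta + taus[0], delta, taus)
    print([round(x, 3) for x in p])
    print(thresholds_ordered(taus), thresholds_ordered([0.5, -0.5, 1.0]))
```

A disordered set such as [0.5, -0.5, 1.0] is the numerical counterpart of raters failing to separate two adjacent options consistently.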
Fig. 1. Representative examples of two photonumerical rating scales for the upper and lower face. Forehead lines at rest: no lines (a); mild lines (b); moderate lines (c); severe lines (d). Nasolabial folds: no folds (e); mild folds (f); moderate folds (g); severe folds (h).
All patients were reassessed one month later by the same raters to test the accuracy of the score and possible changes over time. The total scores obtained one month later, and their correlation with patient real age, were compared with the initial scores using a paired t-test, as the scores followed a normal distribution. The same data were analysed using the Pearson test. A value of p < 0.05 was considered significant. The normality of continuous variables was assessed using the Kolmogorov-Smirnov test. All analyses were performed using PRISM, version 5 (GraphPad, USA). All the authors had full access to, and take full responsibility for the integrity of, the data.
Table 3
Overall fit to the Rasch model and person separation index for each scale.
Scale | Degrees of freedom | P | χ² | Person separation index
Forehead lines at rest | 15 | 0.16 | 24.3 | 0.88
Forehead lines dynamics | 20 | 0.56 | 33.8 | 0.90
Brow positioning | 20 | 0.41 | 15.3 | 0.89
Glabellar lines at rest | 20 | 0.15 | 21.7 | 0.90
Glabellar lines dynamic | 20 | 0.12 | 32.4 | 0.90
Crow's feet at rest | 16 | 0.74 | 41.7 | 0.90
Crow's feet dynamic | 16 | 0.23 | 13.3 | 0.90
Inferior eyelids dark circles and bags | 20 | 0.34 | 34.9 | 0.90
Superior eyelid skin elasticity | 16 | 0.15 | 49.5 | 0.90
Infraorbital hollow | 20 | 0.25 | 55.8 | 0.90
Cheek fullness | 20 | 0.16 | 14.8 | 0.90
Nasolabial folds | 20 | 0.60 | 33.3 | 0.90
Marionette lines | 16 | 0.21 | 54.8 | 0.90
Lip wrinkles at rest | 15 | 0.55 | 31.2 | 0.88
Lip wrinkles dynamic | 20 | 0.45 | 27.4 | 0.90
Oral commissures | 20 | 0.41 | 20.1 | 0.90
Jawline | 20 | 0.33 | 61.8 | 0.90
Neck folds | 30 | 0.45 | 44.7 | 0.90
Hair | 40 | 0.67 | 21.5 | 0.85
Skin | 41 | 0.70 | 34.8 | 0.84

2.7. Preliminary FACE-Objective Assessment Scale validation in clinical practice
To test its efficacy in real-practice conditions, the FACE-Objective Assessment Scale was used to assess 70 patients (35 men and 35 women) who underwent facelift surgery performed by the same surgeon. The severity of facial ageing signs was rated by each patient and by the surgeon before surgery and after a 3-month postoperative follow-up.

3. Results

3.1. Descriptive statistics
Patient mean age was 44.7 ± 14.1 years (range: 18-76). The men-to-women ratio was 1. Women's mean age was 44.5 ± 11.4 years and men's mean age was 45.1 ± 10.4 years. The mean score obtained in women and men at the first rating was 28.2 ± 11.3 and 28.3 ± 12.3, respectively. At the second rating (one month later), the mean score was 28.2 ± 11.5 and 28.4 ± 11.3, respectively. Twelve different age-group series were randomly selected from the patient database, and the mean score and standard deviation were calculated for each age group (Table 2).

3.2. Rasch measurement theory
The matching statistics for the Rasch model (how closely the observed data matched those expected by the model) are summarised in Table 3. The targeting was good, and all items in each of the 20 scales showed ordered thresholds, indicating that the raters were able to distinguish between the four response options (no, mild, moderate, severe). Non-significant chi-square values confirmed that the 20 scales matched the Rasch model. All scale items had residual matching values within the recommended range of -2.5 to +2.5. The person separation index values for each scale were greater than or equal to 0.8, indicating good reliability. These findings supported the reliability and validity of each of the 20 scales for their respective constructs.

3.3. Internal consistency reliability: Cronbach's alpha coefficients (Table 4)
All scales exceeded criteria for acceptability, reliability and validity. In particular, Cronbach's alpha coefficients (0.90) and intraclass correlation coefficients (0.78) supported scale reliability and validity. These findings indicated that the items of each scale formed a statistically conformable group and that the scores were reliable and valid.

3.4. Validity of the total facial score
Pearson correlation showed that real age correlated very strongly with the total facial score in both the female and male groups (Fig. 2a-b). The mean scores of the 12 age groups correlated strongly with the mean age of each group (Fig. 2c). The second validation cycle (test-retest reliability) confirmed all these strong correlations, with almost identical correlation coefficients (Table 5).
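Cronbach's alpha, defined in Section 2.4 and reported in Section 3.3, compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / variance of total score), for k items. The following Python sketch assumes a subjects-by-items matrix of 0-3 grades (k = 20 for this scale); the small matrices in the example and test are invented for illustration.

```python
import statistics
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per subject, one column per item."""
    if len(scores) < 2:
        raise ValueError("need at least two subjects")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    columns = list(zip(*scores))  # one tuple of grades per item
    item_variance_sum = sum(statistics.variance(col) for col in columns)
    total_variance = statistics.variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    # Sample variances are used throughout; the n - 1 factors cancel in the ratio.
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    grades = [[0, 1], [1, 2], [2, 2], [3, 3]]  # 4 subjects x 2 items (invented)
    print(round(cronbach_alpha(grades), 2))   # prints 0.92
```

Items that rise and fall together inflate the total-score variance relative to the item variances, which is what pushes alpha towards 1.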
3.5. Reliability of the total facial score
Table 5
Correlation of patient age with the total facial score at the first assessment and one month later; PCCV, Pearson correlation coefficient value.
Rater | PCCV females | PCCV males | PCCV females (test-retest reliability) | PCCV males (test-retest reliability)
Rater 1 | 0.97 | 0.98 | 0.97 | 0.98
Rater 2 | 0.97 | 0.98 | 0.97 | 0.98
Rater 3 | 0.97 | 0.98 | 0.98 | 0.98
Rater 4 | 0.97 | 0.97 | 0.98 | 0.98
Rater 5 | 0.96 | 0.98 | 0.96 | 0.98
Rater 6 | 0.98 | 0.98 | 0.98 | 0.98
Rater 7 | 0.97 | 0.99 | 0.97 | 0.98
Rater 8 | 0.98 | 0.98 | 0.98 | 0.98
Rater 9 | 0.98 | 0.98 | 0.98 | 0.97
Rater 10 | 0.97 | 0.98 | 0.97 | 0.98
Rater 11 | 0.98 | 0.98 | 0.98 | 0.98
Rater 12 | 0.98 | 0.97 | 0.98 | 0.97
Rater 13 | 0.97 | 0.97 | 0.98 | 0.98
Rater 14 | 0.98 | 0.98 | 0.98 | 0.98
Rater 15 | 0.96 | 0.98 | 0.97 | 0.98

However, our study has some limitations. First, our sample included only Caucasian subjects; future research could investigate the use of our scales in Black and Asian patients. Second, bias could have been introduced during the patient enrolment process. Further studies are thus needed to confirm our findings and make the FACE-Objective Assessment Scale a universally accepted tool for facial ageing assessment.

5. Conclusion
We suggest using the FACE-Objective Assessment Scale in both clinical research and practice. It is an easy-to-use adjunctive tool for more complete initial and follow-up assessments of patients undergoing facial rejuvenation treatments.

Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Disclosure
The authors have no financial interest to declare, including any grant support, in relation to the content of this article.

Author contribution to the content
First author: Study design/writing and data collection.
Second author: Statistics.
Third author: Statistics.
Fourth author: Data collection.
Last author: Study design.

Funding
None.

Conflict of interest statement
None.

Acknowledgements
The authors declare that all contributors meet the criteria for authorship.

References
Acaster S, Cimms T, Lloyd A: Design and selection of patient reported outcome measures for use in patient centered outcomes research. San Francisco, CA: Oxford Outcomes, 2012
Alkrisat M, Dee V: The validation of the coping and adaptation processing scale based on the Roy adaptation model. J Nurs Meas 22(3): 368-380, 2014
Andrich D: Controversy and the Rasch model: a characteristic of incompatible paradigms? Med Care 42(Suppl): 1-16, 2004
Andrich D: Rasch models for measurement. Newbury Park, CA: Sage, 1988a
Andrich D: Rasch models for measurement. Beverley Hills, CA: Sage, 1988b
Andrich D: An index of person separation in latent trait theory, the traditional KR20 index and the Guttman scale response pattern. Educ Psychol Res 9(1): 9, 1982
Cano SJ, Hobart JC: Watch out, watch out, the FDA are about. Dev Med Child Neurol 50(6): 408-409, 2008
Chauhan N, Warner JP, Adamson PA: Perceived age change after aesthetic facial surgical procedures quantifying outcomes of aging face surgery. Arch Facial Plast Surg 14: 258-262, 2012
Codner MA, Kikkawa DO, Korn BS, Pacella SJ: Blepharoplasty and brow lift. Plast Reconstr Surg 126: 1-17, 2010
Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika 16: 297-334, 1951
Cula GO, Bargo PR, Nkengne A, Kollias N: Assessing facial wrinkles: automatic detection and quantification. Skin Res Technol 19: 243-251, 2013
Funk W, Podmelle F, Guiol C, Metelmann HR: Aesthetic satisfaction scoring - introducing an aesthetic numeric analogue scale (ANA-scale). J Craniomaxillofac Surg 40(5): 439-442, 2012
Gordon LB, Massaro J, D'Agostino Sr RB, Campbell SE, Brazier J, Brown WT, et al: Impact of farnesylation inhibitors on survival in Hutchinson-Gilford progeria syndrome. Circulation 1(130): 27-34, 2014
Hays R, Anderson R, Revicki D: Psychometric considerations in evaluating health-related quality of life measures. Qual Life Res 2: 441-449, 1993
Honigman R, Castle DJ: Aging and cosmetic enhancement. Clin Interv Aging 1: 115-119, 2006
Klassen AF, Cano SJ, Scott A, Snell L, Pusic AL: Measuring patient-reported outcomes in facial aesthetic patients: development of the FACE-Q. Facial Plast Surg 26: 303-309, 2010
Kosowski TR, McCarthy C, Reavey PL, Scott AM, Wilkins EG, Cano SJ, et al: A systematic review of patient-reported outcome measures after facial cosmetic surgery and/or nonsurgical facial rejuvenation. Plast Reconstr Surg 123: 1819-1827, 2009
Lasch K, Marquis P, Vigneuz M, Abetz L, Arnould B, Bayliss M, et al: PRO development: rigorous qualitative research as crucial foundation. Qual Life Res 19: 9, 2010
Lu H, Fang EF, Sykora P, Kulikowicz T, Zhang Y, Becker KG, et al: Senescence induced by RECQL4 dysfunction contributes to Rothmund-Thomson syndrome features in mice. Cell Death Dis 15(5): 1226, 2014
Marshall S, Haywood K, Fitzpatrick R: Impact of patient-reported outcome measures on routine practice: a structured review. J Eval Clin Pract 12: 559-568, 2006
Miyamoto K, Nagasawa H, Inoue Y, Nakaoka K, Hirano A, Kawada A: Development of new in vivo imaging methodology and system for the rapid and quantitative evaluation of the visual appearance of facial skin firmness. Skin Res Technol 19: 525-531, 2013
Mokkink L, Terwee C, Patrick D, Alonso J, Stratford PW, Knol DL, et al: The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 19(4): 539-549, 2010
Neumann L, Press J, Glibitzki M, Bolotin A, Rubinow A, Buskila D: CLINHAQ scale validation of a Hebrew version in patients with fibromyalgia. Clinical Health Assessment Questionnaire. Clin Rheumatol 19(4): 265-269, 2000
Panchapakesan V, Klassen AF, Cano SJ, Scott AM, Pusic AL: Development and psychometric evaluation of the FACE-Q aging appraisal scale and patient-perceived age visual analog scale. Aesthet Surg J 33: 1099-1109, 2013
Rasch G: Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Education Research, 1960
Raschke GF, Rieger UM, Bader RD, Schaefer O, Guentsch A, Gomez Dammeier M, et al: Perioral aging - an anthropometric appraisal. J Craniomaxillofac Surg 42(5): 312-317, 2014
Rieu I, Martinez-Martin P, Pereira B, De Chazeron I, Verhagen Metman L, Jahanshahi M, et al: International validation of a behavioral scale in Parkinson's disease without dementia. Mov Disord 30(5): 705-713, 2015
Rzany B, Carruthers A, Carruthers J, Flynn TC, Geister TL, Görtelmeyer R, et al: Validated composite assessment scales for the global face. Dermatol Surg 38: 294-308, 2012