net/publication/345308238
All content following this page was uploaded by Masato Kawabata on 07 November 2020.
1
Nanyang Technological University,
E-mail: masato-k@hotmail.com
*corresponding author
2
The University of Queensland,
School of Human Movement and Nutrition Sciences, Brisbane QLD 4072, Australia
3
Queensland University of Technology,
School of Exercise and Nutrition Sciences, Kelvin Grove, QLD 4059, Australia
Acknowledgements
The present study was conducted without financial support and preregistration. Parts of
this paper were presented at the Association for Applied Sport Psychology's 2018 Annual
The authors declare that the research was conducted in the absence of any commercial or
The datasets used and/or analyzed during the current study will be available from authors
on reasonable request.
EVOLVING THE VALIDITY OF A MENTAL TOUGHNESS MEASURE
Abstract
measure one’s level of mental toughness. Despite its wide popularity in psychological
studies, the questionnaire has been criticized for problems with its factorial validity. The present study
aimed to re-assess the factorial validity of the instrument and propose alternative models to
provide researchers with theoretically and practically useful instruments to measure mental
toughness. Two studies were conducted using large samples of university students (Study 1: n
= 2,186; Study 2: n = 3,209). In Study 1, none of the 1-, 4-, and 6-factor models with 48 items
fit the data set satisfactorily. Instead, two refined 18- and 6-item versions of the
questionnaire, covering 6 aspects of mental toughness, were proposed: the Short MTQ and
Very Short MTQ. Both measures demonstrated excellent fit to the data. These results were
replicated with a larger independent sample in Study 2. With the Short MTQ, it is possible to
toughness factor and 6 specific factors. The Very Short MTQ is a practical tool for occasions
where constraints prevent use of the Short MTQ. The refined questionnaires are promising
options to measure and understand individuals’ mental toughness with the MTQ.
In the performance psychology literature, the past two decades have witnessed an
exponential increase in research and applied interest in the topic of mental toughness.
Broadly defined, mental toughness is a personality trait that determines how people deal
effectively with challenges, stressors, and pressure, regardless of the circumstances (Clough
& Strycharczyk, 2015). To outline this capacity, researchers have long debated the core
subsequently developed several conceptual models (e.g., Jones, Hanton, & Connaughton,
2007). Based on these models, they have also devised a collection of self-report instruments
for assessing mental toughness (see Coulter, Mallett, & Singer, 2018), of which the most
widely used is the Mental Toughness Questionnaire-48 (MTQ48; Clough, Earle, & Sewell,
2002).
The MTQ48 operationalizes the 4/6Cs model of mental toughness. Clough et al.
(Kobasa, 1979), which buffers the impacts of stress. They proposed that mental toughness is
Challenge, and Control (in life and emotion regulation) – together with the construct of
Confidence (in one’s abilities and interpersonal relationships). Since its publication, the
MTQ48 has been widely regarded as a promising tool for assessing mental toughness (see
Lin, Mutz, Clough, & Papageorgiou, 2017 for a recent review; see also Perry, Clough, Crust,
Earle, & Nicholls, 2013; Vaughan, Hanna, & Breslin, 2018). However, despite its widespread
use, the MTQ48 has been criticized for validity issues (e.g., Birch, Crampton,
Greenlees, Lowry, & Coffee, 2017; Gucciardi, Hanton, & Mallett, 2012).
Recent studies examining the psychometric properties of the MTQ48 based on the
4/6C model have not demonstrated full support for the factor structure of the instrument.
From these studies, it has been reported that there are consistent factorial validity issues with
the MTQ48 at both overall and individual parameter levels: a) poor or unsatisfactory overall
fit of hypothesized measurement models to data within the framework of confirmatory factor
analysis (CFA) and exploratory structural equation modeling (ESEM) (1-, 4-, and 6-factor
CFA models: Birch et al., 2017; Gucciardi et al., 2012; Perry et al., 2013; 1- and 4-factor
ESEM models: Vaughan et al., 2018); b) weak convergent validity (items either cross-loading
or not loading well onto target factors; Birch et al., 2017; Vaughan et al., 2018); and c) lack
of discriminant validity between several factors in CFA models (excessively high correlations [e.g., r
> .90] between the Confidence (Abilities) and Control (Life) factors; Gucciardi et al., 2012; Perry
et al., 2013).
Some researchers have also expressed concerns about the scale development and content
validity of the instrument (i.e., adequacy of item content, clarity, and structure; see Birch et
al., 2017; Gucciardi et al., 2012; Vaughan et al., 2018). Consequently, the emerging evidence
has raised major concerns about the questionnaire’s construct validity, which cast
doubt on the legitimacy of earlier findings and the validity of the MTQ48 as a
the factorial validity issue of the MTQ48. Rather than merely criticizing the instrument, scale
development should be seen as an ongoing process, and efforts to improve the measure
should also be respected and encouraged (Kawabata, Mallett, & Jackson, 2008; Mallett,
Kawabata, & Newcombe, 2007). As developers of the MTQ48, Clough and colleagues have
welcomed refinement of their measure on an ongoing basis (Perry et al., 2013). Several
approaches are proposed to refine the instrument. The first approach is to identify good and
problematic items to measure each target factor. Subsequently, problematic items need to be
replaced with new, good items or removed from the instrument. Because the MTQ48 is
copyrighted material, the latter approach is suitable for non-developers of the MTQ48 to
factorial validity issues, in the present study) from theoretical, empirical, and practical
perspectives (see Mallett et al., 2007, for the details of the three perspectives). A factorial
validity issue that has been overlooked in past psychometric studies on the MTQ48 is how to
represent a global construct of mental toughness, based on the 4/6C model. For example, a
single global mental toughness score has been calculated based on the total or averaged score
of the 48 items and used in several research studies (e.g., Gerber et al., 2013; Papageorgiou,
Wong, & Clough, 2017). However, such a global representation of mental toughness has not
been supported empirically in the psychometric studies on the MTQ48. For instance, Perry et
al. (2013) examined a single factor measurement model consisting of all the 48 items and
reported that the model did not fit their data satisfactorily according to the overall goodness
of fit indices.
model, the mental toughness construct is specified as a unidimensional construct, rather than
construct, based on the 4/6C model (Clough et al., 2002), and the single factor measurement
model is not suitable to represent the multidimensionality of the mental toughness construct.
Instead, hierarchical (i.e., higher order) and bifactor (i.e., general-specific) models, in which
global and specific factors coexist, should be employed to examine the presence of a global
question emerges from a practical perspective. Namely, which score should be used for
correlation analysis when structural equation modeling (SEM) is unavailable; a scale score or
factor score? Scale scores, such as total or averaged scores, are often used by summing up or
averaging item scores when the sample size is too small to analyze data within the framework
of SEM. Even when the sample size is large enough to examine the factor structure of an
instrument with CFA or ESEM in the preliminary analysis, and it is possible to calculate
factor scores, some researchers still use scale scores for correlation analysis (see
Papageorgiou et al., 2017). However, it is important for researchers to understand that sum
scoring requires a quite restricted model which is different from a model used to validate the
scale through factor analysis (see McNeish & Wolf, 2020, for details of this issue).
Latent scores and correlations are corrected for measurement errors within the SEM
framework, whereas scale scores are purely based on items which include a part of random
measurement error (Morin et al., 2016). Pearson’s correlations based on scale scores, which are
more sensitive to measurement error, tend to be lower than CFA-based latent correlations
(see Mallett et al., 2007, for further details of this tendency). However, it is unknown how
different outcomes emerge when correlation analysis is conducted with Pearson’s correlations
based on scale scores and factor scores, rather than latent correlations obtained from the SEM
framework. This question is practically important for the MTQ48 users to confidently
conduct correlation analysis based on scale scores and then interpret the results when SEM is
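The tendency for scale-score correlations to be lower than latent correlations can be illustrated with Spearman's classical attenuation formula. This is a simpler stand-in for the SEM-based correction discussed above, not the procedure used in the present study. The following is a minimal Python sketch; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def attenuated_r(latent_r: float, rel_x: float, rel_y: float) -> float:
    """Expected correlation between two observed scores, given the
    error-free (latent) correlation and the reliability of each score."""
    return latent_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation (inverse of attenuated_r)."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical inputs, not values from the present study.
latent_r = -0.50           # assumed error-free correlation
rel_x, rel_y = 0.64, 0.81  # assumed reliabilities of the two scale scores

observed = attenuated_r(latent_r, rel_x, rel_y)
print(round(observed, 2))                                 # -0.36
print(round(disattenuated_r(observed, rel_x, rel_y), 2))  # -0.5
```

Under these assumed reliabilities, a latent correlation of -.50 would be expected to appear as roughly -.36 when computed from scale scores.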
Dagnall et al. (2019) recently examined the factor structure of other shorter versions
of the MTQ48 (MTQ18: Clough et al., 2002; MTQ10: Papageorgiou et al., 2018) for the
responses collected from 944 high school students within the framework of CFA. They
reported that the overall fit of the 4-factor first-order model and the bifactor model was
acceptable for the MTQ10 responses from high school students. However, information about
individual parameters, such as latent factor correlations and factor loadings, was not
sufficiently reported for each of the models in their study. As a result, it is unclear if the two
CFA models fit the MTQ10 data at the individual parameter level.
The current investigation aimed to improve the MTQ48 by resolving its factorial
validity issue from theoretical, empirical, and practical perspectives. To this end, the
investigation was conducted in three stages, consisting of two studies. Stage 1 of the first
study involved re-assessing the factorial validity and reliability of the MTQ48 with a large
sample of university students. After confirming the lack of factorial validity, Stage 2 of the
study involved proposing refined versions (short and very short versions) of the MTQ48 to
measure mental toughness through the 4/6C’s framework. In doing so, other previously
overlooked measurement issues were also addressed with the refined versions of the
c) comparisons of correlation values between latent correlations from ESEM models and
Pearson’s correlations based on scale and factor scores. Stage 3 was conducted in the second
study for the cross-validation of the refined versions of the MTQ with another independent
sample. To evaluate the usefulness of the newly refined versions of the MTQ48, their
psychometric properties were rigorously compared with those of the MTQ18 (Clough et al.,
2002) and the MTQ10 (Papageorgiou et al., 2018) in the present study.
Study 1
Method
Participants. A total of 2,186 university students (802 men, 1,384 women; mean age =
23.9 years, SD = 8.1, age range: 16-70 years), whose first language was English,
participated in the study. The majority of participants (93.3%) were Australians and the rest
(6.7%) were Americans, British, and Canadians. Participants’ majors were health (29.2%),
science and engineering (20.9%), business (15.6%), law (12.4%), creative industry (11.3%),
Measures.
2002) is a self-report instrument designed to measure one’s level of mental toughness. The
MTQ48 covers four components of mental toughness and consists of six subscales:
commitment, challenge, control (emotion and life), and confidence (abilities and
interpersonal). These components and subscales are abbreviated to 4/6Cs. Respondents were
asked to indicate the degree to which they generally agreed with the statement of each item
on a 5-point Likert-type scale, ranging from 1 (strongly disagree) to 5 (strongly agree). In the
MTQ48, 22 items include negatively worded statements and the scores of these items were
concurrent validity of the refined versions of the MTQ, participants’ perceived stress level
was measured with the stress subscale of the DASS-21 (Lovibond & Lovibond, 1995). Only
the stress subscale, consisting of 7 items, was used in the present study. Respondents were
asked to indicate the degree to which each statement applied to them over the last week on a
4-point Likert-type scale ranging from 0 (did not apply to me at all) to 3 (applied to me very
Procedure. The current study was approved by the institutional ethics review
committee of Queensland University of Technology and adhered to the guidelines for ethical
practice. University students in Australia were invited to participate in the study via email.
Participation was voluntary and informed consent was obtained from each of the participants
In the first stage, the factorial validity and reliability of the MTQ48 were re-examined
with the current large sample. After observing the lack of factorial validity of the MTQ48,
problematic items related to the factorial validity issue were identified statistically. In the
second stage, two refined versions of the MTQ48 – an 18-item short version (Short MTQ [S-
MTQ]) and a 6-item brief version (Very Short MTQ [VS-MTQ]) – were proposed from
theoretical and empirical perspectives and, subsequently, their factorial validity and
Item selection. Previous research (Birch et al., 2017; Gucciardi et al., 2012; Vaughan
et al., 2018) has identified conceptual misfit in several of the MTQ48 items that question its
adequacy to represent the dimension definitions of an underpinning 4/6C model. Despite this
the current study, Lynn’s (1986) Content Validation Index (CVI) guidelines were used to
conduct the face validity check on the MTQ48 items. Following these guidelines, three
independent experts separately reviewed the content and structure of the items. The experts
are well published in the mental toughness literature and experienced in the procedures and
In reviewing the MTQ48 items, the experts were asked to rate the relevancy of each
item to its hypothesized factor definition. Relevancy was rated across a 4-point scale, where
‘1’ implies an irrelevant item and ‘4’ a very relevant item (see Lynn, 1986). The CVI score
for an item was determined by the proportion of experts who rated it as content valid. With
the MTQ48 being a previously published instrument, items were only considered content
valid if they received a score of 4 from all expert raters (i.e., 100%). The CVI score for each
item ranged from 0 (0 × 3 raters) to 12 (4 × 3 raters). The CVI score for the whole
questionnaire was calculated as the proportion of total items judged content valid (i.e., the
review, the experts were also asked to clarify decisions made in rating each item’s relevancy
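The scoring rules described above can be sketched in a few lines of Python. This is one reading of the procedure, assuming three raters, a summed item score, and the strict criterion that an item counts as content valid only when every rater gives it a 4. The function names and the ratings are hypothetical and are not data from the present study.

```python
from typing import Dict, List


def item_cvi_sum(ratings: List[int]) -> int:
    """Summed relevancy rating for one item across all raters."""
    return sum(ratings)


def is_content_valid(ratings: List[int], required: int = 4) -> bool:
    """Strict criterion: every rater must give the required rating."""
    return all(r == required for r in ratings)


def scale_cvi(all_ratings: Dict[str, List[int]]) -> float:
    """Proportion of items judged content valid under the strict criterion."""
    n_valid = sum(is_content_valid(r) for r in all_ratings.values())
    return n_valid / len(all_ratings)


# Hypothetical ratings from three raters for four illustrative items.
ratings = {
    "item_a": [4, 4, 4],
    "item_b": [4, 3, 4],
    "item_c": [2, 1, 2],
    "item_d": [4, 4, 4],
}

sums = {name: item_cvi_sum(r) for name, r in ratings.items()}
print(sums)                # {'item_a': 12, 'item_b': 11, 'item_c': 5, 'item_d': 12}
print(scale_cvi(ratings))  # 0.5
```

In this illustration, only the two items rated 4 by every rater reach the maximum summed score of 12 and count toward the questionnaire-level proportion.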
Identifying reliable and clear definitions of the 4/6Cs is a convoluted task. Different
sources (e.g., technical manual, user guide, empirical articles) are not always consistent in
their descriptive language for each dimension. Moreover, the breadth of descriptors linked to
each dimension (i.e., the attributes of people with high and low scores) makes it problematic to
articulate what is, in fact, the core definition of each 4/6C component. Using the MTQ48
technical manual (Clough, Perry, Crust, Strycharczyk, & Rowlands, 2015) and other
comprehensive reviews of the MTQ48 (e.g., Clough & Strycharczyk, 2012), the research team
identified consistent descriptors that define the questionnaire’s 6 main subscales as follows.
opportunity.
• Commitment: The extent to which an individual is likely to persist with a goal, despite
• Control Emotion: The extent to which people control their anxieties and emotions.
• Control Life: The extent to which people believe they have sufficient control over their
• Confidence Interpersonal: The extent to which people are prepared to assert themselves
Expert raters were instructed to only review the relevancy of items against these core
Data analyses. To examine the factor structure of the MTQ48, confirmatory factor
analysis (CFA) and exploratory structural equation modeling (ESEM) were conducted with
Mplus (Version 8.4; Muthén & Muthén, 1998-2019) based on Mplus robust maximum
likelihood estimation (MLR). In the CFA model, each item was allowed to load on only one
target factor and all non-target cross-loadings were constrained to be zero. In the ESEM
model, all items were allowed to load on every factor and all factor loadings were estimated
by imposing appropriate restrictions on the factor loading matrix and the factor covariance
matrix (Asparouhov & Muthén, 2009; Marsh et al., 2010). An oblique geomin rotation was
used in the ESEM model, because the MTQ48 factors are expected to covary and the geomin
rotation criterion is the most effective criterion when the true factor loading structure is
In the first stage, CFA and ESEM were conducted for three hypothesized models (1-,
4-, and 6-factor models) with 48 items. Clough et al. (2002) also proposed 18 items for a
shorter version of the MTQ48 (Commitment: Items 11, 35, 42; Challenge: Items 14, 23, 30;
Control Life: Item 2; Control Emotion: Items 21, 27, 31, 37; Confidence Abilities: Items 3,
13, 16, 36; Confidence Interpersonal: Items 17, 43, 46.) Furthermore, Papageorgiou et al.
(2018) proposed 10 items for another shorter version of the MTQ48 (Commitment: Items 11,
42; Challenge: Items 23, 30; Control Life: Item 2; Control Emotion: Items 27, 31; Confidence
Abilities: Items 3, 16, 36). Dagnall et al. (2019) recently examined the factor structure of
other shorter versions of the MTQ48 (MTQ18: Clough et al., 2002; MTQ10: Papageorgiou et
al., 2018) within the framework of CFA. They reported the results of 1- and 4-factor CFA
measurement models. For completeness, the 18-item and 10-item 1-factor and 4-factor
models were examined with CFA and ESEM in the present study. In the second stage, CFA
and ESEM were conducted for the models with selected items. When a first-order
measurement model fit the data adequately, hierarchical and bifactor ESEM models were also
examined. Following the procedures by Morin et al. (2016) and Morin and Asparouhov
(2018), an orthogonal target rotation was used for the bifactor ESEM.
To assess overall model fit, several criteria were used: the MLR chi-square statistic
(Muthén & Muthén, 1998–2019), the comparative fit index (CFI; Bentler, 1990), the Tucker-
Lewis index (TLI; Tucker & Lewis, 1973), the root mean square error of approximation
(RMSEA; Steiger, 1990), and the standardized root mean square residual (SRMR; Hu & Bentler,
1998). Values on the CFI and TLI that are greater than 0.90 and 0.95 are generally taken to
reflect acceptable and excellent fits to the data, respectively (e.g., Marsh et al., 2010). For the RMSEA,
values of 0.05 or less indicate a close fit, and values of 0.08 or less indicate an adequate fit (Browne &
Cudeck, 1993). Values on the SRMR that are less than 0.08 indicate an adequate fit (Hu &
Bentler, 1998). Conventional multiple cut-off values (i.e., the CFI and TLI ≥ 0.90, the
RMSEA ≤ 0.08, the SRMR ≤ 0.08) were considered minimum thresholds for accepting
overall model fit. For the assessment of the fit of individual items, standardized factor
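The cut-off values listed above can be expressed as a small checking routine. The following Python sketch is illustrative only: the class and function names are hypothetical, and the example values are the fit indices reported later in this paper for the first-order S-MTQ ESEM model combined with the stress CFA factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    cfi: float
    tli: float
    rmsea: float
    srmr: float


def meets_minimum_fit(fit: FitIndices) -> bool:
    """Conventional minimum thresholds: CFI and TLI >= .90, RMSEA and SRMR <= .08."""
    return (fit.cfi >= 0.90 and fit.tli >= 0.90
            and fit.rmsea <= 0.08 and fit.srmr <= 0.08)


def label_incremental(value: float) -> str:
    """Label a CFI or TLI value using the .90 and .95 guidelines."""
    if value > 0.95:
        return "excellent"
    if value >= 0.90:
        return "acceptable"
    return "below threshold"


def label_rmsea(value: float) -> str:
    """Label an RMSEA value using the .05 and .08 guidelines."""
    if value <= 0.05:
        return "close"
    if value <= 0.08:
        return "adequate"
    return "below threshold"


example = FitIndices(cfi=0.957, tli=0.933, rmsea=0.041, srmr=0.031)
print(meets_minimum_fit(example))      # True
print(label_incremental(example.cfi))  # excellent
print(label_incremental(example.tli))  # acceptable
print(label_rmsea(example.rmsea))      # close
```

Such a routine only screens overall fit; as the Results show, a model can pass these thresholds and still have items that do not load on their target factors.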
After confirming that the hypothesized factor structure of the MTQ was tenable for
the current data, the internal consistency reliability of the MTQ responses was assessed using
Cronbach’s (1951) coefficient alpha (α) and McDonald’s (1999) coefficient omega (ω). The
that a set of items is measuring a single construct before reporting α as a measure of
reliability of the set of the observed scores (Hayes & Coutts, 2020). The assumption of equal
factor loadings (tau equivalence) is essential for α, but ω does not rely on this assumption.
Methodologists (e.g., Hayes & Coutts, 2020; Raykov & Marcoulides, 2011) recommend
using ω instead of α because ω is a more general estimator of reliability. However, α has been
commonly used in the literature on the measurement of mental toughness. Therefore, both
reliability coefficients were reported in the present study, for the sake of completeness.
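The role of the tau-equivalence assumption can be illustrated numerically. The following Python sketch computes the population values of both coefficients implied by a one-factor model with standardized items (unit item variances and uncorrelated residuals). These simplifications, the function names, and the loadings are assumptions made for illustration; they are not the estimation procedure or the estimates of the present study.

```python
from itertools import combinations
from typing import Sequence


def omega_from_loadings(loadings: Sequence[float]) -> float:
    """Omega for a one-factor model with standardized loadings."""
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + unique)


def alpha_from_loadings(loadings: Sequence[float]) -> float:
    """Alpha implied by the same model (item variances fixed at 1)."""
    k = len(loadings)
    # Model-implied covariance between items i and j is lam_i * lam_j.
    off_diagonal = 2.0 * sum(a * b for a, b in combinations(loadings, 2))
    total_variance = k + off_diagonal
    return (k / (k - 1)) * (1.0 - k / total_variance)


equal = [0.6, 0.6, 0.6]    # hypothetical tau-equivalent items
unequal = [0.3, 0.6, 0.9]  # hypothetical congeneric items

print(round(alpha_from_loadings(equal), 3), round(omega_from_loadings(equal), 3))
print(round(alpha_from_loadings(unequal), 3), round(omega_from_loadings(unequal), 3))
```

With equal loadings the two coefficients coincide, whereas with unequal loadings α falls below ω, which is why ω is described as the more general estimator.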
Results
Descriptive analyses. The means of the 48 item scores ranged from 2.24 (SD = 1.07)
to 4.27 (SD = .66). The items with the lowest and highest mean scores were Item 27 (Control
[Emotion]: “I tend to worry about things well before they actually happen”) and Item 19
respectively.
Stage 1: Re-examination of the factor structure of the MTQ48. None of the 1-, 4-, and
6-factor CFA models with 48 items fit the data adequately (see Table 1). Although values
on the RMSEA and SRMR were acceptable, values on the CFI and TLI were consistently
below minimum acceptable levels for the three models. Similarly, none of the 1-,
4-, and 6-factor ESEM models with 48 items fit the data satisfactorily. To identify
problematic items, the factor loadings of the 48 items were carefully examined based on the
solution of the 6-factor ESEM model. It was found that 12 of the 22 items that include negatively
worded statements did not load well on their target factor. The wording effect of the
negatively worded items (Wang, Chen, & Jin, 2014) was apparent in the data.
The 1-factor CFA model with the 18 items (Clough et al., 2002) and 10 items
(Papageorgiou et al., 2018) did not fit the data adequately (Table 1). These results were
consistent with the unsatisfactory fit of the 1-factor CFA model reported in Dagnall et al.
(2019). Although they correlated seven pairs of error terms for the MTQ18 and two pairs for
the MTQ10 to achieve an adequate model fit, it is not encouraged to free up parameters on
the basis of modification indices without substantive meaningfulness (Byrne, 2005). Dagnall
et al. reported goodness-of-fit indices for the 4-factor CFA model; however, the solution of
the model was improper here because the latent correlation between Challenge and Control
was greater than 1. This indicates that the two factors were not empirically distinguishable.
As for the ESEM results, the 1-factor ESEM model with the 18 items (Clough et al.,
2002) and 10 items (Papageorgiou et al., 2018) did not fit the data either (see Table 1). The 4-
factor ESEM model with the 18 items did not fit data adequately, whereas the 4-factor ESEM
model with 10 items showed an excellent overall fit to the data. However, inspection of item
factor loadings revealed that half of the 10 items did not load on their target factors.
Collectively, the 4-factor CFA model with the 18 items and 10 items produced an
improper solution and the 4-factor ESEM model with the 18 items and 10 items did not fit
hierarchical and bifactor CFA and ESEM models with the 18 items and 10 items were not
Stage 2: Refined versions of the MTQ. At least three items are technically required to
test the fit of a single factor model and calculate the model-based McDonald’s (1999)
mental toughness through the 4/6C’s framework, five items showing next highest CVI scores
with good face validity (Challenge: 1 item; Control Emotion: 1 item; Control Life: 1 item;
Confidence Abilities: 2 items) were added to the 13 items with high face validity (see the
Item selection section) so that there were three items for each factor. For Confidence
Abilities, Items 18 and 24 were selected although their CVI scores (4 for both) were slightly
lower than those of Items 3 (CVI = 5) and 13 (CVI = 6). The rationale for selecting Items 18 and 24 was
that they are the only other items in this factor that link to ability (or lack of ability).
Consequently, 18 items (3 items × 6 factors) were included in the refined short version of the
MTQ (Short MTQ: S-MTQ) (Commitment: Items 7, 29, 47; Challenge: Items 4, 44, 48;
Control Life: Items 2, 12, 41; Control Emotion: Items 27, 31, 45; Confidence Abilities: Items
Subsequently, CFA and ESEM were conducted with the 18 items. The 4- and 6-factor
CFA models did not fit the data adequately (see Table 1). The overall fit of the 4-factor
ESEM model was satisfactory according to all the overall fit indices. However, it was found
that none of the three items for Confidence Abilities loaded well on their target factor (factor
loadings varying from -.03 to .16). Instead, they loaded on a non-target factor of Control
(factor loadings varying from .32 to .56). The 6-factor ESEM model fit the data very well.
In the 6-factor ESEM model, latent correlations between the six factors ranged from .13 to
.42, and factor loadings for the target factor ranged from .12 to .73 (see Table 2). The internal
consistency coefficients (α; ω [95% CI] in order) for the six subscales of the S-MTQ were
Control Emotion (.62; .63 [.60-.65]), Control Life (.69; .70 [.67-.72]), Challenge (.69; .71
[.69-.74]), Commitment (.65; .66 [.64-.69]), Confidence Interpersonal (.60; .63 [.60-.66]),
Because the 6-factor first-order ESEM model fit the data satisfactorily at both overall
and individual parameter levels, corresponding bifactor and hierarchical ESEM models were
also tested. Both bifactor and hierarchical ESEM models fit the data very well (see Table 1).
The bifactor ESEM solution shows that the global mental toughness factor was well-defined
by the presence of strong and significant target loadings from all the 18 items (ranging from
.22 to .68). The six specific factors were also well-defined through strong and significant
target loadings from 16 of 18 items (ranging from .30 to .70). The loadings of two items (Item
47 for Commitment; Item 41 for Control Life) on their target factors were non-significant, but
they loaded substantively on the global mental toughness factor (> .56). As for the
hierarchical ESEM solution, the six first-order factors were well-defined through strong and
significant target loadings from all 18 items (ranging from .14 to .86). The factor loadings of
most first-order factors on the global mental toughness factor were significant and substantial,
ranging from .40 to .72. However, the loadings from Control Life (.05) and Confidence Interpersonal
(.13) on the global mental toughness factor were non-significant. These results indicated that
Control Life and Confidence Interpersonal were not related to the global mental toughness
factor in the hierarchical ESEM model. Given that the higher-order mental toughness factor
was unable to explain correlations among the 6 first-order factors, the bifactor ESEM model
seems to represent the S-MTQ responses better than the hierarchical ESEM model.
To develop a very short version of the MTQ (VS-MTQ) that covers all the six
components of the MTQ with the minimum number of items, one item was selected for each
of six components from the S-MTQ. In doing so, reverse-scored items were not selected to
exclude potential wording effects (Wang et al., 2014) from the single factor model. Four sets
of competing models were proposed from theoretical and statistical perspectives. Statistical
parameters (e.g., overall fit of the CFA model, standardized factor loadings, and internal
consistency coefficients) were similar between the four different models, but one set of six
items (Items 4, 7, 8, 12, 20, 31; all with CVI scores of 12, i.e., 100%) was considered
best among them against core dimension definitions. The fit of both 1-factor CFA and ESEM
models with the finally selected six items was excellent (see Table 1). Factor loadings
ranged from .42 (Confidence Interpersonal: Item 20) to .69 (Confidence Abilities: Item 8) in
the CFA and ESEM models. The internal consistency coefficients (α; ω [95% CI] in order)
Concurrent validity. Latent correlations between the refined versions of the MTQ
(S-MTQ and VS-MTQ) and the stress factor of the DASS-21 were assessed to examine the
concurrent validity of the S-MTQ and VS-MTQ responses. For the S-MTQ, six factors were
specified as ESEM factors with target rotation and the stress factor was specified as a CFA
factor. Given that Clough et al. (2002) considered mental toughness an extension of hardiness
that buffers the impacts of stress (Kobasa, 1979), individuals who are mentally tough are less
likely to perceive stress symptoms. Gerber et al. (2013) reported negative correlations
between mental toughness subscale scores and a perceived stress score. Thus, it was assumed
that the six MTQ factors would negatively correlate with the stress factor. Both models
provided an acceptable fit to the data (the first-order MTQ ESEM with the stress CFA model:
χ² [193, N = 2,186] = 909.68, p < .001; CFI = .957, TLI = .933, RMSEA = .041, SRMR
= .031; the bifactor MTQ ESEM with the stress CFA model: χ² [180, N = 2,186] = 771.03, p
< .001; CFI = .964, TLI = .940, RMSEA = .039, SRMR = .030). As expected, all the first-
order MTQ factors in the 6-factor ESEM model were significantly and negatively correlated
with the stress factor, ranging from -.17 (Confidence Interpersonal) to -.65 (Control
Emotion). The global mental toughness factor in the bifactor ESEM model also showed a
negative correlation (-.45) with the stress factor (see Table 3).
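As a rough consistency check on the fit statistics reported above, the RMSEA point estimate can be recomputed from the chi-square value, the degrees of freedom, and the sample size. The following Python sketch uses the standard textbook formula; it is an assumption that this simple form approximates what the software reported under MLR estimation, and the function name is hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square, df, and N as reported in the text for the two S-MTQ models.
first_order = rmsea(909.68, 193, 2186)
bifactor = rmsea(771.03, 180, 2186)

print(round(first_order, 3))  # 0.041
print(round(bifactor, 3))     # 0.039
```

Both values agree with the RMSEA figures reported for the first-order and bifactor S-MTQ models with the stress factor.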
For the VS-MTQ, both the first-order mental toughness factor and the stress factor
were specified as CFA factors. The model fit the data adequately (MLR χ² [63, N = 2,186] =
504.794, p < .001; CFI = .926, TLI = .908, RMSEA = .053, SRMR = .053). The first-order
mental toughness factor was significantly negatively correlated (-.49) with the stress factor. It
was found that the latent correlation (i.e., -.45) between the global mental toughness factor
identified with the S-MTQ and the stress factor was slightly lower than the one (i.e., -.49)
between the first-order mental toughness factor identified with the VS-MTQ and the stress
Finally, correlations between the refined versions of the MTQ (S-MTQ and VS-MTQ)
and the DASS-21 Stress were re-computed by using scale and factor scores to examine if
Pearson’s correlation coefficients based on scale and factor scores were compatible with
latent correlation coefficients obtained from the ESEM models, in which latent constructs
were corrected for measurement errors (see Table 3). Each subscale score was the total of
item scores under the subscale. Factor scores of the S-MTQ were calculated based on the
standardized factor loadings obtained from the bifactor ESEM model. Factor scores of the
VS-MTQ and DASS-21 Stress were calculated based on the standardized factor loadings
from a 1-factor CFA model. The results of correlation analyses between the S-MTQ and the
For scale scores, Pearson’s correlation coefficients between the S-MTQ subscales
were found comparable with the latent correlation coefficients based on the 6-factor ESEM
model (see Tables 3 and 4). Again, only for scale scores, Pearson’s correlation coefficients
between the S-MTQ scores and the DASS-21 Stress score were similar to their corresponding
latent correlation coefficients, except for the ones between Control Emotion and Stress scores
(see Tables 3 and 4). With regard to the Pearson’s correlation coefficient between the VS-
MTQ and the DASS-21 Stress, it was -.35 (p < .001) for both scale and factor scores, but
Discussion
Study 1 aimed to re-evaluate the factorial validity of the instrument and propose
alternative models to improve its validity with refined versions of the questionnaire
None of the 1-, 4-, and 6-factor CFA and ESEM models with 48 items fit the data
adequately (see Table 1). These results were consistent with those reported in previous
studies (e.g., Gucciardi et al., 2012; Perry et al., 2013). Furthermore, 1-factor CFA models
with the 18 items proposed by Clough et al. (2002) and the 10 items selected by
Papageorgiou et al. (2018) did not fit the data either. The solutions of the 4-factor CFA model
with the 18 items and 10 items were improper, as Challenge and Control were not empirically
correlations are likely to be inflated unless all non-target loadings are close to zero (Marsh et
al., 2010).
Dagnall et al. (2019) reported Pearson’s correlations for the MTQ18 subscale scores,
but they did not report the latent correlations between the four factors in their study.
Therefore, it is unknown if the improper solutions of the 4-factor CFA model with the
MTQ18 are specific to the current sample. In Study 1, the 4-factor ESEM models with the 18
items and 10 items were also examined. The overall fit of the model with 18 items was
unsatisfactory. Despite the excellent overall fit of the model with 10 items, half of the 10 items
did not load on their target factors. Consequently, the hypothesized factor structures of the
MTQ18 and MTQ10 were invalid for the current large sample of university students.
Based on the CVI, 18 items were selected for the S-MTQ. The 6-factor ESEM model,
with 18 items, fit the data very well. Considering that the corresponding 6-factor CFA model
did not fit the data satisfactorily, it was apparent that some items loaded on
non-target factors. However, the non-target cross-loadings were far smaller for
most items than the significant and substantial target factor loadings (see Table 2).
Thus, the 6-factor ESEM solution showed six well-defined factors. Because the structure of
the six factors was well defined, corresponding bifactor and hierarchical ESEM models were
examined further. It was found that the bifactor ESEM model represented the S-MTQ
responses better than the hierarchical ESEM model. The well-defined, bifactor structures of
the S-MTQ responses support the multidimensionality of the mental toughness construct.
As for the internal consistency reliability of the S-MTQ responses, both alpha and
omega coefficients were lower than .70 for four of six factors (Control Emotion,
Orosz, and Rigó (2018) stated that “lower level of reliability would be more concerning for
research on scale scores than fully latent variables, given that latent variables are naturally
corrected for measurement errors, and thus perfectly reliable” (p. 278). In the present study,
omega coefficients were calculated within the framework of factor analysis and all of them
were above .60. Thus, the observed ɷ values were considered reasonable. The coefficient α is
affected by the number of items and, under certain conditions, increases as the number of items
increases (Hayes & Coutts, 2020). Given that the α coefficient for each subscale was
calculated with only three items, the observed α values would also be reasonable. The concurrent
validity of the S-MTQ responses was examined as a between-construct study. As hypothesized, all the
specific MTQ factors and the global mental toughness factor were negatively associated with
the stress factor. Thus, the concurrent validity of the S-MTQ responses was well supported.
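The point made above about coefficient α and the number of items can be illustrated with a minimal Python sketch. It computes α from raw item scores and then shows how the standardized α changes with test length when the mean inter-item correlation is held constant. The response data, the assumed mean inter-item correlation of .40, and the function names are hypothetical and are not taken from the present study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standardized_alpha(k, mean_r):
    """Standardized alpha from the number of items and the mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical responses from five people to a three-item subscale (1-5 scale).
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 2],
]
alpha_three_items = cronbach_alpha(responses)

# Holding the mean inter-item correlation at an assumed .40,
# alpha rises as items are added.
alpha_by_length = {k: round(standardized_alpha(k, 0.40), 2) for k in (3, 6, 12)}
```

Under these assumed values, a three-item subscale falls just below .70 even though its items are moderately intercorrelated, which is in line with the argument that α values for three-item subscales should be read with test length in mind.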
In the second stage, the VS-MTQ was proposed as a very short version of the MTQ to
provide a practical tool for occasions where constraints prevent use of the S-MTQ. In the VS-
MTQ, each item is a representative indicator of each component proposed in the 4/6C model,
and the single factor is specified as a global mental toughness construct that encompasses the
six components of the 4/6C model. Both 1-factor CFA and ESEM models with the six items
showed an excellent fit to the data, and factor loadings of the six items were adequate. The
internal consistency coefficients for the single factor were also acceptable. The VS-MTQ is
Study 2
The purpose of Study 2 was to cross-validate the factor structure of the newly-refined
S-MTQ and VS-MTQ with a larger independent sample. The establishment of measurement
invariance is required to make appropriate group comparisons (Chen, 2007; Cheung &
Rensvold, 2002). However, the MTQ48 has rarely been subject to such examination
(see Vaughan et al., 2018, for an exception). Therefore, measurement invariance was also tested
Method
Participants. A total of 3,209 university students (1,206 men, 2,003 women; Mage =
24.0, SD = 8.4, age range: 16-66 years) voluntarily participated in Study 2. Their
first language was English and most of them (94.0%) were Australians. Participants’ majors
were health (29.4%), science and engineering (21.3%), business (15.7%), law (12.1%),
creative industry (12.0%), and education (9.5%). There were no overlapping participants
Measures. Participants in Study 2 were also asked to complete the MTQ48 (Clough
et al., 2002) on a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly
agree) as well as the stress subscale of the DASS-21 (Lovibond & Lovibond, 1995) on a 4-
point Likert-type scale ranging from 0 (did not apply to me at all) to 3 (applied to me very
Procedure. The online survey was conducted the same way as Study 1. Participants
Data analyses. To examine the factor structure of the S-MTQ and VS-MTQ, CFA
and ESEM were conducted with the same procedures as Study 1. For completeness, other
Measurement invariance was tested across gender for the combined sample of Studies
1 and 2. Equality constraints were hierarchically imposed on the parameters across the gender
samples in the following sequence: configural invariance (no constraints), factor loadings,
intercepts, and uniqueness of observed variables. The invariance of two nested measurement
models was considered to be tenable when the overall pattern of goodness-of-fit indexes was
adequate and the changes in the values of the CFI and RMSEA were negligible (i.e., less than
or equal to .01 for CFI and .015 for RMSEA; Chen, 2007; Cheung & Rensvold, 2002).
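The change criteria described above can be expressed as a simple decision rule. The following Python sketch is only an illustration: the fit values are hypothetical, each model is compared with the preceding, less constrained model (an assumption about how the nested comparisons are sequenced), and the check covers only the change criteria, not the requirement that the overall pattern of fit indices be adequate.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    cfi: float
    rmsea: float


def change_is_negligible(less_constrained, more_constrained,
                         max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Apply the cutoffs of .01 (CFI) and .015 (RMSEA) to two nested models."""
    # Fit indices are reported to three decimals, so round before comparing
    # to avoid floating-point noise at the cutoff.
    cfi_drop = round(less_constrained.cfi - more_constrained.cfi, 3)
    rmsea_rise = round(more_constrained.rmsea - less_constrained.rmsea, 3)
    return cfi_drop <= max_cfi_drop and rmsea_rise <= max_rmsea_rise


# Hypothetical fit values for part of the nested sequence described above.
steps = [
    ("configural", FitIndices(cfi=0.950, rmsea=0.040)),
    ("factor loadings", FitIndices(cfi=0.947, rmsea=0.041)),
    ("intercepts", FitIndices(cfi=0.930, rmsea=0.050)),
]

# Compare each model with the one immediately before it in the sequence.
verdicts = {}
for (_, previous), (name, current) in zip(steps, steps[1:]):
    verdicts[name] = change_is_negligible(previous, current)
```

With these made-up values, the loading constraints would be judged tenable and the intercept constraints would not, because the CFI drops by more than .01 at the intercept step.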
CFA and ESEM. The results of CFA and ESEM on the S-MTQ responses were
similar to Study 1. The 4-factor ESEM model showed adequate overall fit to the data.
Consistent with Study 1, however, none of the three items for Confidence Abilities loaded on
their target factor (factor loadings varying from .02 to .17); instead, they loaded on a non-target factor
of Control (factor loadings varying from .40 to .54). The fit of the 6-factor ESEM model was
excellent based on all the overall fit indices (see Table 5). In the 6-factor ESEM model, latent
correlations between the six factors ranged from .14 to .39, and factor loadings for the target
factor ranged from .12 to .76. The internal consistency coefficients (α; ɷ [95% CI] in order)
for the six subscales of the S-MTQ were Control Emotion (.63; .63 [.61-.66]), Control Life
(.65; .66 [.64-.69]), Challenge (.71; .72 [.71-.74]), Commitment (.67; .67 [.65-.70]),
Study 1, both alpha and omega coefficients were lower than .70 for most factors. However,
the observed values of α and ɷ in Study 2 were comparable with those in Study 1 and
considered reasonable, as stated earlier. Both bifactor and hierarchical ESEM models fit the
data very well (see Table 5). In the bifactor model, all target loadings for the specific factor
were significant, ranging from .07 to .61, and target loadings for the global mental toughness
factor from all the 18 items were also significant, ranging from .25 to .68. These results
replicated the well-defined structure of the global mental toughness factor and the six specific
factors. In the hierarchical model, all target loadings for the first-order factor were significant,
ranging from .11 to 1.00, and the factor loadings of most of the first-order factors on the
global mental toughness factor were significant and substantial, from .67 to .90. However, the
loadings from Challenge and Confidence Abilities on the global mental toughness factor were
found non-significant. These results also replicated the finding that correlations among the 6 first-order
factors were not explained well by the higher-order mental toughness factor. Study 2 cross-validated
that the bifactor ESEM model represented the S-MTQ responses better than the hierarchical ESEM model.
As for the VS-MTQ, the fit of both 1-factor CFA and ESEM models to the data was
excellent (see Table 5). Factor loadings ranged from .47 (Confidence Interpersonal: Item 20)
to .69 (Confidence Abilities: Item 8) in the CFA and ESEM models. The internal consistency
coefficients (α; ɷ [95% CI] in order) for the single factor were .72; .73 (.71-.74). Consistent
with Study 1, none of 1-, 4-, and 6-factor CFA and ESEM models with 48 items fit the data
satisfactorily. Furthermore, the 1- and 4-factor CFA and ESEM models with the 18 items by
Clough et al. (2002) and the 10 items by Papageorgiou et al. (2018) fit the large data
The results of the invariance analyses are summarized in Table 6. For both S-MTQ
and VS-MTQ, measurement invariance across gender was achieved at the factor-loading and
uniqueness levels, but not at the intercept level. These results indicated that the strength of
relationships between items and the underlying factors is identical across gender, but the
origin of the latent variable may differ. Measurement invariance at the factor-loading level is
a prerequisite for meaningful cross-group comparison (Cheung & Rensvold, 2002). The
comparison of relationships between the mental toughness factor, measured by the S-MTQ
and the VS-MTQ, and other external variables is possible across gender.
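A deliberately simplified Python sketch may help to show why invariant loadings with non-invariant intercepts still permit comparisons of relationships across groups. It assumes a single item with no residual term, the same hypothetical latent scores and stress scores in both groups, and made-up loading and intercept values. None of these numbers come from the present study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical latent mental toughness scores and stress scores,
# identical in both groups so that only the measurement model differs.
latent = [-1.0, -0.5, 0.0, 0.5, 1.0]
stress = [2.4, 2.1, 1.5, 1.2, 0.6]

loading = 0.6                         # same loading in both groups
intercept_a, intercept_b = 3.0, 3.4   # intercepts differ between groups

# Observed item score = intercept + loading * latent score (no residual).
item_a = [intercept_a + loading * eta for eta in latent]
item_b = [intercept_b + loading * eta for eta in latent]

r_a = pearson_r(item_a, stress)
r_b = pearson_r(item_b, stress)
mean_gap = sum(item_b) / len(item_b) - sum(item_a) / len(item_a)
```

In this toy setting the correlation with stress is the same in both groups, whereas the observed item means differ by the gap between the intercepts even though the latent scores are identical. This is why observed mean differences would be harder to interpret without intercept invariance.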
(e.g., Kawabata et al., 2008), since parameter estimates are unique to the sample on which
they are based. The results of Study 2 cross-validated the factor structure of the S-MTQ and
General Discussion
In the present study, two refined versions of the MTQ48 were proposed to improve
the questionnaire by resolving its factorial validity issue from theoretical, empirical, and
practical perspectives: the S-MTQ and VS-MTQ. The results of the two studies strongly
supported the factorial and concurrent validity, as well as reliability, of the responses to both
the refined versions. The S-MTQ and VS-MTQ are psychometrically sound, but much shorter
Another advantage of the S-MTQ is that there are three items for each of the six factors
researchers are interested in measuring each aspect of the mental toughness construct,
proposed by Clough et al. (2002), or the global mental toughness factor, the S-MTQ would be
toughness, as one of many constructs in their study, and scoring it as a unidimensional single
score, the VS-MTQ would be a possible choice for that need. Such usage of short and very
short versions of an instrument is also seen for other psychological constructs, such as the big
five personality traits (Gosling, Rentfrow, & Swann, 2003) and flow (Jackson, Martin, &
Eklund, 2008).
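The two scoring options discussed above can be sketched in Python as follows. This is an illustration under stated assumptions: the responses are assumed to be ordered so that each consecutive block of three S-MTQ items belongs to one subscale, any reverse-keyed items are assumed to have been recoded beforehand, and the function names are hypothetical. The actual item-to-subscale key is not reproduced here.

```python
SUBSCALES = [
    "Challenge",
    "Commitment",
    "Control Emotion",
    "Control Life",
    "Confidence Abilities",
    "Confidence Interpersonal",
]


def score_s_mtq(responses):
    """Return six subscale totals from 18 item scores (assumed block order)."""
    if len(responses) != 18:
        raise ValueError("The S-MTQ has 18 items.")
    # Each subscale score is the total of its three item scores.
    return {
        name: sum(responses[3 * i:3 * i + 3])
        for i, name in enumerate(SUBSCALES)
    }


def score_vs_mtq(responses):
    """Return a single total from the six VS-MTQ item scores."""
    if len(responses) != 6:
        raise ValueError("The VS-MTQ has 6 items.")
    return sum(responses)


# Hypothetical responses on the 5-point scale.
s_mtq_scores = score_s_mtq([4, 3, 5, 2, 3, 3, 4, 4, 4, 1, 2, 2, 5, 5, 4, 3, 3, 2])
vs_mtq_score = score_vs_mtq([4, 3, 4, 2, 5, 3])
```

The S-MTQ function yields one total per component, whereas the VS-MTQ function yields the single unidimensional score described above.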
Both the refined versions were developed based on data from a large sample in Study
1 and their factorial validity and reliability were replicated with an even larger sample in
Study 2. The advantage of the S-MTQ and VS-MTQ over the MTQ18 (Clough et al., 2002)
and MTQ10 (Papageorgiou et al., 2018) is that a) the selected 18 items were considered to
have high or good content validity, b) they measure all six components proposed in the 6C
Measurement invariance has rarely been tested for the MTQ48 (see Vaughan et al.,
2018). Therefore, measurement invariance across gender was also tested for the S-MTQ and VS-MTQ
responses. The results of the invariance tests showed that it is possible to compare the
relationships between the mental toughness factor measured by the S-MTQ and VS-MTQ and
Correlations between the refined versions of the MTQ (S-MTQ and VS-MTQ) and
the DASS-21 Stress were re-computed by using scale and factor scores and compared with
latent correlations obtained from the ESEM models. Pearson’s
correlation coefficients based on scale and factor scores were found to be comparable with their
because latent correlations are corrected for measurement errors, whereas scale scores are
purely based on items which include a part of random measurement error (Morin et al.,
2016). The findings suggest that when data are collected from a large sample, Pearson’s
correlations based on scale scores can be used confidently to examine relationships between
the refined versions of the MTQ (S-MTQ and VS-MTQ) and other constructs. The absolute
values of Pearson’s correlations, based on scale scores, were found to be compatible with
latent correlations. The information observed in the comparisons is practically useful for
applied researchers to use and interpret Pearson’s correlation coefficients based on scale
Lastly, in evolving the validity of the MTQ48, we were cognizant of the potential
statistical and conceptual consequences of refining the instrument. Previous debate has raised
problems associated with scale purification and an overemphasis on seeking
statistical fit (Clough, Earle, Perry, & Crust, 2012). The VS-MTQ, in particular, has a
significant reduction in items from its predecessor, which, some researchers might argue,
lacks the conceptual essence and breadth of the original work. In this study, the
item face validity check was conducted to maintain the integrity of the 4/6C
model, as originally defined. Strictly matching each item to its main dimensional
definition (for Confidence, Commitment, etc.) meant that only 13 items were retained. The
rejection of 35 items in this procedure did not go unnoticed and, presumably, represented a
large proportion of the conceptual work originally included in the MTQ48’s design.
However, on the surface, 25 of the 35 rejected items seem to be measuring something else,
outside their allocated factor; for instance, they appear more relevant to other psychological
constructs – such as, self-esteem, optimism, focus, decision making, motivation, coping, and
extraversion (see Supporting Information) – some of which are represented in other mental
toughness frameworks (e.g., Gucciardi, Hanton, Gordon, Mallett, & Temby, 2015). The other
10 rejected items were either conceptually vague (e.g., Item 9), used ambiguous language
(e.g., Item 35), lacked specificity (e.g., Item 21), or were incorrectly categorized (e.g., Item 5).
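The item counts reported in this paragraph can be checked with simple arithmetic. The Python sketch below uses only the counts stated in the text. Treating the inventory-level CVI as the percentage of retained items is an assumption made for illustration, although it reproduces the 27.1% figure reported in this Discussion. The variable names are hypothetical.

```python
total_items = 48                  # items in the MTQ48
rejected_other_constructs = 25    # items that seem to measure something else
rejected_other_reasons = 10       # vague, ambiguous, unspecific, or miscategorized

rejected = rejected_other_constructs + rejected_other_reasons
retained = total_items - rejected

# Assumed definition: percentage of items retained after the face validity check.
cvi_percent = round(100 * retained / total_items, 1)
```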
It appears the MTQ48’s designers have included items that either measure their
dimension directly – and, hence, are considered a “good fit” in this study (with 100% CVI
score for items) – or, in the case of the 25 rejected items, relate to other constructs and actions
associated with high and low scores in that factor. For example, people who see setbacks as
opportunities (Challenge definition) might also be individuals that cope well with their
problems (e.g., Item 23). For others who persist through obstacles (Commitment definition),
keeping focused could be something that they do well, too (e.g., Item 22). Similarly, people
that believe in themselves (Confidence definition) might be equally optimistic (e.g., Item 16),
while those struggling to manage their emotions (Control – Emotion definition) could also,
conceivably, find themselves excessively worrying about the future (e.g., Item 27).
By removing 25 items denoting constructs associated with high and low scores for the
4/6Cs, our approach might be criticized for leaving out “the language of the participants” in
the questionnaire’s original design (see Clough et al., 2012, p. 284). Our decision, however,
with the face validity check, was to focus on items that match the core definitions for each
dimension. If it is later decided that these associated constructs and correlates are actually
central components of mental toughness, the 4/6C model may be better understood as one
currently masking a broader, underlying framework. For example, if optimism is later agreed
to be a key aspect of mental toughness, Items 13, 15, 16, and 32 quickly become much more
relevant, according to the face validity review in this study (see Supporting Information). As
it stands, these items sit across 2 dimensions of the 4/6C model – namely, the Control (Life;
Item 15) and Confidence (Abilities; Items 13, 16, 32) dimensions. If deemed to be important
to the conceptual building blocks of mental toughness, reconfiguring the MTQ48 around
these additional constructs might allow more of the original items to be included
in further statistical analysis. Hypothetically, it would raise the inventory’s CVI score above
the existing 27.1%, reported here. This step, of course, would require expansion to the
Limitations
The S-MTQ and VS-MTQ were proposed as alternative models to evolve the validity
of the MTQ. Their psychometric properties were rigorously examined and cross-validated
with two large data sets. However, both are university student cohorts. Considering that the
MTQ48 has been widely used in education, business, military, and sport, the validity and
reliability of the S-MTQ and VS-MTQ should be further evaluated by examining different
types of validity (e.g., predictive validity) with individuals from different domains (e.g.,
of education (e.g., high school) in future research. In the present study, the data were
collected by using the original MTQ48. However, test length is one of multiple factors that
affect true and observed variance of scores due to the possibility that the respondents are
more likely to get tired or disinterested in the questionnaire, carefully or honestly (Hayes &
Coutts, 2020; Raykov & Marcoulides, 2011). Thus, collecting data with
the S-MTQ or the VS-MTQ is recommended for further evaluations of their psychometric properties.
Conclusion
ongoing basis (Perry et al., 2013). In following this suggestion, the current study aimed to
improve the MTQ48 by resolving its factorial validity issue and provide researchers with
theoretically and practically useful instruments to confidently measure the mental toughness
construct. The unique and significant contribution of the study was to identify problematic
items that were associated with the issue and propose alternative models to improve the
validity of the MTQ. Based on the findings of the present study, the S-MTQ and VS-MTQ
References
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural
107, 238–246.
Birch, P. D. J., Crampton, S., Greenlees, I. A., Lowry, R. G., & Coffee, P. (2017). The
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A.
Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 445–455).
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing
Clough, P., Earle, K., Perry, J. L., & Crust, L. (2012). Comment on “Progressing
Questionnaire 48” by Gucciardi, Hanton, and Mallett (2012). Sport, Exercise, and
Clough, P., Earle, K., & Sewell, D. (2002). Mental toughness: the concept and its
London: Thomson.
Clough, P., Perry, J., Crust, L., Strycharczyk, D., & Rowlands, C. (2015). The MTQ48
Clough, P., & Strycharczyk, D. (2015). Developing mental toughness: Coaching strategies to
Coulter, T. J., Mallett, C. J., & Singer, J. A. (2018). A three-domain personality analysis of a
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,
16, 297-334.
Dagnall, N., Denovan, A., Papageorgiou, K. A., Clough, P. J., Parker, A., & Drinkwater, K.
(MTQ): Factor structure of the MTQ-18 and the MTQ-10. Frontiers in Psychology,
10, 1933.
Gerber, M., Kalak, N., Lemola, S., Clough, P. J., Perry, J. L., Pühse, U., . . . Brand, S. (2013).
Are adolescents with high mental toughness levels more resilient against stress?
Gucciardi, D. F., Hanton, S., Gordon, S., Mallett, C. J., & Temby, P. (2015). The concept of
Gucciardi, D. F., Hanton, S., & Mallett, C. J. (2012). Progressing measurement in mental
Hayes, A. F., & Coutts, J. J. (2020). Using omega rather than Cronbach’s alpha for estimating
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to
Jackson, S. A., Martin, A. J., & Eklund, R. C. (2008). Long and short measures of flow: The
construct validity of the FSS-2, DFS-2, and new brief counterparts. Journal of Sport
Jones, G., Hanton, S., & Connaughton, D. (2007). A framework of mental toughness in the
Kawabata, M., Mallett, C. J., & Jackson, S. A. (2008). The Flow State Scale-2 and
Dispositional Flow Scale-2: Examination of their factorial validity and reliability for
Kobasa, S. C. (1979). Stressful life events, personality and health: An enquiry into hardiness.
Lin, Y., Mutz, J., Clough, P. J., & Papageorgiou, K. A. (2017). Mental toughness and
Lovibond, S. H., & Lovibond, P. F. (1995). Manual for the Depression Anxiety & Stress
35, 382–385.
Mallett, C. J., Kawabata, M., & Newcombe, P. (2007). Progressing measurement in sport
Marsh, H. W., Lüdtke, O., Muthén, B., Asparouhov, T., Morin, A. J. S., Trautwein, U., &
Nagengast, B. (2010). A new look at the big-five factor structure through exploratory
McNeish, D., & Wolf, M. G. (2020). Thinking twice about sum scores. Behavior Research
Morin, A. J. S., Arens, A. K., & Marsh, H. W. (2016). A bifactor exploratory structural
139.
Muthén, L. K., & Muthén, B. (1998–2019). Mplus user’s guide (8th ed.). Los Angeles, CA:
Papageorgiou, K. A., Wong, B., & Clough, P. J. (2017). Beyond good and evil: Exploring the
mediating role of mental toughness on the dark triad of personality traits. Personality
Papageorgiou, K. A., Malanchini, M., Denovan, A., Clough, P. J., Shakeshaft, N., Schofield,
toughness and school achievement. Personality and Individual Differences, 131, 105-
110.
Perry, J. L., Clough, P. J., Crust, L., Earle, K., & Nicholls, A. R. (2013). Factorial validity of the
592.
Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. New York:
Routledge.
Tóth-Király, I., Morin, A. J. S., Bőthe, B., Orosz, G., & Rigó, A. (2018). Investigating the
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor
Vaughan, R., Hanna, D., & Breslin, G. (2018). Psychometric properties of the Mental
Wang, W. C., Chen, H. F., & Jin, K. Y. (2014). Item response theory models for wording
157-178.
Table 1
Item set                               Model           χ²          df     CFI    TLI    SRMR   RMSEA   RMSEA CI
48 items                               1-factor CFA    12665.08    1080   .624   .607   .072   .070    .069 – .071
                                       4-factor CFA    11318.42    1074   .668   .651   .069   .066    .065 – .067
                                       6-factor CFA    9968.11     1065   .711   .694   .070   .062    .061 – .063
18 items (Clough et al., 2002)         1-factor CFA    3028.397    135    .695   .655   .076   .099    .096 – .102
  (Dagnall et al., 2019)               4-factor CFA    – [a]       –      –      –      –      –       –
10 items (Papageorgiou et al., 2018)   1-factor CFA    819.502     35     .843   .798   .057   .101    .095 – .107
18 items (S-MTQ)                       4-factor CFA    1611.15     129    .838   .808   .056   .072    .069 – .076
                                       6-factor CFA    1045.86     120    .899   .871   .048   .059    .056 – .063
6 items (VS-MTQ)                       1-factor CFA    49.05       9      .976   .959   .022   .045    .033 – .058
48 items                               4-factor ESEM   6871.01     942    .832   .799   .038   .054    .052 – .055
                                       6-factor ESEM   4294.55     855    .903   .871   .026   .043    .042 – .044
18 items (Clough et al., 2002)         1-factor ESEM   3474.62     135    .692   .651   .076   .106    .103 – .109
                                       4-factor ESEM   867.97      87     .928   .873   .031   .064    .066 – .068
10 items (Papageorgiou et al., 2018)   1-factor ESEM   976.40      35     .839   .792   .057   .111    .111 – .117
                                       4-factor ESEM   35.120 [b]  11     .996   .983   .010   .032    .020 – .044
18 items (S-MTQ)                       4-factor ESEM   661.13      87     .947   .906   .025   .055    .051 – .059
6 items (VS-MTQ)                       1-factor ESEM   58.54       9      .975   .958   .022   .050    .038 – .063
Note. CFA = confirmatory factor analysis; ESEM = exploratory structural equation modeling; CFI = robust comparative fit index; TLI = Tucker-Lewis
index; SRMR = standardized root mean square residual; RMSEA = robust root mean square error of approximation; S-MTQ = the short
version of the Mental Toughness Questionnaire; VS-MTQ = the very short version of the Mental Toughness Questionnaire. [a] Solutions were
improper; [b] half of the 10 items did not load on their target factors; [c] ESEM within CFA was estimated with maximum likelihood estimation.
Table 2
Note. ESEM = exploratory structural equation modeling; S-MTQ = the short version of the
Mental Toughness Questionnaire; F = factor; R = residuals. Item numbers are based on the
MTQ48 (Clough et al., 2002). ESEM was estimated with an oblique geomin rotation. Target
factor loadings are presented in bold and all targeted factor loadings were significant at p
< .001.
Table 3
Latent Factor Correlations Between the S-MTQ and the DASS-21 Stress (N = 2,186)
Subscale CM CH CL CE CA CI ST Subscale ST
Control Life (CL) .46 .54 — -.46 Control Life (CL) -.24
Control Emotion (CE) .33 .47 .58 — -.65 Control Emotion (CE) -.50
Confidence Abilities (CA) .29 .05 .31 .29 — -.48 Confidence Abilities (CA) -.24
Confidence Interpersonal (CI) .34 .42 .35 .29 .12 — -.17 Confidence Interpersonal (CI) .06
Note. S-MTQ = the short version of the Mental Toughness Questionnaire; DASS-21 = the Depression Anxiety Stress Scale-21; ST = Stress;
CFA = confirmatory factor analysis; ESEM = exploratory structural equation modelling. In the model, the S-MTQ factors were specified as
ESEM factors with target rotation and the Stress factor was specified as a CFA factor. All latent correlations larger than |.07| were significant at
p < .01.
Table 4
Pearson’s Correlations Between the S-MTQ and the DASS-21 Stress Based on Scale and
Specific factor
Control Life (CL) .50 .47 – .53 .62 .23 -.43 -.33
Control Emotion (CE) .40 .45 .47 – .68 .22 -.48 -.44
Confidence Abilities (CA) .44 .42 .61 .54 – .10 -.43 -.45
Confidence Interpersonal (CI) .29 .35 .29 .25 .27 – -.10 -.08
Global factor
Note. S-MTQ = the short version of the Mental Toughness Questionnaire; DASS-21 = the
Depression Anxiety Stress Scale-21; STs = Stress (scale score); STf = Stress (factor score);
Pearson’s correlations of the S-MTQ subscale scores are below diagonals while correlations
of the S-MTQ factor scores are above diagonal. All correlations larger than |.02| were
Table 5
Summary of Goodness-of-Fit Statistics for Specified Models (N = 3,209)
Item set                               Model           χ²          df     CFI    TLI    SRMR   RMSEA   RMSEA CI
48 items                               1-factor CFA    18327.28    1080   .609   .592   .073   .071    .070 – .071
                                       4-factor CFA    16244.10    1074   .656   .639   .071   .066    .065 – .067
                                       6-factor CFA    14236.511   1065   .701   .684   .071   .062    .061 – .063
18 items (Clough et al., 2002)         1-factor CFA    4413.795    135    .677   .634   .077   .099    .097 – .102
  (Dagnall et al., 2019)               4-factor CFA    – [a]       –      –      –      –      –       –
10 items (Papageorgiou et al., 2018)   1-factor CFA    – [a]       –      –      –      –      –       –
18 items (S-MTQ)                       4-factor CFA    2421.36     129    .823   .790   .058   .074    .072 – .077
                                       6-factor CFA    1650.63     120    .882   .849   .050   .063    .060 – .066
6 items (VS-MTQ)                       1-factor CFA    76.57       9      .973   .955   .023   .048    .039 – .059
48 items                               4-factor ESEM   9363.60     942    .835   .835   .037   .053    .052 – .054
                                       6-factor ESEM   5512.87     855    .908   .879   .025   .041    .040 – .042
18 items (Clough et al., 2002)         1-factor ESEM   5076.21     135    .673   .629   .077   .107    .104 – .109
                                       4-factor ESEM   1247.92     87     .923   .865   .031   .064    .061 – .068
10 items (Papageorgiou et al., 2018)   1-factor ESEM   1438.06     35     .824   .774   .059   .112    .107 – .117
                                       4-factor ESEM   – [b]       –      –      –      –      –       –
18 items (S-MTQ)                       4-factor ESEM   1008.43     87     .941   .895   .026   .057    .054 – .061
6 items (VS-MTQ)                       1-factor ESEM   58.54       9      .975   .958   .022   .050    .038 – .063
Note. CFA = confirmatory factor analysis; ESEM = exploratory structural equation modeling; CFI = robust comparative fit index; TLI = Tucker-Lewis
index; SRMR = standardized root mean square residual; RMSEA = robust root mean square error of approximation; S-MTQ = the short
version of the Mental Toughness Questionnaire; VS-MTQ = the very short version of the Mental Toughness Questionnaire. [a] Solutions were
improper; [b] solutions did not converge; [c] ESEM within CFA was estimated with maximum likelihood estimation.
Table 6
Note. MTQ = Mental Toughness Questionnaire; ESEM = exploratory structural equation modeling; CFA = confirmatory factor analysis; CFI =
comparative fit index; TLI = Tucker-Lewis index; SRMR = standardized root mean square residual; RMSEA = robust root mean square error of
Supporting Information
Challenge: The extent to which a person is likely to view a challenge or setback as an opportunity.
Commitment: The extent to which an individual is likely to persist with a goal, despite any problems or obstacles that arise.
Control Emotion: The extent to which people control their anxieties and emotions.
Control Life: The extent to which people believe they have sufficient control over their lives and the environment around them.
Item 2. I generally feel… 3, 3, 3 Lacks specificity (e.g., could relate to life and/or emotions factor)
Item 9. I usually find… 1, 1, 1 Conceptually vague - unclear how item links to definition
Item 33. Things just… 1, 1, 1 Conceptually vague - unclear how item links to definition
Confidence Abilities: The degree of confidence people have in their abilities to successfully complete tasks.
Confidence Interpersonal: The extent to which people are prepared to assert themselves and deal with social challenge or ridicule.
Note. MTQ48 = the Mental Toughness Questionnaire-48; CVI = Content Validity Index.