
Heuristics versus statistics in discriminant validity testing: a comparison of four procedures

George Franke
Department of Marketing, Culverhouse College of Business, University of Alabama, Tuscaloosa, Alabama, USA, and

Marko Sarstedt
Otto von Guericke Universität Magdeburg, Magdeburg, Germany and School of Business and Global Asia in the 21st Century Research Platform, Monash University Malaysia, Subang Jaya, Malaysia

Received 23 December 2017
Revised 22 June 2018
Accepted 24 June 2018

Abstract
Purpose – The purpose of this paper is to review and extend recent simulation studies on discriminant
validity measures, contrasting the use of cutoff values (i.e. heuristics) with inferential tests.
Design/methodology/approach – Based on a simulation study, which considers different construct
correlations, sample sizes, numbers of indicators and loading patterns, the authors assess each criterion’s
sensitivity to type I and type II errors.
Findings – The findings of the simulation study provide further evidence for the robustness of the heterotrait–
monotrait (HTMT) ratio of correlations criterion as an estimator of disattenuated (perfectly reliable) correlations
between constructs, whose performance parallels that of the standard constrained PHI approach. Furthermore,
the authors identify situations in which both methods fail and suggest an alternative criterion.
Originality/value – Addressing the limitations of prior simulation studies, the authors use both directional
comparisons (i.e. heuristics) and inferential tests to facilitate the comparison of the HTMT and PHI methods.
Furthermore, the simulation considers criteria that have not been assessed in prior research.
Keywords PLS–SEM, Measurement, Structural equation modelling
Paper type Research paper

1. Introduction
In many areas of management information systems research and in business research in
general, the concepts of interest are not directly observable and are therefore generally
measured with scales comprising sets of items. These items indirectly measure the concept
of interest, assuming that some underlying construct manifests itself through these items.
Some of these constructs include individual perceptions and attitudes such as perceived
ease of use and usefulness of information technology (e.g. Venkatesh et al., 2003), user
engagement with social networking sites (e.g. Muñoz-Expósito et al., 2017) and user
attachment in brand communities (e.g. Fiedler and Sarstedt, 2014). Other constructs capture
organizational traits such as organizational factors impacting e-business implementation
(e.g. Migdadi et al., 2016), competitive advantage from digital transformation (e.g. Chen et al.,
2016) and organizational online knowledge sharing (e.g. Ho et al., 2012). Although diverse, a
common theme among these and other constructs developed in management information
systems research is their purpose: the measurement of the value added through investment
in, or use of, information technologies. Therefore, it is not surprising that issues surrounding
the assessment of measurement properties remain a major concern of researchers in
the field (e.g. Bagozzi, 2011; MacKenzie et al., 2011; Rigdon et al., 2014). Such properties
reveal the accuracy (validity) and consistency (reliability) of construct measurement
(e.g. Mooi et al., 2018).
One of the most fundamental elements of validity assessment concerns establishing
discriminant validity, which ensures that each construct is empirically unique and captures
a phenomenon not represented by other constructs in a statistical model. When analyzing
relationships among constructs that lack discriminant validity, “researchers cannot be
certain whether results confirming hypothesized structural paths are real or whether they
are a result of statistical discrepancies” (Farrell, 2010, p. 324).
A variety of procedures have been developed for testing whether two constructs’
measures correlate perfectly once measurement error is taken into account; if so, they lack
discriminant validity (e.g. Forsyth and Feldt, 1969; Kristof, 1973; Lord, 1957, 1973;
McNemar, 1958; Rogers, 1976). Some of these procedures come in the form of a heuristic,
where researchers compare a metric with a threshold; others are statistical tests, which
allow statistically inferring whether discriminant validity is supported or not. Anderson and
Gerbing (1988) helped to popularize Jöreskog’s (1971) approach (labeled PHI in this paper),
in which the correlation ϕXY between latent variables X and Y is either estimated freely or
constrained to equal unity. A significant difference in model fit suggests that the measures
are not perfectly correlated. Fornell and Larcker (1981) proposed another widely used
approach (labeled FL in this paper), according to which discriminant validity is established
if for each of two constructs, the squared multiple correlation between items and constructs
(i.e. the average variance extracted (AVE)), is greater than the squared correlation between
constructs (i.e. the shared variance (SV )). This criterion implies that items share more
variance with their intended underlying construct than the construct shares with another
construct. The FL criterion has widely been viewed as “the most stringent” test of
discriminant validity (e.g. Wang and Netemeyer, 2002, p. 222), but recent research questions
its universal applicability. Whereas Henseler et al. (2015, p. 129) find that FL is “largely
unable to detect a lack of discriminant validity” in the context of composite-based partial
least squares structural equation modeling (PLS–SEM) (Wold, 1974), Voorhees et al. (2016)
report a more positive performance in factor-based modeling ( Jöreskog, 1971). Also, the FL
criterion as proposed by Fornell and Larcker (1981) and implemented in practice is a
heuristic rather than a statistical test of the effects of sampling error relative to a null
hypothesis. An AVE slightly above or below the SV is seen as supporting or disconfirming
discriminant validity, regardless of how small the difference is. For example, Farrell (2010)
interprets SV = 0.62 as evidence of discriminant validity relative to AVE = 0.63, but
SV = 0.67 relative to AVE = 0.65 as a failure of discriminant validity. The conventional FL
criterion could serve as the basis of a statistical discriminant validity test (e.g. by using
bootstrapping), but prior research has overlooked this possibility. Furthermore, as we will
show in this paper, the FL criterion’s use of AVE focuses on item loadings but ignores
overall composite reliability, which is influenced by the number of indicators as well as
their AVE.
Recently, Henseler et al. (2015) introduced the heterotrait–monotrait (HTMT) ratio of
correlations as an estimator of disattenuated (perfectly reliable) construct correlations, such
as ϕXY, and therefore a convenient alternative test of discriminant validity. The HTMT
contrasts the indicator correlations between constructs with the correlations within sets of
indicators of the same constructs. While Henseler et al. (2015) proposed the HTMT as a more
comprehensive and less constrained approach to discriminant validity assessment for
researchers using PLS–SEM, the approach is universally applicable to all latent variable
methods, as its computation does not rely on the actual model estimates. Simulation results
in Henseler et al. (2015) in the context of PLS–SEM and Voorhees et al. (2016) in the context
of confirmatory factor analyses indicate that the HTMT performs effectively, though with
some ambiguity about which cut-off to use in inferring or rejecting discriminant validity.
While these studies make valuable contributions to the literature on discriminant
validity assessment, they do not offer a full picture of HTMT’s performance, both absolutely
and relatively compared to PHI. Specifically, Henseler et al. (2015) compare their HTMT
with several other discriminant validity tests using fixed cutoffs and an inference test.
However, these authors only use inferential tests to assess whether HTMT differs
significantly from 1, rather than statistically (as opposed to heuristically) testing HTMT vs
smaller cutoff values. Furthermore, Henseler et al. (2015) contrast HTMT with the FL
criterion but not with the commonly used PHI. Voorhees et al. (2016) on the other hand
compare inferential tests of PHI with a range of HTMT fixed cutoffs, but do not apply an
HTMT-based inference test. Furthermore, these authors consider a maximum population
correlation of only 0.90, rather than assuming a unit correlation between the construct
measures. While 0.90 can be interpreted as an upper boundary of acceptable construct
correlations (Grewal et al., 2004), this condition does not represent a definite violation of
discriminant validity (Henseler et al., 2015). Assuming a unit correlation is crucial as this
represents a base condition under which all the approaches should indicate a lack of
discriminant validity. This particularly holds since a unit correlation is the null hypothesis
underlying PHI and Henseler et al.’s (2015) HTMT-based inferential test. Finally, Voorhees
et al.’s (2016) simulation results show that PHI almost never creates false positives
(i.e. producing a confidence interval that includes 1, given true correlations of 0.75 or 0.90),
which the authors interpret as a failure to identify discriminant validity rather than
evidence of high statistical power to correctly infer discriminant validity.
Addressing these limitations, we present the results of a simulation study that considers a
series of factors relevant to the performance of the discriminant validity assessment criteria
(construct correlations, loading patterns, numbers of indicators and sample size). To put both
the PHI and HTMT procedures on an equal footing, our simulation study uses both directional
comparisons (i.e. heuristics) and inferential tests, and considers different cutoff values in the
inferential tests rather than focusing only on the case of perfectly correlated constructs.
In addition, we contrast the former approaches with an inference test of the FL criterion that
complements the criterion’s use as a heuristic. Finally, we offer an alternative interpretation of
the FL criterion in terms of item loadings and correlations, showing that the criterion only
partly considers measurement error in the items. Based on our elaborations, we introduce an
alternative criterion originally proposed by McDonald (1999), which extends the FL and
HTMT criteria by explicitly considering differences in the constructs’ reliabilities.

2. Assessing discriminant validity


In its purest sense, establishing discriminant validity requires ensuring that the correlation
ϕXY between two latent variables X and Y is significantly lower than unity, such that
1 > ϕXY.
Following this concept, the PHI method ( Jöreskog, 1971) involves constraining ϕXY to
equal 1 and comparing the resulting χ2 statistic to that obtained with ϕXY free, or else by
examining the confidence interval around the observed ϕXY to see whether it includes
1 (Anderson and Gerbing, 1988). This test conventionally uses two-tailed p-values, but
one-tailed tests are more appropriate (Henseler et al., 2015).
In their widely cited article on tests to evaluate structural equation models, Fornell and
Larcker (1981) suggest another means to assess discriminant validity. According to their FL
criterion, discriminant validity is established if a latent variable accounts for more variance
in its associated items than it shares with other constructs in the same model. For any two
constructs, the criterion has to be successfully applied to both, not just one or the other or to
their average (Farrell, 2010; Fornell and Larcker, 1981). Specifically, for discriminant
validity to be established among two latent variables X and Y, min(AVEX, AVEY) must be
larger than SV (i.e. ϕXY²). With standardized latent variables and loadings, AVE is
calculated as (ignoring subscripts for X and Y):
$$\mathrm{AVE} = \sum_i \ell_i^2 \Big/ \Big( \sum_i \ell_i^2 + \sum_i d_i \Big), \qquad (1)$$
where ℓi is the loading of item i on the latent variable and di is the item's unique variance.
The standardization causes the squared loading plus the residual in the denominator to
equal 1 for each item i. Thus, the summation in the denominator equals the number of items;
hence AVE's label and its meaning as the average of the terms in the numerator.
To understand what the FL criterion actually represents, we can interpret the AVE in
terms of item correlations (r). Specifically, latent variable modeling techniques estimate the
loadings such that the product of standardized ℓi and ℓj equals rij as closely as possible for
any two items i and j. With parallel measures (i.e. equal loadings and error terms),
ℓiℓj = ℓ² = rij = r for all i and j. With congeneric measures (i.e. unequal loadings and error
terms) ℓiℓj may only approximate rij, though if the model fits well the approximation will
normally be good. Also, as measures generally have loadings of around 0.70 or higher and
error terms of 0.50 or less (so that AVE = 0.50 or higher; e.g. Fornell and Larcker, 1981;
Netemeyer et al., 2003), mean(ℓ²) will often be a good approximation of mean(r). Thus, while
the FL criterion explicitly takes measurement error into account in its consideration of SV,
which disattenuates the estimated correlation between two constructs based on their
measure reliabilities, this is not the case in its treatment of AVE for each construct.
To clarify a distinction between the FL criterion and criteria that explicitly take construct
(or composite) reliability (CR) into account, another useful approximation involves the
relationship between CR and coefficient α. With standardized loadings, CR is given by:
$$\mathrm{CR} = \Big( \sum_i \ell_i \Big)^2 \Big/ \Big( \Big( \sum_i \ell_i \Big)^2 + \sum_i d_i \Big), \qquad (2)$$

and standardized coefficient α equals:


     
$$\alpha = k \, \mathrm{mean}(r_{ij}) \big/ \big( 1 + (k-1)\, \mathrm{mean}(r_{ij}) \big), \qquad (3)$$

where k is the number of items (Nunnally, 1978, p. 211). Coefficient α may be larger or
smaller than composite reliability in a particular sample of items and observations if error
terms are allowed to correlate across constructs, but very often the two are close in value
when the loadings are similar (essentially τ-equivalent; Raykov, 1998).
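
To make Equations (1)–(3) concrete, the following Python sketch (our illustration, not part of the original article) computes AVE, CR and standardized coefficient α from a vector of standardized loadings; the 0.71 loadings are arbitrary but correspond to values used later in the simulation design.

```python
# Illustrative sketch (not from the original article): AVE, composite reliability (CR)
# and standardized coefficient alpha for standardized loadings, per Equations (1)-(3).
import numpy as np

def ave(loadings):
    l = np.asarray(loadings, dtype=float)
    d = 1.0 - l ** 2                                     # unique variances of standardized items
    return np.sum(l ** 2) / (np.sum(l ** 2) + np.sum(d))  # Equation (1)

def composite_reliability(loadings):
    l = np.asarray(loadings, dtype=float)
    d = 1.0 - l ** 2
    return np.sum(l) ** 2 / (np.sum(l) ** 2 + np.sum(d))  # Equation (2)

def coefficient_alpha(mean_r, k):
    return k * mean_r / (1.0 + (k - 1) * mean_r)           # Equation (3)

loadings = [0.71, 0.71, 0.71]   # parallel measures: implied item correlation r = 0.71^2 ≈ 0.50
print(round(ave(loadings), 3))                     # ≈ 0.504
print(round(composite_reliability(loadings), 3))   # ≈ 0.753
print(round(coefficient_alpha(0.71 ** 2, 3), 3))   # ≈ 0.753, equal to CR for parallel items
```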
A discriminant validity criterion from McDonald (1999, labeled MCD) that incorporates
CR for two constructs simultaneously is conceptually appealing but not widely used.
The correlation between a set of items and their underlying construct equals the square root
of their CR. Given CRX, CRY and ϕXY, the correlation between one set of items and the other
set’s underlying construct equals:

$$\mathrm{CR}_X^{0.5}\, \phi_{XY} \; (X \text{ with } Y) \quad \text{and} \quad \mathrm{CR}_Y^{0.5}\, \phi_{XY} \; (Y \text{ with } X). \qquad (4)$$
Logically, items should be better measures of their intended construct than any other
construct. McDonald (1999, p. 212) suggests arranging CR^0.5 for any number of constructs
on the diagonal of a “convergent–discriminant validity matrix,” with correlations across
constructs shown off the diagonal as calculated using (4). McDonald (1999) does not propose
specific criteria for using such a matrix to evaluate measures, but testing whether CRY^0.5 is
greater than the first term in (4), and CRX^0.5 is greater than the second term, would be
analogous to the criteria in FL and PHI. The comparisons could be based on directional
differences (heuristic) or significance tests (inference). Because this criterion uses CR, the
number of indicators and not just their AVE can influence inferences about discriminant
validity. If CRX = CRY then MCD is equivalent to PHI, but when measure reliabilities differ
the two criteria can lead to different conclusions.
Henseler et al.’s (2015) HTMT criterion is defined as the mean value of the item
correlations across constructs (i.e. the heterotrait correlations) relative to the geometric mean
of the average correlations for the items measuring each construct (i.e. the two sets of
monotrait correlations). That is, if RXX is a matrix of correlations between each item xi and
xj of construct X (not including the diagonal of the xi−xj correlation matrix), RYY includes
the corresponding correlations of construct Y, and RXY is the matrix of correlations
between every xi and yj, then because the geometric mean of two numbers is the square root
of their product:

$$\mathrm{HTMT} = \mathrm{mean}(R_{XY}) \big/ \big( \mathrm{mean}[R_{XX}] \cdot \mathrm{mean}[R_{YY}] \big)^{0.5}. \qquad (5)$$


A high ratio of correlations across constructs (heterotrait) relative to those within constructs
(monotrait) indicates a lack of discriminant validity. Because Henseler et al. (2015) show that
HTMT is an estimator of ϕXY, if a reflective common factor model holds true for both
constructs then both HTMT and PHI should give similar results. Based on prior research
and simulation study results, Henseler et al. (2015) suggest a threshold value of 0.90 if
constructs are conceptually very similar and 0.85 if the constructs are conceptually more
distinct, rather than strictly comparing HTMT with unity as in conventional PHI
applications. Voorhees et al. (2016) echo this recommendation in their study on discriminant
validity assessment in confirmatory factor analyses. Because the calculation of HTMT does
not provide an estimate of its standard error, Henseler et al. (2015) suggest using
bootstrapping to formally test whether the HTMT is significantly lower than 1.
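
As a minimal sketch (ours, not the original authors' code), Equation (5) can be computed directly from the observed item correlation matrix; the correlation values used below are hypothetical.

```python
# Illustrative sketch (not the original authors' code): HTMT per Equation (5),
# computed from the observed item correlation matrix of two constructs.
import numpy as np

def htmt(R, kx):
    """R: correlation matrix of all items, X items first; kx: number of X items."""
    R = np.asarray(R, dtype=float)
    Rxx, Ryy, Rxy = R[:kx, :kx], R[kx:, kx:], R[:kx, kx:]
    mono_x = Rxx[np.triu_indices_from(Rxx, k=1)].mean()   # monotrait correlations of X
    mono_y = Ryy[np.triu_indices_from(Ryy, k=1)].mean()   # monotrait correlations of Y
    return Rxy.mean() / np.sqrt(mono_x * mono_y)          # heterotrait over geometric mean

# Hypothetical example: two constructs with three items each, r = 0.50 within
# constructs and 0.35 between constructs (so the disattenuated correlation is about 0.70).
R = np.full((6, 6), 0.35)
R[:3, :3] = 0.50
R[3:, 3:] = 0.50
np.fill_diagonal(R, 1.0)
print(round(htmt(R, kx=3), 2))   # 0.70
```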
The above concepts allow a convenient summary and comparison of the four criteria for
discriminant validity. HTMT does not assume a reflective common factor model, and the
formulas shown below for MCD and FL are less accurate as item loadings become
more heterogeneous, but in many empirical applications these relationships are good
approximations. For simplicity, the criteria are shown only for one construct X for MCD and
FL, but the corresponding criteria for the matching construct Y should be obvious:
$$\text{PHI:} \quad 1 > \phi_{XY}; \qquad (6)$$

$$\text{HTMT:} \quad 1 / 0.90 / 0.85 > \mathrm{HTMT}; \qquad (7)$$

$$\text{MCD:} \quad \mathrm{CR}_X^{0.5} > \mathrm{CR}_Y^{0.5}\, \phi_{XY}; \qquad (8a)$$

$$\text{MCD:} \quad \mathrm{CR}_X^{0.5} / \mathrm{CR}_Y^{0.5} > \phi_{XY}; \qquad (8b)$$

$$\text{MCD:} \quad \big[ k_X\, \mathrm{mean}(r_X) / \big( 1 + (k_X - 1)\, \mathrm{mean}(r_X) \big) \big]^{0.5} \Big/ \big[ k_Y\, \mathrm{mean}(r_Y) / \big( 1 + (k_Y - 1)\, \mathrm{mean}(r_Y) \big) \big]^{0.5} > \phi_{XY}; \qquad (8c)$$

$$\text{FL:} \quad \mathrm{mean}(r_X)^{0.5} > \phi_{XY}; \qquad (9a)$$

$$\text{FL:} \quad \big[ \mathrm{CR}_X / \big( k_X - (k_X - 1)\, \mathrm{CR}_X \big) \big]^{0.5} > \phi_{XY}. \qquad (9b)$$
PHI and HTMT1 have the highest threshold for inferring a lack of discriminant validity and
therefore should produce the fewest errors when ϕXY actually is less than 1. HTMT0.90 and
HTMT0.85 may signal that two constructs lack discriminant validity when ϕXY is very high,
say 0.95, but actually less than 1. Thus these lower thresholds are more conservative,
and perhaps more useful in some contexts (Henseler et al., 2015; Voorhees et al., 2016).
Equation (8a) shows that the X items should correlate more highly with their intended
construct than with the other construct Y. Equation (8b) divides both sides of (8a) by CRY^0.5
to show that MCD is influenced by the relative reliabilities of the two constructs. As
mentioned earlier, if the two reliabilities are equal so that their ratio equals 1, then MCD is
equivalent to PHI. If the reliabilities are not equal, then the one with the lower CR is more
likely to show a lack of discriminant validity. Equation (8c) uses (3) to express (8b) in terms
of the number of indicators of each construct and their average correlations.
Equation (9a) was developed earlier in several steps. The criterion that AVEX > SV is
equivalent to AVEX^0.5 > ϕXY, and AVEX equals mean(ℓ²), which approximately equals mean
(r). Equation (3) can be solved for mean(r) in terms of reliability and the number of items k, and
substituting the result into (9a) gives (9b). Therefore, (9b) illustrates how the FL criterion is
affected by not considering the number of scale items and their effects on reliability.
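
To make the contrast between Equations (8) and (9) concrete, the following sketch (our illustration; the loading and correlation values are typical of the simulation design, but the code and the unequal item numbers are not part of the original study) applies the directional FL and MCD checks to two constructs with equal loadings of 0.71 but different numbers of items, so that their AVEs are equal while their CRs differ.

```python
# Illustrative sketch (ours, not from the article): directional FL and MCD checks,
# Equations (9a) and (8a), for two constructs with equal AVE but unequal CR.
import numpy as np

def composite_reliability(loading, k):
    # CR for k standardized items sharing a common loading (Equation 2)
    return (k * loading) ** 2 / ((k * loading) ** 2 + k * (1 - loading ** 2))

loading, phi = 0.71, 0.85        # design-typical loading; constructs correlate at 0.85
kx, ky = 5, 3                    # X has more items than Y, hence a higher CR
ave = loading ** 2               # equal loadings, so both constructs have AVE ≈ 0.50
crx = composite_reliability(loading, kx)   # ≈ 0.84
cry = composite_reliability(loading, ky)   # ≈ 0.75

# FL (Equation 9a): AVE^0.5 > phi for each construct; the number of items plays no role.
print(np.sqrt(ave) > phi)                    # False: FL signals a lack of discriminant validity
# MCD (Equation 8a): CR_X^0.5 > CR_Y^0.5 * phi, and vice versa; CR (and hence k) matters.
print(np.sqrt(crx) > np.sqrt(cry) * phi)     # True
print(np.sqrt(cry) > np.sqrt(crx) * phi)     # True: MCD supports discriminant validity
```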
Equations (6)–(9) suggest several factors to consider in designing a simulation to
compare the four methods for discriminant validity assessment. ϕXY is obviously important,
because it is the standard of comparison in every test (given that HTMT is an estimator of
ϕXY). Average item correlations r and the number of items k are also relevant because they
contribute to CR, but within a range of values typical of research using latent variables, their
effects may not be large. When two constructs have different reliabilities, MCD and FL may
detect problems with the less reliable measure that PHI and HTMT do not reveal, because
they assess the overall relationship rather than testing each construct separately.
Sample sizes n do not play a role in Equations (6)–(9), but larger samples should reduce
standard errors and therefore increase power to detect the failures of discriminant validity
in inferential criteria. However, differences in sample sizes should not influence heuristic
criteria, because they depend only on directional differences and not standard errors.

3. Simulation design
Our choice of factors and their levels is motivated by the objective to define models that
resemble setups commonly encountered in applied research (Paxton et al., 2001), consider
the different criteria’s modes of operation and compare well with those considered in prior
methodological research on discriminant validity (e.g. Henseler et al., 2015; Voorhees et al.,
2016). The simulation design considers the factors just described that potentially have an
impact on the discriminant validity criteria’s performance: construct correlations, item
correlations (or equivalently, item loadings) and their homogeneity across constructs, the
numbers of items and sample sizes. Meeting these requirements, our simulation draws on a
2 (k = 3 or 5 indicators) × 2 (n = 100 or 500 observations) × 2 (equal loadings ℓ = 0.71 for both
constructs or 0.71 for one (r = 0.5) and 0.85 (r = 0.72) for the other) × 4 (population
ϕXY = 0.70, 0.85, 0.90, or 1.0) design. For each combination of conditions, we generated 1,000
replications, resulting in 32,000 estimates of FL, HTMT, MCD, and PHI and their
corresponding standard errors for further analysis. We used Mplus (Muthén and Muthén,
2015) for all simulations. The Appendix presents example Mplus code for all procedures.
Mplus has very convenient features for Monte Carlo simulation studies. Generating data
for the simulations is straightforward. It involves specifying values for k, n, ℓ, ϕXY and the
number of replications (1,000 in every condition). Because separate model specifications
are needed for HTMT vs the other approaches, each combination of factor levels must
be analyzed twice. Common seed values for random number generation lead to identical
data values for both versions, allowing the results to be matched. Results for each
simulation cell are saved as a text file, then matched and analyzed outside of Mplus to
summarize the findings.
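
For readers without Mplus, the data-generation logic of one design cell can be approximated as follows. The sketch is our own illustration, not the authors' Mplus setup, and it covers only the HTMT portion of the design (the CFA-based PHI, FL and MCD estimates would require a factor-analysis routine): it draws samples from the factor-implied item correlation matrix and tallies how often the HTMT heuristic falls below a chosen cutoff.

```python
# Illustrative sketch (not the authors' Mplus setup): one cell of the Monte Carlo design,
# generating data from the factor-implied correlation matrix and evaluating the HTMT heuristic.
import numpy as np

rng = np.random.default_rng(12345)

def implied_corr(lx, ly, phi):
    """Item correlation matrix implied by two standardized factors with loadings lx, ly."""
    loadings = np.concatenate([lx, ly])
    k = len(lx)
    blocks = np.ones((len(loadings), len(loadings)))
    blocks[:k, k:] = blocks[k:, :k] = phi        # cross-construct correlations attenuated by phi
    R = np.outer(loadings, loadings) * blocks
    np.fill_diagonal(R, 1.0)
    return R

def htmt(R, k):
    Rxx, Ryy, Rxy = R[:k, :k], R[k:, k:], R[:k, k:]
    mono_x = Rxx[np.triu_indices_from(Rxx, k=1)].mean()
    mono_y = Ryy[np.triu_indices_from(Ryy, k=1)].mean()
    return Rxy.mean() / np.sqrt(mono_x * mono_y)

k, n, phi, cutoff, reps = 3, 100, 0.70, 0.85, 1000
pop_R = implied_corr(np.full(k, 0.71), np.full(k, 0.71), phi)
hits = 0
for _ in range(reps):
    data = rng.multivariate_normal(np.zeros(2 * k), pop_R, size=n)
    hits += htmt(np.corrcoef(data, rowvar=False), k) < cutoff
print(hits / reps)   # proportion of replications whose HTMT heuristic indicates discriminant validity
```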
Because Mplus estimates standard errors of the model parameters, both heuristic and
inferential tests can be performed on the same simulation results. The inferential tests are
one-tailed. For the PHI and HTMT tests, the estimated disattenuated correlation between
the two constructs and the HTMT value are compared to the cutoffs used by Henseler et al.
(2015): 0.85, 0.90 and 1.0. An additional value of 0.70 is also used because of its relevance to
the FL criterion (i.e. to compare an average AVE of 0.5 when ℓ = 0.71 vs a slightly lower SV
of 0.49). The FL and MCD tests are conducted separately for each construct, because they
may suggest discriminant validity for one but not the other, especially when the loadings
differ across constructs. Thus, expanding on Henseler et al. (2015) and Voorhees et al. (2016),
the simulations provide directional results for PHI and HTMT1, inferential results for
HTMT0.85, HTMT0.90 and FL and both results for MCD.

4. Simulation results
Before discussing the simulation results in Tables I and II, it is worth noting that PHI and
HTMT produce extremely consistent estimates of latent variable correlations. In 32,000
samples, PHI and HTMT correlate 0.997, and their standard errors correlate 0.998. Their
estimated means are almost identical: 0.862 for PHI and 0.863 for HTMT, and 0.0393 and
0.0392 for their respective standard errors. Both PHI and HTMT estimates are unbiased,
equaling the population value of ϕXY to two decimal places on average. They may be less
consistent in other circumstances, such as when items correlate across constructs (Voorhees
et al., 2016), but in the simulations performed here, they are interchangeable.
The results in Table I show the proportion of inferential tests indicating discriminant
validity. The PHI and HTMT subscripts indicate the criterion value in the test. When the
loadings are unequal (0.85 vs 0.71), FL1 and MCD1 refer to the constructs that have the higher
loadings; in every other case, FL1, FL2, MCD1 and MCD2 refer to constructs with 0.71 loadings.
In the top row, for example, ϕXY = 0.7, each construct has three indicators with loadings of 0.71,
and 100 observations. PHI1, HTMT1 and both MCD tests show a strong power to detect
discriminant validity under these circumstances as all tests correctly indicate discriminant
validity in 99 percent of the cases. FL1 and FL2 are much less powerful, indicating discriminant
validity in only 7 and 8 percent of the cases. The reason is that in this simulation condition AVE
((0.71² + 0.71² + 0.71²)/3 = 0.50) essentially equals SV among the constructs (0.70² = 0.49) on
average. In the second row, the loadings are unequal across constructs and power increases
across most of the tests. Here, for FL1 the AVE increases to (0.85² + 0.85² + 0.85²)/3 = 0.72,
whereas SV among the constructs remains at 0.49. FL2 results are unchanged, though, because
its AVE (0.50) remains at almost the same level as the SV among the constructs (0.49).
As the population value of ϕXY increases, the power to detect discriminant validity
naturally decreases as evidenced in the lower proportions in Table I. For example, with
ϕXY = 0.85, 3 indicators, 100 observations and equal 0.71 loadings, PHI0.85 and HTMT0.85
indicate discriminant validity in 4 percent of the cases (Table I). Because the true correlation
of 0.85 is equal to the standard of comparison, this proportion corresponds to the Type I
error rate as the criterion falsely indicates discriminant validity. Comparing the Type I error
rates with an α level of 0.05 (one-sided) allows assessing whether or not the tests are
conservative in that they tend to indicate a lack of discriminant validity when, in fact,
discriminant validity is given (i.e. Type I error rate < 0.05). The results for constellations
with different combinations of sample size, loading patterns and number of indicators show
that inferential tests of PHI0.85 and HTMT0.85 effectively control Type I errors (i.e. falsely
indicating discriminant validity) by rejecting somewhat less than 5 percent of true null
hypotheses. Similar results are shown in the rows and columns with ϕXY ¼ 0.90 and with
ϕXY ¼ 1. These results suggest that the tests are actually slightly conservative, since in
multiple cases they reject the true null hypothesis less than expected under the α level of
0.05 (one-tailed).
The results for the directional cutoffs in Table II show that these criteria also generally
perform as expected. When a cutoff matches the population value of the criterion, random
sampling error is equally likely to make the observed results higher or lower than the population
value. For example, for ϕXY = 0.90, PHI0.90 and HTMT0.90 are almost equally likely to be smaller or larger than 0.90.

Table I. Proportion of inferential tests indicating discriminant validity

ϕXY k n Loadings PHI0.85 PHI0.90 PHI1 HTMT0.85 HTMT0.90 HTMT1 FL1 FL2 MCD1 MCD2
0.7 3 100 Equal 0.51 0.77 0.99 0.50 0.75 0.99 0.08 0.07 0.95 0.94
0.7 3 100 Unequal 0.62 0.90 1 0.62 0.90 1 0.67 0.07 1 0.87
0.7 3 500 Equal 0.99 1 1 0.99 1 1 0.08 0.08 1 1
0.7 3 500 Unequal 1 1 1 1 1 1 1 0.08 1 1
0.7 5 100 Equal 0.68 0.93 1 0.68 0.93 1 0.05 0.06 1 1
0.7 5 100 Unequal 0.79 0.97 1 0.79 0.98 1 0.84 0.08 1 1
0.7 5 500 Equal 1 1 1 1 1 1 0.07 0.07 1 1
0.7 5 500 Unequal 1 1 1 1 1 1 1 0.08 1 1
0.85 3 100 Equal 0.04 0.13 0.73 0.04 0.12 0.72 0 0 0.64 0.62
0.85 3 100 Unequal 0.03 0.18 0.90 0.03 0.17 0.90 0.05 0 1 0.28
0.85 3 500 Equal 0.04 0.45 1 0.04 0.44 1 0 0 1 1
0.85 3 500 Unequal 0.05 0.62 1 0.05 0.61 1 0.05 0 1 0.78
0.85 5 100 Equal 0.04 0.19 0.96 0.04 0.18 0.95 0 0 0.92 0.92
0.85 5 100 Unequal 0.02 0.25 1 0.02 0.24 1 0.03 0 1 0.76
0.85 5 500 Equal 0.04 0.74 1 0.04 0.74 1 0 0 1 1
0.85 5 500 Unequal 0.04 0.86 1 0.04 0.86 1 0.05 0 1 1
0.9 3 100 Equal 0 0.02 0.43 0 0.03 0.42 0 0 0.38 0.39
0.9 3 100 Unequal 0 0.03 0.67 0 0.03 0.66 0.01 0 1 0.10
0.9 3 500 Equal 0 0.04 0.97 0 0.04 0.97 0 0 0.94 0.92
0.9 3 500 Unequal 0 0.05 1 0 0.04 1 0 0 1 0.20
0.9 5 100 Equal 0 0.02 0.78 0 0.03 0.77 0 0 0.72 0.72
0.9 5 100 Unequal 0 0.03 0.94 0 0.03 0.94 0 0 1 0.35
0.9 5 500 Equal 0 0.03 1 0 0.03 1 0 0 1 1
0.9 5 500 Unequal 0 0.04 1 0 0.04 1 0 0 1 0.92
1 3 100 Equal 0 0 0.04 0 0 0.03 0 0 0.05 0.05
1 3 100 Unequal 0 0 0.03 0 0 0.02 0 0 0.82 0
1 3 500 Equal 0 0 0.04 0 0 0.04 0 0 0.05 0.05
1 3 500 Unequal 0 0 0.06 0 0 0.05 0 0 1 0
1 5 100 Equal 0 0 0.03 0 0 0.02 0 0 0.04 0.03
1 5 100 unequal 0 0 0.03 0 0 0.02 0 0 0.88 0
1 5 500 Equal 0 0 0.04 0 0 0.04 0 0 0.04 0.04
1 5 500 Unequal 0 0 0.03 0 0 0.03 0 0 1 0

Table II. Proportion of directional cutoffs indicating discriminant validity

ϕXY k n Loadings PHI0.85 PHI0.90 PHI1 HTMT0.85 HTMT0.90 HTMT1 FL1 FL2 MCD1 MCD2
0.7 3 100 Equal 0.95 0.99 1 0.96 0.99 1 0.54 0.53 1 1


0.7 3 100 Unequal 0.99 1 1 0.99 1 1 0.98 0.53 1 1
0.7 3 500 Equal 1 1 1 1 1 1 0.57 0.55 1 1
0.7 3 500 Unequal 1 1 1 1 1 1 1 0.61 1 1
0.7 5 100 Equal 0.99 1 1 0.99 1 1 0.53 0.51 1 1
0.7 5 100 Unequal 0.99 1 1 0.99 1 1 0.99 0.55 1 1
0.7 5 500 equal 1 1 1 1 1 1 0.60 0.59 1 1
0.7 5 500 Unequal 1 1 1 1 1 1 1 0.62 1 1
0.85 3 100 equal 0.49 0.75 0.99 0.46 0.74 0.98 0.06 0.05 0.98 0.96
0.85 3 100 Unequal 0.48 0.80 1 0.47 0.79 1 0.50 0.03 1 0.87
0.85 3 500 Equal 0.47 0.93 1 0.47 0.94 1 0 0 1 1
0.85 3 500 Unequal 0.49 0.98 1 0.48 0.98 1 0.51 0 1 0.99
0.85 5 100 Equal 0.50 0.83 1 0.48 0.84 1 0.01 0.01 1 1
0.85 5 100 Unequal 0.50 0.88 1 0.50 0.88 1 0.51 0 1 0.98
0.85 5 500 Equal 0.52 0.99 1 0.51 0.99 1 0 0 1 1
0.85 5 500 Unequal 0.46 1 1 0.47 1 1 0.46 0.01 1 1
0.9 3 100 Equal 0.21 0.46 0.93 0.20 0.44 0.93 0.01 0.01 0.90 0.90
0.9 3 100 Unequal 0.15 0.49 0.98 0.14 0.48 0.98 0.17 0 1 0.64
0.9 3 500 Equal 0.04 0.48 1 0.04 0.47 1 0 0 1 1
0.9 3 500 unequal 0.02 0.49 1 0.02 0.49 1 0.03 0 1 0.77
0.9 5 100 Equal 0.13 0.48 1 0.14 0.48 1 0 0 0.99 0.99
0.9 5 100 Unequal 0.10 0.50 1 0.09 0.49 1 0.10 0 1 0.88
0.9 5 500 Equal 0 0.49 1 0.01 0.50 1 0 0 1 1
0.9 5 500 Unequal 0 0.51 1 0 0.51 1 0 0 1 1
1 3 100 Equal 0.01 0.04 0.51 0 0.04 0.48 0 0 0.49 0.53
1 3 100 Unequal 0 0 0.47 0 0 0.45 0 0 0.99 0.04
1 3 500 Equal 0 0 0.47 0 0 0.46 0 0 0.48 0.49
1 3 500 Unequal 0 0 0.50 0 0 0.50 0 0 1 0
1 5 100 Equal 0 0 0.47 0 0 0.46 0 0 0.51 0.48
1 5 100 Unequal 0 0 0.50 0 0 0.49 0 0 1 0.02
1 5 500 equal 0 0 0.50 0 0 0.49 0 0 0.48 0.50
1 5 500 Unequal 0 0 0.50 0 0 0.51 0 0 1 0
Thus, the directional cutoffs have a very high power to detect discriminant
validity, but at the cost of numerous Type II errors (i.e. falsely rejecting discriminant validity)
when the population value matches the cutoff value. Fortunately, the simulation results suggest
that using a higher test cutoff than a population value in question can give good power to assess
discriminant validity. For example, both PHI1 and HTMT1 give excellent power to infer
discriminant validity with ϕXY as high as 0.90. The MCD results tend to perform well simply
because under most simulation conditions, each construct’s indicators correlate more highly with
their underlying construct than with the other construct. The exception is with equal 0.71
loadings and ϕXY = 1, in which case the directional comparison goes either way about 50 percent
of the time because CRX^0.5 = CRY^0.5 × 1 and vice versa. The conventional FL heuristic detects
AVE > SV in around half the cases when ϕXY = 0.70 and ℓ = 0.71 and also when ϕXY = 0.85
and ℓ = 0.85, but rarely signals false positive results when AVE is actually less than SV.

5. Empirical illustration
In order to illustrate the efficacy of the various discriminant validity criteria, we draw on data
from a survey sample by Al-Gahtani et al. (2007) of 722 knowledge workers in Saudi Arabia
voluntarily using desktop computer applications. These data were initially analyzed within the
context of a modified unified theory of acceptance and use of technology (UTAUT) model
(e.g. Venkatesh and Davis, 2000). UTAUT postulates that four constructs act as determinants of
behavioral usage intentions (BI) and actual usage behavior: performance expectancy
(i.e. the degree to which individuals believe that using the system will help them attain improved
job performance); effort expectancy (EE) (i.e. the degree of ease associated with the use of the
system); subjective norm (i.e. the degree to which individuals perceive that important others
believe they should use computers); and facilitating conditions (FC) (i.e. the degree to which
individuals believe that an organizational and technical infrastructure supports the use of the
system). Ringle and Sarstedt (2016) use this model to illustrate the use of importance–performance
map analysis (Hair et al., 2018) in the context of PLS–SEM.
As in conventional assessments of discriminant validity, all four constructs were
analyzed simultaneously in a confirmatory factor analysis for the PHI, MCD and FL tests.
HTMT results are unaffected by whether or not other constructs are included in each
pairwise comparison. To conserve space, Table III shows just three pairs of discriminant
validity assessments: EE vs FC, EE vs BI and FC vs BI. The first column in each set of
results is the parameter estimate; positive values indicate discriminant validity for
directional tests. The numbers in parentheses are 90% confidence intervals. The results
shown in bold indicate a failure of discriminant validity, as with FL2 which indicates that

Table III. Discriminant validity tests and 90% confidence intervals on selected UTAUT constructs

Criterion   EEa–FCb                  EEa–BIb                   FCa–BIb
k           4 / 3                    4 / 3                     3 / 3
AVE         0.591 / 0.401            0.591 / 0.275             0.401 / 0.275
CR          0.859 / 0.647            0.859 / 0.531             0.647 / 0.531
HTMT        0.599 (0.531/0.666)      0.735 (0.669/0.800)       0.786 (0.700/0.871)
PHI         0.355 (0.285/0.426)      0.737 (0.672/0.802)       0.552 (0.467/0.637)
FL1a        0.465 (0.410/0.520)      0.048 (−0.049/0.145)      0.096 (−0.003/0.195)
FL2b        0.274 (0.215/0.334)      −0.268 (−0.381/−0.155)    −0.030 (−0.136/0.076)
MCD1a       0.641 (0.583/0.698)      0.390 (0.342/0.437)       0.402 (0.337/0.467)
MCD2b       0.475 (0.406/0.544)      0.046 (−0.031/0.122)      0.284 (0.204/0.365)

Notes: Values in italics indicate a failure of discriminant validity according to the test's criterion (for HTMT, relative to a cutoff of HTMT = 0.85). Superscripts indicate corresponding constructs and tests. For example, the superscript in EEa indicates that FL1a compares the AVE of EE with the shared variance of EE and FC. Similarly, MCD1a uses the composite reliability of EE as the standard of comparison.
the AVE for the BI construct is less than its shared variance with EE. The other rejections of
discriminant validity show different conclusions for heuristics vs statistics. For example, the
AVE for FC is larger than its shared variance with BI, but its confidence interval as shown
in FL1 includes 0 (though only fractionally). For the same two variables, the confidence
interval around the HTMT of 0.786 has an upper boundary of 0.871, which would indicate
discriminant validity for HTMT0.9 and HTMT1, but not for HTMT0.85.
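
As an arithmetic aside (ours, not part of the original article), the directional FL and MCD estimates reported in Table III appear to be the differences between each criterion's two sides; under that reading, the FC–BI values can be reproduced from the tabled AVE, CR and PHI figures as follows.

```python
# Illustrative arithmetic check (ours): directional FL and MCD estimates for the FC-BI pair,
# recomputed from the AVE, CR and PHI values reported in Table III.
import math

ave_fc, ave_bi = 0.401, 0.275
cr_fc, cr_bi = 0.647, 0.531
phi = 0.552                                   # PHI estimate for FC-BI

sv = phi ** 2                                 # shared variance
print(round(ave_fc - sv, 3))                  # 0.096 -> FL1: AVE of FC exceeds SV
print(round(ave_bi - sv, 3))                  # -0.030 -> FL2: AVE of BI falls short of SV
print(round(math.sqrt(cr_fc) - math.sqrt(cr_bi) * phi, 3))   # 0.402 -> MCD1 positive
print(round(math.sqrt(cr_bi) - math.sqrt(cr_fc) * phi, 3))   # 0.285 -> MCD2 positive (0.284 in Table III)
```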
This small-scale example illustrates several interesting points. One is that HTMT can be
noticeably higher than PHI when a common factor model does not fit the data.
Items with substantial cross-loadings on multiple constructs would have been deleted in a
factor-based analysis, though not necessarily in a composite-based PLS–SEM analysis.
Because confirmatory factor analysis results may fit well though not perfectly, HTMT
should be used as a routine complement to the more common PHI test. Another implication
of the example is that the FL and MCD results reveal cases where one of two constructs
lacks discriminant validity, contradicting the overall conclusion of PHI and HTMT.
Specifically, BI lacks discriminant validity relative to EE according to the FL and MCD
criteria. BI also lacks discriminant validity relative to FC according to FL (and HTMT0.85),
but not according to MCD. Therefore, researchers who use only PHI or HTMT (or both)
may overlook discriminant validity limitations in one of the two constructs in every pair
of variables.
The results also suggest that MCD may be useful in future research. In a
convergent–discriminant validity matrix, the square roots of the three CR values would be
on the diagonal, and the six MCD values would be off the diagonal. In the FC–BI comparison,
for example, values of 0.804 (i.e. the square root of CR of FC) and 0.729 (i.e. the square root of
CR of BI) would be on the diagonal, 0.402 could be below it, and 0.284 above it. These positive
MCD values indicate that each set of items correlates more highly with its intended construct
than with the other construct. However, BI fails the FL test, because its indicators share less
variance with BI than with FC. MCD points to a more appropriate interpretation of BI’s
discriminant validity as this criterion takes the measures’ reliability into account. In light of
FL’s conceptual shortcomings, assuming discriminant validity between FC and BI, as
indicated by MCD, seems more appropriate and aligns with HTMT0.9 and HTMT1. Based on
FL, however, researchers would have to revise the model by, for example, merging FC and BI,
a step which would be hard to defend from a theoretical perspective (Venkatesh and Davis,
2000). As such, the example illustrates the consequences of using an inappropriate criterion
for model validation.
A concrete implication of this example for researchers using the data from Al-Gahtani
et al. (2007) is that the measurement model for BI is problematic. Identifying and eliminating
BI items that cross-load with EE, and perhaps FC, should improve discriminant validity
relative to these constructs. Examining confidence intervals around the criteria using
inferential tests is also helpful in revealing problems that directional heuristics may conceal.

6. Discussion
6.1 Summary and recommendations
In their highly influential paper on convergent and discriminant validation, Campbell and
Fiske (1959, p. 84) note that “one cannot define without implying distinctions, and the
verification of these distinctions is an important part of the validational process.” Tests of
discriminant validity are an important component of the measurement process – but only if
they accurately substantiate distinctions between measures. With this requirement in mind,
our study set out to shed further light on the efficacy of several discriminant validity
criteria, including the recently proposed HTMT criterion and the less familiar MCD
approach, by comparing them in terms of cutoff values and inferential tests.
The findings of our simulation study provide further evidence for the robustness of HTMT
as an estimator of disattenuated (perfectly reliable) correlations between constructs. HTMT
can be calculated easily using spreadsheets or with software such as SmartPLS (Ringle et al.,
2015) that estimates the statistic for pairs of constructs in a model. Similarly, researchers can
use bootstrap confidence intervals (e.g. Hair, Hult, Ringle and Sarstedt, 2017) or constrained
parameters in software such as Mplus to test HTMT not just against a value of 1.0, but against
any specified value such as 0.85 or 0.90 that might raise concerns about discriminant validity.
Similar comparisons can be made for the PHI test, rather than automatically comparing
observed correlations between latent variables only with a value of 1.0. Such comparisons for
both PHI and HTMT would align with Werts et al.’s (1974, p. 281) notion that “discriminant
validity consists in demonstrating that the true correlation between two traits is meaningfully
less than unity,” rather than strictly less than unity. This recommendation would also be
consistent with Henseler et al.’s (2015) advice that the choice of an HTMT cutoff value depends
on the conceptual similarity of the constructs under investigation and how conservative the
researcher wants to be. Rather than following a recipe-like, mechanistic approach to model
evaluation, researchers should interpret the criteria in the context of the analysis as
recommended by methodological authorities such as Nunnally (1978), who also refer to the
need to contextualize measurement (see also Rigdon et al., 2017).
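
A percentile-bootstrap test of HTMT against a cutoff such as 0.85 or 0.90 is straightforward to script outside dedicated SEM software. The sketch below is our own illustration, not the SmartPLS or Mplus implementation: it resamples observations, recomputes HTMT and supports discriminant validity when the one-sided 95 percent bootstrap bound stays below the chosen cutoff; `item_scores` is a hypothetical data matrix.

```python
# Illustrative sketch (ours): percentile-bootstrap test of HTMT against a user-chosen cutoff.
import numpy as np

def htmt(data, kx):
    R = np.corrcoef(data, rowvar=False)
    Rxx, Ryy, Rxy = R[:kx, :kx], R[kx:, kx:], R[:kx, kx:]
    mono_x = Rxx[np.triu_indices_from(Rxx, k=1)].mean()
    mono_y = Ryy[np.triu_indices_from(Ryy, k=1)].mean()
    return Rxy.mean() / np.sqrt(mono_x * mono_y)

def htmt_below_cutoff(data, kx, cutoff=0.85, n_boot=5000, seed=1):
    """Discriminant validity is supported if the bootstrap 95th percentile of HTMT < cutoff."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    boot = [htmt(data[rng.integers(0, n, n)], kx) for _ in range(n_boot)]
    return np.percentile(boot, 95) < cutoff

# Usage with hypothetical data: rows are observations, columns are the kx items of X
# followed by the items of Y.
# supported = htmt_below_cutoff(item_scores, kx=3, cutoff=0.90)
```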
HTMT is computationally much less demanding and universally applicable in
composite- and factor-based modeling, since it is purely a function of observed correlations.
Because PLS–SEM does not allow comparing models with constrained and freely estimated
parameters (e.g. Hair, Hult, Ringle, Sarstedt and Thiele, 2017; Sarstedt et al., 2016), PHI is not
applicable in this context. However, using PHI in the course of a confirmatory factor analysis
helps in identifying and eliminating problematic cross-loadings between constructs. For most
researchers using confirmatory factor analyses and covariance-based SEM, this approach is a
conventional and transparent means of dealing with cross-loadings, compared to using
techniques that are not designed for this task (HTMT, as in Voorhees et al., 2016) or are less
familiar and straightforward (cross-loadings and partial cross-loadings, as in Henseler et al.,
2015). Similarly, researchers using consistent PLS (Dijkstra and Henseler, 2015) to mimic factor
models within a composite-based SEM framework might prefer this approach.
Whatever approach is adopted, our results clearly indicate that researchers should prefer
inferential tests over simple cutoff values. Inferential tests have less power than directional
heuristics, but produce considerably fewer false positives when the standard of comparison is
equal to the actual test criterion. If for some reason inferential tests are not used, researchers
should recognize that directional heuristics provide reliable guidance only relative to a lower
criterion than the one actually used (e.g. using HTMT1 to detect relationships of HTMT0.85 or
HTMT0.90). Finally, when unequal CR values are observed, researchers should consider using
the MCD test, which can more reliably detect failures of discriminant validity in these instances.
Contrary to the endorsement of FL by Voorhees et al. (2016), this study’s conceptual
analysis and simulation results raise questions about FL’s value in assessing discriminant
validity. The argument that AVE should be greater than SV seems plausible, but the
implication that average item correlations should be greater than SV is less compelling. As a
heuristic, when AVE actually does equal SV, FL results will be wrong in about 50 percent of
samples. This result is disconcerting considering this criterion’s popularity in applied research.
Voorhees et al.’s (2016) review of 621 studies published in leading marketing journals shows
that half of all authors used FL or PHI in their confirmatory factor analyses. Our own review of
all empirical studies published in Internet Research between 2008 and 2017 that engaged in
some type of discriminant validity assessment shows an even greater preference for FL. Of the
90 studies that used confirmatory factor analyses and covariance-based SEM, 69 (77 percent)
report the FL, while 20 (22 percent) report PHI. Authors using PLS–SEM show an even greater
preference for FL as all 35 Internet Research studies published in these ten years report this
criterion, which is not surprising in light of the lack of alternatives (Henseler et al., 2015).
In contrast, only one study has used a variant of MCD (O'Cass and Carlson, 2010) and
Ortiz et al.'s (2017) recent study has adopted HTMT.
6.2 Limitations and future research
This study, like previous assessments of discriminant validity tests, focuses on pairs of
constructs. However, in a seminal discussion of construct validity, Cronbach and Meehl
(1955, pp. 299-300) note that “a construct is defined implicitly by a network of associations or
propositions in which it occurs.” When members of a set of measures are compared just two
at a time, results can be misleading. For example, latent variables X and Y may correlate
perfectly and X and Z also correlate perfectly (ϕXY = ϕXZ = 1), yet Y and Z do not correlate at
all (ϕYZ = 0). This pattern creates a contradiction: If X and Y are the same, and X and Z are the
same, how can Y be completely different from Z? Probably the most important avenue for
improved assessment of discriminant validity is to develop tests that consider complete
networks of relationships between variables rather than just individual pairs of variables.
Tests of discriminant validity within nomological networks are easy to implement in a
covariance-based SEM context simply by imposing equality constraints between correlations
of two focal variables with other theoretically related variables (van der Sluis et al., 2005).
For example, two constructs X and Y could be constrained to correlate perfectly (ϕXY = 1)
(or to some other relevant criterion), and also simultaneously be constrained to have equal
correlations with other variables Zi: ϕXZ1 = ϕYZ1, ϕXZ2 = ϕYZ2, etc. This test uses more
information and can lead to different conclusions than the tests examined in the simulations.
In contrast, discriminant validity assessment in a PLS–SEM context is restricted to pairwise
comparisons of construct measures, rather than taking the entire nomological net into account.
Even though researchers using covariance-based SEM do not reap the benefits of a holistic
discriminant validity assessment as described above, PLS–SEM’s methodological restriction is
problematic and should be resolved in future research.
Finally, any simulation is limited in terms of how many factors and factor levels can be
studied. Differential loadings within constructs, as in previous simulations, might be
considered in combination with differential loadings across constructs, as in this study.
Robustness to non-normality or missing data could be a fruitful avenue for future research.
Other tests of disattenuated correlations could be studied in more depth.

6.3 Conclusion
This research provides evidence that both PHI and HTMT are reliable tools for assessing
discriminant validity. MCD is a promising supplemental test, which is easy to calculate and can
pick up problems when two measures clearly differ in terms of reliability. In such settings, PHI
and HTMT may overlook potential discriminant validity problems. Finally, FL has limitations
that do not justify its reputation for rigor and its widespread use in empirical research.

References
Al-Gahtani, S.S., Hubona, G.S. and Wang, J. (2007), “Information technology (IT) in Saudi Arabia: culture
and the acceptance and use of IT”, Information and Management, Vol. 44 No. 8, pp. 681-691.
Anderson, J.C. and Gerbing, D.W. (1988), “Structural equation modeling in practice: a review and
recommended two-step approach”, Psychological Bulletin, Vol. 103 No. 3, pp. 411-423.
Bagozzi, R.P. (2011), “Measurement and meaning in information systems and organizational research:
methodological and philosophical foundations”, MIS Quarterly, Vol. 35 No. 2, pp. 261-292.
Campbell, D.T. and Fiske, D.W. (1959), “Convergent and discriminant validation by the
multitrait–multimethod matrix”, Psychological Bulletin, Vol. 56 No. 2, pp. 81-105.
Chen, Y.-Y.K., Jaw, Y.-L. and Wu, B.-L. (2016), "Effect of digital transformation on organisational
performance of SMEs: evidence from the Taiwanese textile industry’s web portal”, Internet
Research, Vol. 26 No. 1, pp. 186-212.
Cronbach, L.J. and Meehl, P.E. (1955), “Construct validity in psychological tests”, Psychological Bulletin,
Vol. 52 No. 4, pp. 281-302.
Dijkstra, T.K. and Henseler, J. (2015), “Consistent partial least squares path modeling”, MIS Quarterly,
Vol. 39 No. 2, pp. 297-316.
Farrell, A.M. (2010), “Insufficient discriminant validity: a comment on Bove, Pervan, Beatty and Shiu
(2009)”, Journal of Business Research, Vol. 63 No. 3, pp. 324-327.
Fiedler, M. and Sarstedt, M. (2014), “Influence of community design on user behaviors in online
communities”, Journal of Business Research, Vol. 67 No. 11, pp. 2258-2268.
Fornell, C. and Larcker, D.F. (1981), “Evaluating structural equation models with unobservable
variables and measurement error”, Journal of Marketing Research, Vol. 18 No. 1, pp. 39-50.
Forsyth, R.A. and Feldt, L.S. (1969), “An investigation of empirical sampling distributions of
correlation coefficients corrected for attenuation", Educational and Psychological Measurement,
Vol. 29 No. 1, pp. 61-71.
Grewal, R., Cote, J.A. and Baumgartner, H. (2004), “Multicollinearity and measurement error in
structural equation models: implications for theory testing”, Marketing Science, Vol. 23 No. 4,
pp. 519-529.
Hair, J.F., Hult, G.T.M., Ringle, C.M. and Sarstedt, M. (2017), A Primer on Partial Least Squares
Structural Equation Modeling (PLS–SEM), 2nd ed., Sage, Thousand Oaks, CA.
Hair, J.F., Hult, G.T.M., Ringle, C.M., Sarstedt, M. and Thiele, K.O. (2017), “Mirror, mirror on the wall: a
comparative evaluation of composite-based structural equation modeling methods”, Journal of
the Academy of Marketing Science, Vol. 45 No. 5, pp. 616-632.
Hair, J.F., Sarstedt, M., Ringle, C.M. and Gudergan, S.P. (2018), Advanced Issues in Partial Least Squares
Structural Equation Modeling (PLS–SEM), Sage, Thousand Oaks, CA.
Henseler, J., Ringle, C.M. and Sarstedt, M. (2015), “A new criterion for assessing discriminant validity in
variance-based structural equation modeling”, Journal of the Academy of Marketing Science,
Vol. 43 No. 1, pp. 115-135.
Ho, L.-A., Kuo, T.-H. and Lin, B. (2012), “How social identification and trust influence organizational
online knowledge sharing”, Internet Research, Vol. 22 No. 1, pp. 4-28.
Jöreskog, K.G. (1971), “Simultaneous factor analysis in several populations”, Psychometrika, Vol. 36
No. 4, pp. 409-426.
Kristof, W. (1973), “Testing a linear relation between true scores of two measures”, Psychometrika,
Vol. 38 No. 1, pp. 101-111.
Lord, F.M. (1957), “A significance test for the hypothesis that two variables measure the same trait
except for errors of measurement”, Psychometrika, Vol. 22 No. 3, pp. 207-220.
Lord, F.M. (1973), “Testing if two measuring procedures measure the same dimension”, Psychological
Bulletin, Vol. 79 No. 1, pp. 71-72.
McDonald, R.P. (1999), Test Theory: A Unified Treatment, Lawrence Erlbaum, Mahwah, NJ.
McNemar, Q. (1958), “Attenuation and interaction”, Psychometrika, Vol. 23 No. 2, pp. 259-266.
MacKenzie, S.B., Podsakoff, P.M. and Podsakoff, N.P. (2011), “Construct measurement and validation
procedures in MIS and behavioral research: integrating new and existing techniques”,
MIS Quarterly, Vol. 35 No. 2, pp. 293-334.
Migdadi, M.M., Zaid, M.K.S.A., Al-Hujran, O.S. and Aloudat, A.M. (2016), “An empirical assessment of
the antecedents of electronic-business implementation and the resulting organizational
performance”, Internet Research, Vol. 26 No. 3, pp. 661-688.
Mooi, E.A., Sarstedt, M. and Mooi-Reci, I. (2018), Market Research. The Process, Data, and Methods
using STATA, Springer, Berlin.
Muñoz-Expósito, M., Oviedo-García, M.A. and Castellanos-Verdugo, M. (2017), "How to measure
engagement in Twitter: advancing a metric", Internet Research, Vol. 27 No. 5, pp. 1122-1148.
Muthén, L.K. and Muthén, B.O. (2015), Mplus User's Guide, 7th ed., Muthén and Muthén,
Los Angeles, CA.
Netemeyer, R.G., Bearden, W.O. and Sharma, S. (2003), Scaling Procedures: Issues and Applications,
Sage, Thousand Oaks, CA.
Nunnally, J.C. (1978), Psychometric Theory, 2nd ed., McGraw-Hill, New York, NY.
O’Cass, A. and Carlson, J. (2010), “Examining the effects of website-induced flow in professional
sporting team websites”, Internet Research, Vol. 20 No. 2, pp. 115-134.
Ortiz, J., Chih, W.-H. and Teng, H.-C. (2017), “Electronic word of mouth in the Taiwanese social
networking community: participation factors”, Internet Research, Vol. 27 No. 5, pp. 1058-1084.
Paxton, P., Curran, P.M., Bollen, K.A., Kirby, J. and Chen, F. (2001), “Monte Carlo experiments: design
and implementation”, Structural Equation Modeling, Vol. 8 No. 2, pp. 287-312.
Raykov, T. (1998), “Coefficient alpha and composite reliability with interrelated nonhomogeneous
items”, Applied Psychological Measurement, Vol. 22 No. 4, pp. 375-385.
Rigdon, E.E., Sarstedt, M. and Ringle, C.M. (2017), “On comparing results from CB–SEM and
PLS–SEM: five perspectives and five recommendations”, Marketing ZfP – Journal of Research
and Management, Vol. 39 No. 3, pp. 4-16.
Rigdon, E.E., Becker, J.-M., Rai, A., Ringle, C.M., Diamantopoulos, A., Karahanna, E.,
Straub, D. and Dijkstra, T.K. (2014), “Conflating antecedents and formative indicators: a
comment on Aguirre-Urreta and Marakas”, Information Systems Research, Vol. 25 No. 4,
pp. 780-784.
Ringle, C.M. and Sarstedt, M. (2016), “Gain more insight from your PLS–SEM results.
The importance–performance map analysis”, Industrial Management and Data Systems,
Vol. 116 No. 9, pp. 1865-1886.
Ringle, C.M., Wende, S. and Becker, J.-M. (2015), SmartPLS 3, SmartPLS, Bönningstedt.
Rogers, W.T. (1976), "Jackknifing disattenuated correlations", Psychometrika, Vol. 41 No. 1, pp. 121-133.
Sarstedt, M., Hair, J.F., Ringle, C.M., Thiele, K.O. and Gudergan, S.P. (2016), “Estimation issues
with PLS and CBSEM: where the bias lies!”, Journal of Business Research, Vol. 69 No. 10,
pp. 3998-4010.
van der Sluis, S., Dolan, C.V. and Stoel, R.D. (2005), “A note on testing perfect correlations in SEM”,
Structural Equation Modeling, Vol. 12 No. 4, pp. 551-577.
Venkatesh, V. and Davis, F.D. (2000), “A theoretical extension of the technology acceptance model: four
longitudinal field studies”, Management Science, Vol. 46 No. 2, pp. 186-204.
Venkatesh, V., Morris, M.G., Davis, G.B. and Davis, F.D. (2003), “User acceptance of information
technology: toward a unified view”, MIS Quarterly, Vol. 27 No. 3, pp. 425-478.
Voorhees, C.M., Brady, M.K., Calantone, R. and Ramirez, E. (2016), “Discriminant validity testing in
marketing: an analysis, causes for concern and proposed remedies”, Journal of the Academy of
Marketing Science, Vol. 44 No. 1, pp. 119-134.
Wang, G. and Netemeyer, R.G. (2002), “The effects of job autonomy, customer demandingness and trait
competitiveness on salesperson learning, self-efficacy and performance”, Journal of the Academy
of Marketing Science, Vol. 30 No. 3, pp. 217-228.
Werts, C.E., Linn, R.L. and Jöreskog, K.G. (1974), “Quantifying unmeasured variables”, in Blalock, H.M. Jr
(Ed.), Measurement in the Social Sciences: Theories and Strategies, Aldine, Chicago, IL,
pp. 270-292.
Wold, H.O.A. (1974), “Causal flows with latent variables: partings of ways in the light of NIPALS
modelling”, European Economic Review, Vol. 5 No. 1, pp. 67-86.
Appendix. Example Mplus code for inferential tests
Corresponding author
Marko Sarstedt can be contacted at: marko.sarstedt@ovgu.de
