Lower Bounds on Sample Size in Structural Equation Modeling
Published in Electronic Commerce Research and Applications, forthcoming Dec 2010, PII: S15674223(10)000542, DOI: 10.1016/j.elerap.2010.07.003 (with software downloadable from Elsevier)
J. Christopher Westland
Professor, Information & Decision Sciences
University of Illinois, Chicago
601 S. Morgan Street, Chicago, IL 60607-7124
(312) 860-0587 email: westland@uic.edu
JULY 2010
ABSTRACT
LOWER BOUNDS ON SAMPLE SIZE IN STRUCTURAL EQUATION MODELING
Computationally intensive structural equation modeling (SEM) approaches have been in development over much of the 20th century, initiated by the seminal work of Sewall Wright. To this day, sample size requirements remain a vexing question in SEM-based studies. Complexities which increase information demands in structural model estimation increase with the number of potential combinations of latent variables, while the information supplied for estimation increases with the number of measured parameters times the number of observations in the sample; both are nonlinear. This alone would imply that requisite sample size is not a linear function solely of indicator count, even though such heuristics are widely invoked in justifying SEM sample size. This paper develops two lower bounds on sample size in SEM, the first as a function of the ratio of indicator variables to latent variables, and the second as a function of minimum effect, power and significance. The algorithm is applied to a meta-study of a set of research published in five of the top MIS journals. The study shows a systematic bias towards choosing sample sizes that are significantly too small. Actual sample sizes averaged only 50% of the minimum needed to draw the conclusions the studies claimed. Overall, 80% of the research articles in the meta-study drew conclusions from insufficient samples. Lacking accurate sample size information, researchers are inclined to economize on sample collection with inadequate samples that hurt the credibility of research conclusions. Guidelines are provided for applying the algorithms developed in this study, and companion software encapsulating the paper's formulae is made available for download. (261 words)
Keywords: Structural equation modeling, SEM, Partial least squares, PLS, LISREL, AMOS, sample size, Gini correlation, common factor bias, rule of 10
1. INTRODUCTION
The past two decades have seen a remarkable acceleration of interest in structural equation modeling (SEM) methods in management research, including partial least squares (PLS) and implementations of Jöreskog's SEM algorithms (LISREL, AMOS, EQS). The breadth of application of SEM methods has been expanding, with SEM increasingly applied to exploratory, confirmatory and predictive analysis with a variety of ad hoc topics and models. SEM is particularly useful in the social sciences where many if not most key concepts are not directly observable. Because many key concepts in the social sciences are inherently latent, questions of construct validity and methodological soundness take on a particular urgency.
To this day, methodologies for assessing suitable sample size requirements remain a vexing question in SEM-based studies. The number of degrees of freedom consuming information in structural model estimation increases with the number of potential combinations of latent variables, while the information supplied in estimating increases with the number of measured parameters (i.e., indicators) times the number of observations (i.e., the sample size); both are nonlinear in model parameters. This should imply that requisite sample size is not a linear function solely of indicator count, even though such heuristics are widely invoked in justifying SEM sample size. Monte Carlo simulation in this field has lent support to the nonlinearity of sample size requirements, though research to date has not yielded a sample size formula suitable for SEM. This paper proposes a set of necessary conditions (thus lower bounds) for SEM sample adequacy.
The exposition proceeds as follows. Section 2 describes the historical context, commenting on how particular research objectives and computational limitations resulted in our current SEM toolsets. Section 3 summarizes the prior literature on sample adequacy results from Monte Carlo simulations. Section 4 develops an algorithm for computing the minimum sample size needed to detect a minimum effect at given power and significance levels in the structural equation model. Section 5 discusses these with an application to research articles whose conclusions rest on confirmatory SEM analyses, and assesses whether the sample sizes used are adequate.
2. PRIOR LITERATURE
SEM evolved in three different streams: (1) systems of equation regression methods developed mainly at the Cowles Commission; (2) iterative maximum likelihood algorithms for path analysis developed mainly at the University of Uppsala; and (3) iterative least squares fit algorithms for path analysis also developed at the University of Uppsala. Figure 1 provides a chronology of the pivotal developments in latent variable statistics in terms of method (pre-computer, computer-intensive and SEM) and objectives (exploratory/prediction or confirmation).
INSERT FIGURE 1: DEVELOPMENT OF STRUCTURAL EQUATION MODEL ESTIMATION
Both LISREL and PLS were conceived as iterative computer algorithms, with an emphasis from the start on creating an accessible graphical and data entry interface and extension of Wright's (1921) path analysis. Early Cowles Commission work on simultaneous equations estimation centered on Koopmans and Hood's (1953) algorithms from the economics of transportation and optimal routing, with maximum likelihood estimation, and closed-form algebraic calculations, as iterative solution search techniques were limited in the days before computers. Anderson and Rubin (1949, 1950) developed the limited information maximum likelihood estimator for the parameters of a single structural equation, which indirectly included the two-stage least squares estimator and its asymptotic distribution (Anderson, 2005; Farebrother, 1999). Two-stage least squares was originally proposed as a method of estimating the parameters of a single structural equation in a system of linear simultaneous equations, being introduced by Theil (1953a, 1953b, 1961) and more or less independently by Basmann (1957) and Sargan (1958). Anderson's limited information maximum likelihood estimation was eventually implemented in a computer search algorithm, where it competed with other iterative SEM algorithms. Of these, two-stage least squares was by far the most widely used method in the 1960s and the early 1970s.
LISREL and PLS path modeling approaches were championed at Cowles mainly by Nobelist Trygve Haavelmo (1943). Unfortunately, underlying assumptions of LISREL and PLS were challenged by economists such as Freedman (1987), who objected that their failure to distinguish among causal assumptions, statistical implications, and policy claims has been one of the main reasons for the suspicion and confusion surrounding quantitative methods in the social sciences (see also Wold's (1987) response). Haavelmo's path analysis never gained a large following among U.S. econometricians, but was successful in influencing a generation of Haavelmo's fellow Scandinavian statisticians, including Herman Wold, Karl Jöreskog, and Claes Fornell. Fornell introduced LISREL and PLS techniques to many of his Michigan colleagues through influential papers in accounting (Fornell and Larcker 1981), and information systems (Davis, et al, 1989). Dhrymes (1971; Dhrymes, et al. 1974) provided evidence that PLS estimates asymptotically approached those of two-stage least squares with exactly identified equations. This point is more of academic importance than practical, because most empirical studies overidentify. But in one sense, all of the limited information methods (OLS excluded) yield similar results.
3. SAMPLE SIZE AND THE RATIO OF INDICATORS TO LATENT VARIABLES
Structural equation modeling in MIS has taken a casual attitude towards choice of sample size. Since the early 1990s, MIS researchers have alluded to an ad hoc rule of thumb requiring the choosing of 10 observations per indicator in setting a lower bound for the adequacy of sample sizes. Justifications for this rule of 10 appear in several frequently cited publications (Barclay, et al. 1995; Chin 1998; Chin and Newsted 1999; Kahai and Cooper 2003), though none of these researchers refers to the original articulation of the rule by Nunnally (1967), who suggested (without providing supporting evidence) that in SEM estimation a good rule is to have at least ten times as many subjects as variables.
Within the MIS field, Goodhue, et al. (2006, 2007) studied the rule of 10 using Monte Carlo simulation to compare sample sizes of 40, 90, 150, and 200, along with varying effect sizes (large, medium, small and no effect) to determine the adequacy of this rule for a given significance and power of tests. They concluded that: "In fact, for simple [SEM] models with normally distributed data and relatively reliable measures, none of the three techniques have adequate power to detect small or medium effects at small sample sizes ... These findings run counter to extant suggestions in MIS literature" (Goodhue, et al. 2006, p. 202b). This finding is not completely unexpected, as similar SEM rules of thumb have been investigated since Nunnally's (1967) proposal. The debate has evolved significantly since his 1967 publication.
The rule of 10 couches the sample size question in terms of the ratio of observations (sample points) to free parameters; for example, Bollen (1989) stated that "though I know of no hard and fast rule, a useful suggestion is to have at least several cases per free parameter" and Bentler (1989) suggested a 5:1 ratio of sample size to number of free parameters. But is this the right question? Typically their parameters were considered to be indicator variables in the model, but unlike the early path analysis, structural equation models today are typically [...]
[...] that, and sample size and estimator performance are generally uncorrelated with either [...] or [...].
[...] free parameters, these are not individually the focus of SEM estimation. Rather, free parameters are clustered around a much smaller set of latent variables which are the focus of the estimation (or alternatively, the correlations between these unobserved latent variables are the focus of estimation). Tanaka (1987) argued that sample size should be dependent on the number of estimated parameters (the latent variables and their correlations) rather than on the total number of indicators, a view mirrored in other discussions of minimum sample sizes (Browne and Cudeck 1989, 1993; Geweke and Singleton 1980; Gerbing and Anderson 1985). Velicer and Fava (1987, 1989, 1994) went further, after reviewing a variety of such recommendations in the literature, concluding that there was no support for rules positing a minimum sample size as a function of indicators. They showed that for a given sample size, convergence to proper solutions and goodness of fit were favorably influenced by: (1) a greater number of indicators per latent variable; and (2) a greater saturation (higher factor loadings). Marsh and Bailey (1991) concluded that the ratio of indicators to latent variables rather than just the number of indicators, as suggested by the rule of 10, is a substantially better basis on [...]
[...] estimation increases both with more indicators per latent variable, as well as with more sample observations. A series of studies (Ding, et al. 1995) found that the probability of rejecting true models at a significance level of 5% was close to 5% for a ratio of 2 (the ratio being that of indicators to latent variables) but rose steadily as the ratio increased; for a ratio of 6, rejection rates were 39% for sample size of 50; 22% for sample size of 100; 12% for sample size of 200; and 6% for sample size of 400.
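$$n \geq 50r^2 - 450r + 1100,$$
where $n$ is the sample size and $r$ is the ratio of indicators to latent variables.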
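The ratio-based bound can be sketched in a few lines. The snippet below is only an illustration: it reads the surviving coefficients 50, 450 and 1100 as the quadratic n >= 50r^2 - 450r + 1100 in the indicator-to-latent-variable ratio r (an assumption about an equation that was damaged in this copy), and sets it beside the rule of 10. The function names are hypothetical.

```python
import math


def ratio_lower_bound(n_indicators: int, n_latent: int) -> int:
    """Lower bound on sample size from the indicator/latent ratio r.

    Assumes the bound n >= 50*r**2 - 450*r + 1100, which is our reading
    of the coefficients in the text, not a verified transcription.
    """
    if n_indicators <= 0 or n_latent <= 0:
        raise ValueError("counts must be positive")
    r = n_indicators / n_latent
    return math.ceil(50 * r**2 - 450 * r + 1100)


def rule_of_10(n_indicators: int) -> int:
    """The ad hoc heuristic discussed above: 10 observations per indicator."""
    return 10 * n_indicators


if __name__ == "__main__":
    # Hypothetical models with 4 latent variables and 2, 3 or 4 indicators each.
    for per_latent in (2, 3, 4):
        indicators = 4 * per_latent
        print(per_latent, ratio_lower_bound(indicators, 4), rule_of_10(indicators))
```

Under this reading the quadratic reaches its minimum at r = 4.5, so values for larger ratios should be treated with caution rather than extrapolated.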
4. SAMPLE SIZE WITH PAIRED LATENT VARIABLES
This section develops an algorithm for computing the lower bound on sample size required to confirm or reject the existence of a minimum effect in an SEM at given significance and power levels. Where SEM studies are directed towards hypothesis testing for complex models, with some level of significance $\alpha$ and power $1-\beta$, calculating the power requires first specifying the effect size you want to detect. Funding agencies, ethics boards and research review panels frequently request that a researcher perform a power analysis; the argument is that if a study is inadequately powered, there is no point in completing the research. Additionally, in the framework of SEM the assessment of power is affected by the variable information contained in social science data. Table 1 summarizes the notation used.
INSERT TABLE 1: NOTATION USED IN THE PAPER
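To make the roles of significance, power and effect size concrete, the sketch below uses the classical Fisher z approximation for detecting a Pearson correlation. This is a textbook technique swapped in for illustration; it is not the Gini-correlation bound this section develops, and the function name `fisher_z_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_sample_size(min_corr: float, alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Approximate n needed to detect a correlation of size min_corr.

    Classical Fisher-z approximation (two-sided test); an illustrative
    stand-in, NOT the lower bound derived in this paper.
    """
    if not 0 < abs(min_corr) < 1:
        raise ValueError("min_corr must be nonzero and strictly inside (-1, 1)")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for power = 1 - beta
    effect = math.atanh(abs(min_corr))           # Fisher z-transform of the effect
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


if __name__ == "__main__":
    for rho in (0.1, 0.3, 0.5):
        print(rho, fisher_z_sample_size(rho))
```

The required sample grows quickly as the minimum effect shrinks, which is the qualitative behavior the text describes.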
DECONSTRUCTION
[...]
and covariance structure is
[...]
COMBINATORICS OF HYPOTHESIS TESTS ON LINKS, AND SIGNIFICANCE LEVEL
It is typical in the literature to predicate an SEM analysis with the caveat that one needs to make strong arguments for the complex models constructed from the unobserved, latent constructs tested with the particular SEM, in order to support the particular links that are included in the model. This is usually interpreted to mean that each proposed (and tested) link in the SEM needs to be supported with references to prior research, anecdotal evidence and so forth. This may simply mean the wholesale import of a preexisting model (e.g., the Technology Acceptance Model) based on the success of that model in other contexts, but not specifically building on the particular effects under investigation. Unfortunately, it is uncommon to see any discussion of the particular links (causal or otherwise) or combinations of links that are excluded (either implicitly or explicitly) from the SEM model. Ideally, there should also be similarly strong arguments made for the inapplicability of omitted links or omitted combinations of links.
0:
1:
Our problem is to compute the number of distinct structural equation models that can exist in terms of the 0,1 values of their links using combinatorial analysis.
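The counting problem can be sketched as follows, assuming each unordered pair of latent variables carries one link that is either present (1) or absent (0). That pairing assumption, the function names, and the chain-shaped example model are all hypothetical illustrations and are not taken from the paper's figures.

```python
from itertools import combinations
from math import comb


def possible_links(n_latent: int) -> list:
    """All unordered pairs of latent variables, in a fixed order."""
    return list(combinations(range(n_latent), 2))


def count_models(n_latent: int) -> int:
    """Number of distinct 0/1 link patterns over all possible pairs."""
    return 2 ** comb(n_latent, 2)


def encode_model(n_latent: int, included_links) -> int:
    """Encode a model as a binary number, one bit per possible link."""
    position = {pair: i for i, pair in enumerate(possible_links(n_latent))}
    code = 0
    for a, b in included_links:
        code |= 1 << position[(min(a, b), max(a, b))]
    return code


if __name__ == "__main__":
    # Hypothetical model: six latent variables linked in a chain (five links).
    chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    print(len(possible_links(6)), count_models(6), encode_model(6, chain))
```

With six latent variables there are 15 possible pairs, so any single hypothesized model is one binary number among 2^15 candidates under this assumption.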
INSERT FIGURE 2: AN EXAMPLE OF A STRUCTURAL EQUATION MODEL WITH SIX LATENT VARIABLES AND FIVE CORRELATIONS
INSERT FIGURE 3: THE SEM EXAMPLE IN FIGURE 2 WITH ALL POSSIBLE PAIRED LINKS SHOWN
[...] and 3) each representing a unique combination of latent variables. The unique model hypothesized in any particular study will be some model (binary number) which is exactly one out of [...]
MINIMUM EFFECT SIZE
Minimum effect, in the context of structural equation models, is the smallest correlation between latent variables that we wish to be able to detect with our sample and model. Small effects are more difficult to detect than large effects as they require more information to be collected. Information may be added to the analysis by collecting more sample observations, by adding parameters, and by constructing a better model.
INSERT FIGURE 4: SIGNIFICANCE AND POWER FOR THE MINIMUM EFFECT THAT NEEDS TO BE DETECTED
In the context of structural equation models, canonical correlation between latent variables should be seen simply as correlation, the canonical qualifier referring to the particulars of its calculation in SEM since the latent variables are unobserved, and thus cannot be directly measured. Correlation is interpreted as the strength of statistical relationship between two random variables obeying a joint probability distribution (Kendall and Gibbons 1990) like a bivariate normal. Several methods exist to compute correlation: Pearson's product moment correlation coefficient (Fisher 1921, 1990), Spearman's rho and Kendall's tau (Kendall and Gibbons 1990) are perhaps the most widely used (Mari and Kotz 2001). Besides these three classical correlation coefficients, various estimators based on M-estimation (Shevlyakov and Vilchevski 2002) and order statistics (Schechtman and Yitzhaki 1987) have been proposed in the literature. Strengths and weaknesses of various correlation coefficients must be considered in decision making. The Pearson coefficient, which utilizes all the information contained in the variates, is optimal when measuring the correlation between bivariate normal variables (Stuart and Ord 1991). However, it can perform poorly when the data is attenuated by nonlinear transformations. The two rank correlation coefficients, Spearman's rho and Kendall's tau, are not as efficient as the Pearson correlation under the bivariate normal model; nevertheless they are invariant under increasing monotone transformations, and thus are often considered robust alternatives to the Pearson coefficient when the data deviates from the bivariate normal model. Despite their robustness and stability in non-normal cases, the M-estimator based correlation coefficients suffer great losses (up to 63% according to Xu, et al. 2010) of asymptotic relative efficiency to the Pearson coefficient for normal samples, and such heavy loss of efficiency might not be compensated by their robustness in practice. Schechtman and Yitzhaki (1987) proposed a correlation coefficient based on order statistics for the bivariate distribution which they call Gini correlation (because it is related to Gini's mean difference in a way that is similar to the relationship between Pearson correlation coefficient and the variance).
As a measure of such strength, correlation should be large and positive if there is a high probability that large or small values of one variable occur (respectively) in conjunction with large or small values of another; and it should be large and negative if the direction is reversed (Gibbons and Chakraborti 1992). Figure 5 provides a rug plot of bivariate normal scatterplots generated by the R mvtnorm package that provide a visual description of the clustering and behavior of particular values of correlation between the latent variables.
We will use a standard definition of minimum effect size to be detected: the strength of the relationship between two variables in a statistical population as measured by the correlation for paired latent variables, following conventions articulated in Wilkinson (1999); Nakagawa et al. (2007) and Brand, et al. (2008). Where we are assessing completed research, we can substitute for the minimum effect size the smallest correlation (effect size) on all of the links between latent variables in the SEM. Cohen (1988, 1992) provides the following guidelines for the social sciences: small effect size, absolute correlation of 0.1 to 0.23; medium, 0.24 to 0.36; large, 0.37 or larger. Figure 5 gives us a feel for Cohen's recommendations: an absolute correlation of 0.37 still has a great deal of dispersion, and we might find it difficult to visually determine correlation merely by looking at a scatterplot where the variables on the two axes have an absolute correlation of 0.37.
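As a small illustration of the two ideas above, the sketch below takes the smallest absolute link correlation as the observed minimum effect and labels it with the Cohen bands quoted in the text. The function names and the sample correlations are hypothetical, and treating values between 0.23 and 0.24 as "small" is our own choice, since the quoted bands leave that gap open.

```python
def observed_minimum_effect(link_correlations) -> float:
    """Smallest absolute correlation over the links of a fitted model."""
    if not link_correlations:
        raise ValueError("at least one link correlation is required")
    return min(abs(value) for value in link_correlations)


def cohen_effect_label(correlation: float) -> str:
    """Label an effect using the bands quoted above (0.1, 0.24, 0.37)."""
    magnitude = abs(correlation)
    if magnitude >= 0.37:
        return "large"
    if magnitude >= 0.24:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "below small"


if __name__ == "__main__":
    links = [0.42, -0.18, 0.31]  # made-up link correlations
    smallest = observed_minimum_effect(links)
    print(smallest, cohen_effect_label(smallest))
```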
[...]
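Gini correlation, written here as $\Gamma(X,Y)$ for random variables $X$ and $Y$, possesses the following general properties (Schechtman and Yitzhaki 1987):
1) $\Gamma(X,Y) \in [-1, 1]$
2) $\Gamma(X,Y) = \Gamma(Y,X) = 1$ $(-1)$ if $Y$ is a monotone increasing (decreasing) function of $X$
4) $\Gamma(X,Y) = -\Gamma(-X,Y) = -\Gamma(X,-Y) = \Gamma(-X,-Y)$ for both $\Gamma(X,Y)$ and $\Gamma(Y,X)$
5) $\Gamma(X,Y)$ is invariant under all strictly monotone transformations of $Y$
6) $\Gamma(X,Y)$ is scale and shift invariant with respect to both $X$ and $Y$
7) $\sqrt{n}\,(\hat{\Gamma} - \Gamma) \to N(0, \sigma^2)$; i.e., converges in distribution to a normal distribution with mean zero and variance $\sigma^2$ (This is from Schechtman and Yitzhaki (1987) applying methods developed by Hoeffding (1948))
8) The Spearman rho measure of correlation is a special case of $\Gamma(X,Y)$; Xu, et al (2010).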
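A minimal sketch of a sample Gini correlation may help make these properties tangible. It assumes the covariance form associated with Schechtman and Yitzhaki, cov(x, F(y)) / cov(x, F(x)), with empirical ranks standing in for the distribution functions and no tied values. The function names are hypothetical, and this is not necessarily the estimator form used in the paper's own derivation.

```python
def _ranks(values):
    """Ranks 1..n by ascending value (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def gini_correlation(x, y) -> float:
    """Sample Gini correlation of x with respect to y.

    Computed as cov(x, rank(y)) / cov(x, rank(x)); the common 1/n factors
    cancel, so plain sums of centered products are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_rank = (n + 1) / 2
    ranks_x, ranks_y = _ranks(x), _ranks(y)
    numerator = sum((xi - mean_x) * (r - mean_rank) for xi, r in zip(x, ranks_y))
    denominator = sum((xi - mean_x) * (r - mean_rank) for xi, r in zip(x, ranks_x))
    return numerator / denominator


if __name__ == "__main__":
    x = [0.5, 1.2, -0.3, 2.0, 0.9]   # made-up data
    y = [1.0, 0.2, 0.4, 3.0, 2.5]
    print(gini_correlation(x, y), gini_correlation(y, x))
```

Because only the ranks of the second argument enter the numerator, the value is unchanged by any strictly increasing transformation of that argument, and the two directions, gini_correlation(x, y) and gini_correlation(y, x), generally differ.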
Xu, et al. (2010) showed that Gini correlations are asymptotically normal with the following mean and variance (see footnote 1):
[...]
Xu, et al. (2010) used Monte Carlo simulations to verify these formulas' asymptotic results (using asymptotic relative efficiency and root mean square error performance metrics), showing that they are applicable for data of even relatively small sample sizes (down to around 30 sample points). Their simulations confirmed and extended He and Nagaraja's (2009) Monte Carlo simulations exploring the behavior of nine distinct correlation estimators of the bivariate normal correlation coefficient, including the estimator, the sample correlation for the bivariate normal, and estimators based on order statistics. The estimator was found generally to reduce bias and improve efficiency as well or better than other correlation estimators in the study. Xu, et al. (2010) also compared the Gini correlation with three other closely related correlation coefficients: (1) classical Pearson's product moment correlation coefficient, (2) Spearman's rho, and (3) order statistics correlation coefficients. Gini correlation bridges the gap between the order statistics correlation coefficient and Spearman's rho, and its estimators are more mathematically tractable than Spearman's rho, whose variance involves complex elliptic integrals that cannot be expressed in elementary functions. Their efficiency analysis showed that the Gini correlation estimator's loss of efficiency is between 4.5% and 11.3%, much less than that of Spearman's rho, which ranges from 8.8% to 30.5%.
1 [...] convergence implies that for [...] the remaining terms go to zero faster than [...]
Construct a hypothesis test to just detect the minimum effect size:
[...]
The one-sample, two-sided formulation (see Figure 4) that reconciles the null and alternative hypothesis tests for the estimator is
[...]
Thus to within little [...] and using the formula for [...]
[...]
Then [...] are the solutions for the quadratic equation that restates [...]:
[...]
Or in terms of [...], and taking the largest root
[...]
5. META-STUDY AND DISCUSSION
This research constructed two necessary conditions for sample adequacy:
1. Section 3 determined the sample size needed to compensate for the ratio of number of indicator variables to latent variables (summarized from Monte Carlo simulations that have appeared in the literature); and
2. Section 4 determined the sample size required to assure the existence or non-existence of a minimum effect (correlation) on each possible pair of latent variables in the SEM (determined analytically).
Of course, neither of these conditions is sufficient to assure sample adequacy for a particular choice of significance and power, because there are so many other factors that can affect estimation and sample size: multicollinearity, appropriateness of datasets, and so forth. Additionally, the information contained in the sample and indicator variables must be adequate to compensate for variations in particular SEM estimation methodologies. For example, partial least squares (PLS) approaches generate parameter estimates that lack consistency. Dhrymes (1970); Schneeweiß (1990, 1991, 1993); Thomas, et al. (2005); and Fhr (1989) all demonstrate that the IV/2SLS techniques converge to the same estimators, but are more robust. Jöreskog (1967, 1970; Jöreskog and Sörbom 1996) suggests that departures from normal distribution for the indicators will demand larger samples, and that non-normal indicators require samples one, two or three orders of magnitude larger, depending on distribution.
From a practical viewpoint, sample size questions can take three forms:
1. A priori: asks what sample size will be sufficient given the researcher's prior beliefs on what the minimum effect is that the tests will need to detect.
2. Ex posteriori: asks what sample size should have been taken in order to detect the minimum effect that the researcher actually detected in an existing (either sufficient or insufficient) test. If the ex posteriori measured effect is smaller than the researcher's prior beliefs about the minimum effect (in 1.) then sample size needs to be increased commensurately.
3. Sequential test optimal stopping: is couched in a sequential test optimal stopping context, where the sample size is incremented until it is considered sufficient to stop testing.
In this section, we report on an ex posteriori meta-study that applies the algorithms developed in this paper to a specific body of SEM research studies published in five core journals in MIS and e-Commerce (ISR, MISQ, Management Science, Decision Sciences and JMIS) between 1989 (the date of the seminal study by Davis, et al. 1989) and 2007. We assumed that the link with the smallest effect actually observed in these studies determines the minimum effect size. This is a conservative assumption, because the researchers would have been very likely to hold a bias in actually wanting to detect even smaller effects than those actually observed, but the model and data would have only had sufficient resolution to capture the minimum effect observed.
Additionally, many of the studies listed in Appendix A analyzed Likert scale data that is not distributed normally; nevertheless, the assumption of normality of data is a common one in SEM studies, even where the data is clearly not normal, for example where survey data returns discrete Likert scale data censored at 0, and derives from a mass function which is likely to be skewed. Because estimator behavior is best understood for normal data, we can assume that, in these non-normal data studies, our lower bound on sample size needs a non-normality risk premium for sample adequacy; departures from a normal weight matrix in LISREL suggest that this may be two to three orders of magnitude larger than the sample size required for normal data.
Sample sizes actually used in drawing conclusions in the study were compared with our computed lower bound, and a difference taken as a percentage (the far right-hand column of [...]
[...] average sample was 770% too small; with the removal of three outliers, this dropped to 400% too small (figures 6 and 7). Actual sample sizes in these 74 research articles were on average only 50% of the minimum needed to draw the conclusions the studies claimed; median sample size was 38% of the minimum required, reflecting a substantial negative skewing in the undersampling, and standard deviation was 29%. Overall, 80% of the research articles in this meta-study drew conclusions from samples that were smaller than the lower bounds on sample size computed here. Because each additional observation increases the cost of the study in time, effort and monetary terms, an inclination to economize on data collection is understandable. The conclusion that seems most appropriate from our meta-study is that MIS researchers have been given inadequate guidance, and have not been well served by existing sample size heuristics. Lacking the sample size information they need, researchers may be inclined to skimp on sample collection. Unfortunately, when samples are too large, the studies are more costly than they need to be in drawing particular conclusions; when samples are too small, the credibility of their conclusions is weakened.
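The kind of summary reported above can be sketched as follows. The study figures below are invented purely for illustration and are not taken from Appendix A; the function names are hypothetical, and the ratio used here (actual sample as a fraction of the lower bound) is only one of the ways the text expresses the shortfall.

```python
from statistics import mean, median


def adequacy_ratio(actual_n: int, lower_bound_n: int) -> float:
    """Actual sample size as a fraction of the computed lower bound."""
    if lower_bound_n <= 0:
        raise ValueError("lower bound must be positive")
    return actual_n / lower_bound_n


def summarize(studies) -> dict:
    """Summarize (actual_n, lower_bound_n) pairs across a set of studies."""
    ratios = [adequacy_ratio(actual, bound) for actual, bound in studies]
    return {
        "mean_ratio": mean(ratios),
        "median_ratio": median(ratios),
        "share_insufficient": sum(r < 1 for r in ratios) / len(ratios),
    }


if __name__ == "__main__":
    # Made-up (actual sample, lower bound) pairs, for illustration only.
    invented = [(100, 200), (150, 600), (300, 250), (80, 400)]
    print(summarize(invented))
```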
INSERT FIGURE 6: PERCENT ERROR IN SAMPLE SIZE FOR 74 STUDIES IN THE ENTIRE META-STUDY (MEAN=770, STANDARD DEVIATION=25, SKEWNESS=6.5, KURTOSIS=47)
We should not be surprised, given our review of the prior literature, that existing sample size heuristics are misleading researchers in this area. Numerous studies have concluded that linear heuristics like the rule of 10 are poor guides to fit and explanatory power of the model or adequacy of the sample size (Browne and Cudeck 1989, 1993; Geweke and Singleton 1980; Gerbing and Anderson 1985; Velicer and Fava 1987, 1989, 1994; Marsh and Bailey 1991; Boomsma 1982; Ding, et al. 1995).
As noted earlier, neither of the conditions developed here is sufficient to assure sample adequacy for a particular choice of significance and power, because there are so many factors that can affect estimation and sample size in something as complex as a structural equation model. Consequently, the necessary sample size for accurate estimation will in most cases exceed the lower bound computed here. But review of actual sample sizes summarized in figures 6 and 7 suggests that, at its most unambitious, this lower bound will insure against the very erratic under-sizing of samples that seems common in SEM analysis.
Future research on sample size choice should be conducted on lines specific to the various algorithms used to estimate SEM: PLS's principal components analysis algorithms; LISREL and AMOS's gradient search algorithms; and systems of equations regression algorithms. Indeed, seminal research in each of these areas alluded to this decades ago. Wold (1980, 1981) went even further in advising that PLS is more suitable for exploratory model specification searches rather than hypothesis testing, and introduced the concept of plausible causality for that very reason. Thus in PLS, the sample size question is probably both less relevant and less critical, because hypothesis testing is better left to LISREL and systems of equation approaches.
The problem in building the structural model completely on theory, without reference to the data, is that the latent constructs chosen by the researcher may be substantially different than those that would drop out of an exploratory factor analysis. Researchers have developed a test for this called Harman's one-factor test (Podsakoff and Organ 1986), commonly used to check for common factor bias in SEM (and often conducted ex posteriori). Common factor bias appears because inherent clustering results from a particular distance measure used to position data points in n-dimensional space; for example, principal components analysis designs a distance measure to minimize the variance not explained by the main components (clusters). But SEM will impose prior beliefs on the data, in the form of the structure of latent variables. Thus data are assumed to cluster around the latent constructs; the factor loadings determine how this clustering occurs. SEM models are often constructed without reference to clustering in the underlying data given a particular distance measure; the construction is entirely theory driven, though this is not in itself a bad thing. Common factor bias reflects this divergence in the model and the data, and if it is too extreme, may indicate that the data is incomplete, or that the model is misspecified.
Common factor bias can be avoided a priori through a pretest of the clustering of indicator data. Common factor bias occurs because procedures that should be a standard part of model specification are in practice left until after the data collection and confirmatory analysis. Jöreskog developed PRELIS for these sorts of pretests and model respecifications. If this clustering shows that the indicators are providing information on fewer variables than the researcher's latent SEM contains, this is an indication that more indicators need to be collected that will provide (1) additional information about the latent constructs that don't show up in the cluster analysis; and (2) additional information to split one exploratory factor into the two or more latent constructs the researcher needs to complete the hypothesized model. In exploratory factor analysis, the two tests that are most useful for this are the Kaiser (1960) criterion that retains factors with eigenvalues greater than one (unless a factor extracts at least as much information as the equivalent of one original variable, we drop it) and the scree test proposed by Cattell (1966) that compares the difference between two successive eigenvalues and stops taking factors when this drops below a certain level. In either case, the suggested factors are not necessarily the latent factors that the researcher's theory would suggest; rather they are the information that is actually provided in the data, this information being the main justification for the cost of data collection. So in practice, either test would set a maximum number of latent factors in the SEM if that SEM is to be explored with one's own particular dataset.
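Both retention rules can be sketched directly from their descriptions above. The eigenvalues and the drop threshold below are made up, the function names are hypothetical, and the scree rule shown is one simple numerical reading of what is usually a visual test.

```python
def kaiser_retained(eigenvalues) -> int:
    """Kaiser criterion: count factors whose eigenvalue exceeds one."""
    return sum(1 for value in eigenvalues if value > 1.0)


def scree_retained(eigenvalues, min_drop: float) -> int:
    """Simple scree rule: keep taking factors while the drop between
    successive eigenvalues is at least min_drop (a hypothetical threshold)."""
    ordered = sorted(eigenvalues, reverse=True)
    retained = 0
    for current, following in zip(ordered, ordered[1:]):
        if current - following < min_drop:
            break
        retained += 1
    return retained


if __name__ == "__main__":
    eigenvalues = [3.1, 1.4, 0.9, 0.6]  # invented values
    print(kaiser_retained(eigenvalues), scree_retained(eigenvalues, 0.4))
```

Either count could then be compared with the number of latent variables in the hypothesized SEM, in the spirit of the pretest the text recommends.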
When SEM are built around valid real-world constructs (even if these are unobservable) the algorithms proposed in this paper impose only weak additional assumptions on the indicators and latent variables in order to compute sample sizes adequate for estimation. Our limited application to a window of IS and e-commerce publications has shown that concerns are warranted concerning existing SEM sample size calculations and we need to remain suspicious of conclusions reached in studies based on inadequate sample sizes. Furthermore, a large number of studies in our sample devised their tests without first committing to the minimum effect size that they were trying to detect, or indicating the proportion of non-response in surveys. It is clear that journal referees need to begin asking for survey response, minimum effect size and a justification of the sample size. By incorporating these suggestions, it is argued that the research community will enhance the credibility and applicability of their research, with a commensurate improved impact and influence in both industry and academe.
APPENDIX A: SAMPLE ADEQUACY IN A SET OF E-COMMERCE AND MIS SEM STUDIES
INSERT APPENDIX A
REFERENCES
Anderson, T. Origins of the limited information maximum likelihood and two-stage least squares estimators. Journal of Econometrics 127, 2005, 1-16.
Anderson, T. and Rubin, H. Estimator of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20, 1949, 46-63.
Anderson, T. and Rubin, H. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 21, 1950, 570-82.
Balakrishnan, N. and C. R. Rao. Order Statistics: Applications, ser. Handbook of statistics; v. 17. New York: Elsevier, 1998.
Barclay, D. W., Higgins, C., & Thompson, R. The partial least squares (PLS) approach to causal modeling: Personal computer adaptation and use as an illustration. Technology Studies, 2(2), 1995, 285-309.
Basmann, R. A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 1957, 77-83.
Bollen, K. A. Structural equations with latent variables. New York: Wiley, 1989, p. 268.
Boomsma, A. Robustness of LISREL against small sample sizes in factor analysis models, in K. G. Jöreskog and H. Wold (eds) Systems under indirect observations, Causality, structure, prediction (part 1), 1982, pp. 149-173, Amsterdam: North Holland.
Brand, A., Bradley, M. T., Best, L. A., Stoica, G. Accuracy of effect size estimates from published psychological research. Perceptual and Motor Skills 106(2), 2008, 645-649.
Browne, M. W., and Cudeck, R. Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 1989, 445-455.
Cattell, R. B. Handbook of multivariate experimental psychology. 1966, Rand McNally, Chicago.
Chin, W. W. The partial least squares approach to structural equation modeling. In G. A. Marcoulides (Ed.), Modern Methods for business research (pp. 295-336). Mahwah, New Jersey: Lawrence Erlbaum Associates, 1998.
Chin, W. W., and Newsted, P. R. Structural Equation Modeling analysis with Small Samples Using Partial Least Squares. In Rick Hoyle (Ed.), Statistical Strategies for Small Sample Research, Sage Publications, 1999, pp. 307-341.
Cochran, W. G. Sampling Techniques, 3rd Edition. 1977, New York: Wiley.
Cohen, J. Statistical Power Analysis for the Behavioral Sciences (second ed.). 1988, Lawrence Erlbaum Associates.
Dhrymes, P. J., R. Berner, D. Cummins. A Comparison of Some Limited Information Estimators for Dynamic Simultaneous Equations Models with Autocorrelated Errors. Econometrica, Vol. 42, No. 2, 1974, pp. 311-332.
Dhrymes, P. Distributed Lags: problems of estimation and formulation, San Francisco: Holden-Day, 1971.
Dhrymes, P. J. Econometrics: Statistical Foundations and Applications, New York, Evanston and London (Harper & Row), 1970, p. 53.
Ding, L., Velicer, W. F. and Harlow, L. L. The effects of estimation methods, number of indicators per factor and improper solutions on structural equation modeling fit indices, Structural Equation Modeling, 2, 1995, 119-144.
Farebrother, R. Fitting Linear Relationships: A History of the Calculus of Observations 1750-1900. 1999, New York: Springer.
Fhr, K. Comparison of LISREL and PLS Estimation Methods in Latent Variable Models. Introducing Latent Variables into Econometric Models, Manuscript SFB 303, University of Bonn, Bonn, 1989.
Fisher, R. A. Statistical Methods, Experimental Design, and Scientific Inference. New York: Oxford Univ. Press, 1990.
Fornell, Claes and David F. Larcker. Evaluating Structural Equation Models with Unobservable Variables and Measurement Error, Journal of Marketing Research 18, 1981, 39-50.
Gerbing, D. W., & Anderson, J. C. The effects of sampling error and model characteristics on parameter estimation for maximum likelihood confirmatory factor analysis. Multivariate Behavioral Research, 20, 1985, 255-271.
Gibbons, J. D. and S. Chakraborti, Nonparametric Statistical Inference, 3rd ed. New York: Marcel Dekker, 1992.
Goodhue, Dale, William Lewis, Ronald Thompson, Statistical Power in Analyzing Interaction Effects: Questioning the Advantage of PLS with Product Indicators (Research Note), Information Systems Research Vol. 18, No. 2, 2007, pp. 211-227.
Goodhue, D., William Lewis, Ron Thompson, "PLS, Small Sample Size, and Statistical Power in MIS Research," HICSS, vol. 8, pp. 202b, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), 2006.
Haavelmo, T. The Statistical Implications of a System of Simultaneous Equations. Econometrica 11, 1943, 1-12.
He, Q. and H. N. Nagaraja. Correlation Estimation Using Concomitants of Order Statistics from Bivariate Normal Samples, Communications in Statistics: Theory and Methods, Volume 38, Issue 12, January 2009, pages 2003-2015.
Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Mathemat. Statist. 19, 1948, 293-325.
Jöreskog, K. G. and Sörbom, D., LISREL 8 User's Reference Guide, Chicago: Scientific Software International, 1996.
Jöreskog, K. G. A general method for analysis of covariance structures, Biometrika, 57, 1970, 239-251.
Kahai, S. S. and Cooper, R. B. Exploring the Core Concepts of Media Richness Theory: The Impact of Cue Multiplicity and Feedback Immediacy on Decision Quality, Journal of Management Information Systems, 20, 1, 2003, 263-299.
Kendall, M. and J.D. Gibbons. Rank Correlation Methods, 5th ed. New York: Oxford Univ. Press, 1990.
Kish, L. Survey Sampling, 1995, New York: Wiley.
Koopmans, T. and Hood, W. The estimation of simultaneous linear economic relationships. In Studies in Econometric Method, ed. W. Hood and T. Koopmans. Cowles Foundation Monograph 14. 1953, New Haven: Yale University Press.
Lohr, S.L. Sampling: Design and Analysis. 1999, Duxbury.
Mari, D.D. and S. Kotz, Correlation and Dependence. London, U.K.: Imperial College Press, 2001.
Marsh, H.W., Balla, J.R., & McDonald, R.P. Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 1988, 391-410.
Marsh, H.W., Balla, J.R., & Hau, K.T. An evaluation of incremental fit indices: A clarification of mathematical and empirical properties. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 315-353). 1996, Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Marsh, H.W., Hau, K.T., Balla, J.R., & Grayson, D. Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 1998, 181-220.
Marsh, H.W. and M. Bailey. Confirmatory factor analyses of multitrait-multimethod data: A comparison of alternative models. Applied Psychological Measurement, Vol. 15, No. 1, 47-70 (1991).
Podsakoff, P.M. and D.W. Organ. Self-reports in organizational research: Problems and prospects, Journal of Management, 12, 1986, 531-544.
Sargan, J. Estimation of economic relationships using instrumental variables. Econometrica 67, 1958, 557-86.
Schechtman, E., Yitzhaki, S. A measure of association based on Gini's mean difference. Commun. Statist. Theor. Meth. 16, 1987, 207-231.
Schneeweiß, H. Models with Latent Variables: LISREL versus PLS, in: Contemporary Mathematics Vol. 112 (1990), p. 33-40.
Schneeweiß, H. Models with Latent Variables: LISREL versus PLS, in: Statistica Neerlandica 45 (1991), p. 145-157.
Schneeweiß, H. Consistency at Large in Models with Latent Variables, in: K. Haagen, D.J. Bartholomew, M. Deistler, Statistical Modelling and Latent Variables, Elsevier (1993), p. 299-320.
Snedecor and Cochran. Statistical Methods, 8th ed. 1989, Ames: Iowa U. Press.
Tanaka, J.S. How big is big enough?: Sample size and goodness of fit in structural equation models with latent variables. Child Development, 58, 1987, 134-146.
Tanaka, J.S. Multifaceted conceptions of fit in structural equation models. In K.A. Bollen & J.S. Long (Eds.), Testing structural equation models (pp. 10-39), 1993, Newbury Park, CA: Sage.
Theil, H. Estimation and simultaneous correlation in complete equation systems. 1953b, The Hague: Central Planning Bureau.
Theil, H. Economic Forecasts and Policy, 2nd edn. 1961, Amsterdam: North-Holland.
Thomas, D.R., Lu, I.R.R. & Cedzynski, M. Partial least squares: A critical review and a potential alternative. Proceedings of the Annual Conference of Administrative Sciences Association of Canada, Management Science Division, Toronto, 2005.
Velicer, W.F., and Fava, J.L. Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3, 1998, 231-251.
Westland, J.C. and W.K. See-To. The Short-run Price-Performance Dynamics of Microcomputer Technologies, Research Policy, Volume 36, Issue 5, 2007, Pages 591-604.
Wilkinson, Leland; APA Task Force on Statistical Inference. "Statistical methods in psychology journals: Guidelines and explanations". American Psychologist 54, 1999, 594-604.
Wold, H. "The Fix-Point Approach to Interdependent Systems: Review and Current Outlook," in H. Wold (Ed.), The Fix-Point Approach to Interdependent Systems, 1981, Amsterdam: North-Holland, 1-35.
Wold, Herman. Response to D.A. Freedman, Journal of Educational Statistics, Vol. 12, No. 2, 1987, pp. 202-205.
Wright, S. Correlation and causation, Journal of Agricultural Research, 20, 1921, 557-585.
APPENDIX A: SAMPLE ADEQUACY IN A SET OF E-COMMERCE AND MIS SEM STUDIES
Columns: Studies; Latent Variables; Indicator Variables; Sample Points; Minimum Effect Observed; Šidák-corrected; Sample bound (section 4); Sample bound (section 3); Sample Size lower bound; Study sample ( ) or above
[Diagram: three rows of boxes, each naming a method and its originating reference]
Row 1: Exploratory Factor Analysis, Lawley (1940); Factor Analysis (PCA) through iterated OLS, Wold (1966); PLS-SEM through iterated OLS, Wold (1978)
Label: Model Specification Searches
Row 2: Path Analysis, Wright (1921); Confirmatory Factor Analysis, Jöreskog (1969); LISREL-SEM, Jöreskog (1969)
Row 3: Systems of Linear Equations, Koopmans (1950); Instrumental Variables Estimation and 2SLS, Theil (1953); 3SLS and full-information regression SEM, Zellner (1962)
FIGURE 1: DEVELOPMENT OF STRUCTURAL EQUATION MODEL ESTIMATION
FIGURE 2: AN EXAMPLE OF A STRUCTURAL EQUATION MODEL WITH SIX LATENT VARIABLES AND FIVE CORRELATIONS
FIGURE 3: THE SEM EXAMPLE IN FIGURE 2 WITH ALL POSSIBLE PAIRED LINKS SHOWN
FIGURE 4: SIGNIFICANCE AND POWER FOR THE MINIMUM EFFECT THAT NEEDS TO BE DETECTED
[Scatterplot panels labelled "Positive Correlation" and "Negative Correlation"; horizontal axis x[,1], vertical axis x[,2]]
Figure 5: Bivariate Normal Scatterplots with 500 points (the mean and covariance parameters given in the caption are not legible)
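Data of the kind plotted in Figure 5 can be reproduced in outline with a short simulation. The sketch below draws 500 points from a standard bivariate Normal distribution with a chosen correlation, once positive and once negative. The zero means, unit variances, correlation values of 0.5 and -0.5, the seed, and the function names are illustrative assumptions, not values taken from the figure.

```python
import math
import random


def bivariate_normal_sample(rho, n=500, seed=0):
    """Draw n pairs (x, y) from a standard bivariate Normal with correlation rho.

    Zero means and unit variances are assumed purely for illustration.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent of x
        # y = rho*x + sqrt(1 - rho^2)*z has unit variance and corr(x, y) = rho
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        pairs.append((x, y))
    return pairs


def pearson_correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


positive = bivariate_normal_sample(rho=0.5)
negative = bivariate_normal_sample(rho=-0.5)
print(round(pearson_correlation(positive), 2),
      round(pearson_correlation(negative), 2))
```

Plotting the second coordinate against the first for each sample gives scatterplots of the same general shape as the positive and negative panels.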
[Figure: two histogram panels with vertical axis "frequency"; caption not recoverable]
Symbol          Definition
[...]           Number of parameters (indicators) in the SEM
[...]           Number of latent variables in the SEM
[...]           Computed sample size lower bound
[...]           Bivariate Normal random latent variables (and their realizations) in the SEM
[...]           Order statistics of the sample values; the first index is rank, and the second is sample size
[...]           Concomitant of the order statistic: the sample value associated with the corresponding ordered sample value in the sample pairs
[...]           Minimum effect size that our computed sample size can detect
[...]           Unknown correlation for a bivariate Normal random vector
[...]           Estimator of Gini correlation
[...]; [...]    Mean and standard deviation estimators for Gini correlation
α; 1 - β        Significance and power of test
[...]           The Šidák-corrected significance for discriminations between possible SEM link combinations at a resolution of [...]
[...]; [...]    Rejection bound at significance α and non-rejection bound at power 1 - β; we substitute the quantile function (inverse cumulative Normal) for [...] in calculations
TABLE 2: NOTATION USED IN THE PAPER
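Two of the notions in the notation table are easy to make concrete. In the sketch below, sorting the sample pairs by their first coordinate gives the order statistics, and the second coordinates carried along in that order are their concomitants. The second helper applies the standard Šidák adjustment for m simultaneous comparisons. The function names, the toy data, and the treatment of m as a plain count of comparisons are assumptions made for illustration; they are not taken from the paper's algorithm.

```python
def order_statistics_with_concomitants(pairs):
    """Sort (x, y) pairs by x; return the ordered x values and the y values
    that travel with them (the concomitants of the order statistics)."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    order_stats = [x for x, _ in ordered]
    concomitants = [y for _, y in ordered]
    return order_stats, concomitants


def sidak_corrected_alpha(alpha, m):
    """Standard Sidak per-comparison significance level that keeps the
    overall level at alpha across m independent comparisons."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Toy data (hypothetical): three sample pairs
sample_pairs = [(2.5, 10.0), (0.1, 20.0), (1.7, 30.0)]
xs, ys = order_statistics_with_concomitants(sample_pairs)
print(xs)  # [0.1, 1.7, 2.5]
print(ys)  # [20.0, 30.0, 10.0]

# Illustrative only: 15 is the number of distinct pairs among six variables
print(sidak_corrected_alpha(0.05, 15))
```

The corrected level shrinks as m grows, which is why a larger number of candidate links demands a stricter per-link significance.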