
To appear in: Moutinho and Hutcheson: Dictionary of Quantitative Methods in Management.

Sage Publications

Factor Analysis
Introduction
Factor analysis attempts to identify the underlying structure in a data set by defining a small number of factors that capture the variation in the collected data. Factor analysis assumes that relationships between variables are due to the effects of underlying factors and that observed correlations are the result of variables sharing common causes. Describing a data set in terms of factors (or latent variables as they are sometimes called) is often useful at a theoretical level as it may identify the underlying processes which determined the correlations among the variables. This often allows a simpler and clearer interpretation of the relationships in the data. Factor analysis is also useful at a more practical level as it reduces the number of variables in the data set, making model selection easier, particularly with respect to regression models.

A common example of the use of factor analysis is in the identification of attitudes through questionnaire research. In a large questionnaire a number of questions are likely to address similar issues, which will lead to answers which are correlated. For example, answers of 'agree' or 'strongly agree' to questions such as 'It is important to preserve one's culture', 'I am prepared to die for my country' and 'It is good to take part in our traditional festivals' may lead one to conclude that the 'patriotism' factor is present. Here, patriotism is not a single measurable entity but a construct which is derived from the measurement of other, directly observable variables (the individual questions in the questionnaire). The patriotism factor explains some of the relationships between the variables and its identification can simplify the description of the data and help in our understanding of a complex relationship. Postulating the existence of something called 'patriotism' through the identification of the factor explains the observed correlations between responses to numerous and varied situations.

Key Features

- Factor analysis attempts to identify the underlying structure in a data set in terms of its correlational structure.
- Factor analysis can be used to reduce a large number of variables to a smaller number of factors.
- Factor analysis can help to identify the underlying causal structure in the data set.
- Factor analysis can help in the further analysis of the data by reducing the number of variables and simplifying model selection.

General Principles of Factor Analysis


Research that attempts to investigate why consumers select certain supermarkets might ask consumers to rate a number of questions. For example:

- 'How important is it that the store has parking facilities?'
- 'How important is a home-delivery service?'
- 'How important is it that the store has cash-points available?'
- 'How important is it that the store has a cafeteria?'
- 'How important is it that the store has a petrol station?'
- 'How important is it that the store has a friendly atmosphere?'
- 'How important is it that the store has promotions?'
- 'How important is it that the store offers value for money?'

The multiple variables recorded in such a questionnaire might actually be a reflection of a simpler underlying structure in the data. For example, those consumers who use a car will more likely rate parking facilities and petrol stations as being important, whilst rating a home-delivery service as less important and having no particular preference for cafeterias and value for money. Similarly, those consumers with little money are likely to rate value for money and the availability of promotions as being important, whilst rating a home-delivery service as being unimportant and having no particular preference for the atmosphere in the store. An individual's response to a question is likely to depend on a number of underlying factors. This is represented in Equation 1, where the importance of car parking facilities is given as a function of a number of hypothetical factors. These factors are not gathered data, but are inferred from the relationships in the data set.
$$\text{Importance of a car park} = \beta_1\,\text{F:attitude to car use} + \beta_2\,\text{F:environmental concern} + \beta_3\,\text{F:life style} + \dots + \beta_k\,\text{F:monetary concern} \qquad \text{(Equation 1)}$$

where 'Importance of a car park' is a measured variable, $\beta_1$ to $\beta_k$ are regression weights and the terms prefixed 'F:' are factors derived from the data.

The response to the question 'How important is it that the store has parking facilities?' may be related to the person's attitude to car use, their environmental concerns, their life-style, money considerations and other factors. Some of these factors will be more highly related to the question than others, information that is provided in the $\beta_1$ to $\beta_k$ coefficients. Equation 1 shows a recorded variable represented as a function of unrecorded factors. It is also easy to represent an unrecorded factor as a function of the recorded variables. This is shown in Equation 2, where a factor is defined in terms of the k questions asked in the questionnaire.

$$\text{F:attitude to car use} = \beta_1\,(\text{How important is it that the store has parking facilities?}) + \beta_2\,(\text{How important is a home-delivery service?}) + \beta_3\,(\text{How important is it that the store has a cafeteria?}) + \dots + \beta_k\,(\text{How important is it that the store offers value for money?}) \qquad \text{(Equation 2)}$$

where 'F:attitude to car use' is a factor derived from the data, $\beta_1$ to $\beta_k$ are regression weights and the 'How important...' terms are measured variables.
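The idea behind Equation 1, that measured responses are weighted sums of shared, unrecorded factors, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two factors, the weights, the noise level and the four variable names are hypothetical values chosen for illustration and are not taken from any data discussed in this entry.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents = 5000

# Two hypothetical, uncorrelated underlying factors (not real data).
car_use = rng.standard_normal(n_respondents)
monetary_concern = rng.standard_normal(n_respondents)

# Hypothetical regression weights: (weight on car use, weight on monetary concern).
weights = {
    "parking": (0.8, 0.1),
    "petrol station": (0.7, 0.0),
    "promotions": (0.0, 0.8),
    "value for money": (0.1, 0.7),
}

# Each measured response = weighted sum of the factors + question-specific noise.
noise_sd = 0.5
responses = np.column_stack([
    b_car * car_use + b_money * monetary_concern
    + noise_sd * rng.standard_normal(n_respondents)
    for b_car, b_money in weights.values()
])

# Variables that share a factor end up correlated; the others do not.
corr = np.corrcoef(responses, rowvar=False)
print(np.round(corr, 2))
```

Under these assumptions the parking and petrol-station responses correlate strongly with each other, as do promotions and value for money, while variables driven by different factors are close to uncorrelated. Factor analysis works in the opposite direction, starting from such a correlation matrix and inferring the factors.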

The 'attitude to car use' factor in Equation 2 is identified through the relationships between the variables in the data. Further factors can be identified by investigating the relationships in the data that are uncorrelated with those factors that have already been identified. In practice, k factors can be derived from k variables, the difference being that the factors represent the correlational structure in the data whereas the variables represent the responses to the individual questions. The process of the initial identification of these factors (or components) is explained in detail in the chapter on principal components analysis and will not be discussed further here.

Factor analysis is based on correlations, so continuous data are, technically speaking, needed. However, this requirement is often relaxed so that ordered data, particularly 5-point Likert scales, can be used (see Hutcheson and Sofroniou, 1999, for a full discussion of this issue). Indeed, in management research, factor analytic techniques are used predominantly on ordered responses collected from questionnaire research.

The components that represent the correlational structure in the data may be difficult to interpret meaningfully, as individual components are often associated with a number of variables. It is common for an individual component to load highly on many of the variables in the data set, which makes it difficult to identify the underlying structure. For example, Table 1 shows two components that have been derived from an analysis of educational data (see Hutcheson and Sofroniou, 1999). The table shows a list of 8 variables (the recorded data) and two components that have been derived using principal components analysis. The strength of the relationship between the variables and components is shown in the factor loading scores. From the table it can be seen that component 1 is more highly related to all of the variables than component 2, making it difficult to identify any meaningful distinction between the components.

Table 1: Two components derived from 8 correlated variables
Variables              Component 1   Component 2
Articulation              0.641         0.379
Comprehension             0.766         0.253
Coordination              0.824        -0.391
Drawing                   0.815        -0.384
Memory                    0.769         0.188
Motor Skill               0.673        -0.431
Sentence Completion       0.741         0.194
Writing                   0.796        -0.399
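The initial extraction step, which this entry leaves to the chapter on principal components analysis, can be sketched as an eigen-decomposition of a correlation matrix. The sketch below is illustrative only: the 4 x 4 correlation matrix is hypothetical (it is not the educational data behind Table 1), and scaling eigenvectors by the square root of their eigenvalues is one common convention for component loadings.

```python
import numpy as np

# Hypothetical correlation matrix for four variables (illustrative only):
# variables 1-2 and variables 3-4 form two correlated pairs.
R = np.array([
    [1.0, 0.7, 0.2, 0.2],
    [0.7, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]          # largest component first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Component loadings: one column per component, one row per variable.
loadings = eigenvectors * np.sqrt(eigenvalues)

print("eigenvalues:", np.round(eigenvalues, 3))
print("loadings on the first two components:")
print(np.round(loadings[:, :2], 3))
```

With these hypothetical correlations, k = 4 variables yield k = 4 mutually uncorrelated components, and the first component loads equally on every variable. This is the same pattern that makes the unrotated components in Table 1 hard to interpret. The signs of the loadings are arbitrary and may differ between numerical libraries.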

It is useful to illustrate the relationships shown in Table 1 in a graphic. Figure 1 shows the component loadings and clearly shows the variables clustering into two groups (identified here as physical dexterity and linguistic competence). This is quite obvious in this graphic, but is not obvious from the information in the table.

Figure 1: A graphical representation of the relationship between components and variables

The two groups of variables can be identified by redistributing the component loadings so that individual components load highly on relatively few variables. This can be considered as a process of rotating the axes of the graphic in Figure 1, so that the axes are drawn closer to the clusters. This process is called rotation and is illustrated in Figure 2. Both of the axes of the graph in Figure 1 can be rotated so that they remain at 90 degrees to each other (orthogonal) or they can be rotated independently (oblique). Orthogonal rotation represents factors that are uncorrelated, whereas oblique rotation represents factors that are correlated. There are a number of popular methods for identifying the rotations used, ranging from ones that minimize the number of variables which load highly on factors in order to enhance the interpretability of the factors, to methods that minimize the number of factors in order to provide simpler interpretations (refer to Kim and Mueller, 1994, Browne, 2001, and Bernaards and Jennrich, 2005, for discussions of rotation techniques). Although there are many rotation techniques available, in practice the different techniques tend to produce similar results when there is a large sample and the factors are relatively well defined (Fava and Velicer, 1992). Applying rotations often results in a clearer differentiation between the components and enables the factors to be identified from the factor loadings. Table 2 shows the loadings for the example above before rotation and after an orthogonal and an oblique rotation were computed.

Figure 2: A graphical representation of orthogonal and oblique rotation methods

Table 2: Factor loadings for unrotated components and rotated factors

                       Unrotated components   Orthogonal factors    Oblique factors
Variables                  1         2           1         2          1         2
Articulation             0.641     0.379       0.843     0.089      0.936     0.154
Comprehension            0.766     0.253       0.826     0.274      0.857    -0.062
Memory                   0.769     0.188       0.779     0.327      0.789    -0.138
Sentence Completion      0.741     0.194       0.660     0.328      0.631    -0.158
Coordination             0.824    -0.391       0.259     0.855      0.018    -0.890
Drawing                  0.815    -0.384       0.261     0.843      0.025    -0.876
Motor Skill              0.673    -0.431       0.179     0.776     -0.032    -0.836
Writing                  0.796    -0.399       0.273     0.834      0.049    -0.869

Table 2 clearly shows that the orthogonal and oblique rotation methods have identified the underlying physical dexterity and linguistic competence factors. The process of rotation has identified the clusters in the loading matrix and enabled the underlying structure in the data to be identified. It should be noted that whilst the structure in these data may be quite obvious (particularly as restricting the analysis to two factors enabled simple graphics to be used), identifying the underlying structure when there are greater numbers of components and factors can be difficult, making the use of factor analysis essential for complex data structures.
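The orthogonal rotation described above can be sketched by turning both axes together and keeping the angle that gives the simplest loading pattern. This is a minimal sketch under stated assumptions: it uses the unrotated loadings from Table 1, scores each angle with the raw varimax criterion (one of many possible rotation criteria, chosen here for illustration), and finds the angle by a simple grid search. The entry does not state which rotation method produced Table 2, so the rotated values printed here need not match that table.

```python
import numpy as np

# Unrotated component loadings from Table 1 (columns: component 1, component 2).
variables = ["Articulation", "Comprehension", "Coordination", "Drawing",
             "Memory", "Motor Skill", "Sentence Completion", "Writing"]
loadings = np.array([
    [0.641,  0.379],
    [0.766,  0.253],
    [0.824, -0.391],
    [0.815, -0.384],
    [0.769,  0.188],
    [0.673, -0.431],
    [0.741,  0.194],
    [0.796, -0.399],
])


def rotate(matrix, theta):
    """Rotate both axes together by theta radians (an orthogonal rotation)."""
    c, s = np.cos(theta), np.sin(theta)
    return matrix @ np.array([[c, -s], [s, c]])


def varimax_criterion(matrix):
    """Raw varimax criterion: variance of squared loadings, summed over factors."""
    return float(np.sum(np.var(matrix ** 2, axis=0)))


# Grid search over a quarter turn; the criterion repeats every 90 degrees.
angles = np.linspace(0.0, np.pi / 2, 9001)
scores = [varimax_criterion(rotate(loadings, t)) for t in angles]
best_angle = angles[int(np.argmax(scores))]
rotated = rotate(loadings, best_angle)

for name, row in zip(variables, rotated):
    print(f"{name:<20s} {row[0]:7.3f} {row[1]:7.3f}")
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of the loadings across the two factors changes. The sign of a rotated factor is arbitrary, so a cluster may appear with uniformly negative loadings.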

Conclusion
Factor analysis is a very common technique in management research and is used extensively in the analysis of questionnaire data. It is particularly useful in identifying the underlying structure in data sets and can contribute theoretical insights into the research area. Factor analysis can also reduce the number of variables needed to represent relationships (it is commonly referred to as a data-reduction technique) and thereby provide benefits when the data are used for model-building (for example, generalised linear models).

Further Reading

Bernaards, C. A. and Jennrich, R. I. (2005). Gradient Projection Algorithms and Software for Arbitrary Rotation Criteria in Factor Analysis. Educational and Psychological Measurement, 65(5): 676-696.

Browne, M. W. (2001). An Overview of Analytic Rotation in Exploratory Factor Analysis. Multivariate Behavioral Research, 36(1): 111-150.

Fava, J. L. and Velicer, W. F. (1992). An empirical comparison of factor, image, component, and scale scores. Multivariate Behavioral Research, 27: 301-322.

Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist: an introduction to generalized linear models. Sage Publications.

Kim, J. and Mueller, C. W. (1994). Factor Analysis: Statistical Methods and Practical Issues. In M. S. Lewis-Beck (editor). Factor Analysis and Related Techniques. International Handbooks of Quantitative Applications in the Social Sciences, Volume 5. Sage Publications.

Graeme Hutcheson, Manchester University
Nick Sofroniou, SI Research, Athens