
Performing Factor Analysis

One type of analysis that you may find useful for scale-response survey data is factor analysis. This is a more advanced marketing research topic, and you will not need to perform a factor analysis for the project in this class. Factor analysis condenses multiple scale-response survey questions into a smaller number of variables that are easier to work with and to use in other types of analysis. The idea behind factor analysis is that a survey often asks several different questions that are related to one another. Factor analysis identifies patterns of relationships among your variables and lets you condense a group of related variables into one new variable, or factor, that captures most of the information in the original variables. For example, you might reduce the data you are working with from 100 variables to only 10 factors that carry most of the same information into your research analysis. This tutorial presents a beginner's guide to conducting and interpreting a factor analysis.

Performing Factor Analysis

Factor analysis can be approached in Microsoft Excel by working from a correlation matrix, but it is much easier to perform and understand a factor analysis using SPSS. The steps for performing factor analysis in SPSS are:

1. Enter your data into the SPSS Data Editor as described in the SPSS Tool Kit tutorial.
2. Click Analyze in the menu bar at the top of the Data Editor.
3. Select Data Reduction from the drop-down menu.
4. Click Factor from the drop-down menu (it is the only choice on this menu).
5. In the window that pops up, highlight the variables in the list on the left that you want to include in the analysis, and click the arrow button to move them to the list on the right labeled Variables. You can highlight and move more than one variable at a time by holding down the Ctrl key while selecting.
6. Click the button labeled Rotation at the bottom of the window.
7. In the window that pops up, select Varimax under Method, then click Continue.
8. Click the button labeled Options at the bottom of the Factor Analysis window.
9. In the window that pops up, check Sorted by Size under Coefficient Display Format, then click Continue.
10. Click OK in the Factor Analysis window.

Analyzing the Results

After you perform these steps, several new tables appear in the SPSS Viewer. Only two of them need your attention: the one labeled Total Variance Explained and the one labeled Rotated Component Matrix.

First look at the Total Variance Explained table. The first column of this table is labeled Component. It numbers the components from 1 up to the number of variables you entered into the analysis, which is the maximum number of factors you can have. The next column is labeled Initial Eigenvalues and is divided into three sub-columns.

The first sub-column is labeled Total. It lists the eigenvalue associated with each possible factor, in descending order from the largest at the top to the smallest at the bottom. As a general rule, you want to choose as many factors as there are components with eigenvalues greater than one. An eigenvalue is a term from linear algebra. Don't worry about understanding what an eigenvalue is; just check how many components have a value greater than one.

The second sub-column under Initial Eigenvalues is labeled % of Variance. It tells you how much of the information (variance) in your original variables is explained by each factor, or component. The third sub-column is labeled Cumulative %. It tells you how much of the information in your original variables is explained by all of the factors up to and including that row. For example, if you chose three factors, the % of Variance column would tell you how much of the information each factor explains on its own, while the Cumulative % value in row 3 would tell you how much all three factors explain combined. A general rule of thumb is to choose enough factors to reach a cumulative variance of at least 75%. The remaining columns contain information that you do not need to worry about.

Example: Suppose you run a factor analysis that gives you the following Total Variance Explained table. (Note that the entire table is not shown in this example, just the part that has been explained in this tutorial.)
Total Variance Explained

                         Initial Eigenvalues
Component   Total        % of Variance   Cumulative %
1           6.080        50.670          50.670
2           4.870        40.583          91.254
3           .473         3.940           95.194
4           .274         2.282           97.476
5           .134         1.113           98.589
6           8.244E-02    .687            99.276
7           5.734E-02    .478            99.754
8           2.568E-02    .214            99.968
9           3.894E-03    3.245E-02       100.000
10          3.212E-17    2.676E-16       100.000
11          -4.67E-17    -3.893E-16      100.000
12          -4.18E-16    -3.484E-15      100.000

Extraction Method: Principal Component Analysis.
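The % of Variance and Cumulative % columns follow directly from the Total column. Because principal component extraction works on the correlation matrix of the variables, the eigenvalues add up to the number of variables (12 here), so each component's % of Variance is its eigenvalue divided by 12, times 100. The Python sketch below assumes NumPy is installed; the function names `eigenvalues_from_data`, `variance_table`, `kaiser_count`, and `factors_for_target` are illustrative, not part of SPSS. It reproduces those columns and applies both rules of thumb from this tutorial, but it does not perform the Varimax rotation and is not a replacement for the SPSS procedure.

```python
import numpy as np


def eigenvalues_from_data(data):
    """Eigenvalues of the variables' correlation matrix, largest first.

    `data` is assumed to be a respondents-by-variables array of numeric
    scale responses (one row per respondent, one column per question).
    """
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    eigvals = np.linalg.eigvalsh(corr)      # real eigenvalues, ascending order
    return eigvals[::-1]                    # descending, like the Total column


def variance_table(eigvals):
    """Return the % of Variance and Cumulative % columns as arrays.

    Expects all eigenvalues, one per variable.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    # Eigenvalues of a correlation matrix sum to the number of variables.
    pct = 100 * eigvals / len(eigvals)
    return pct, np.cumsum(pct)


def kaiser_count(eigvals):
    """Number of components with an eigenvalue greater than one."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1))


def factors_for_target(eigvals, target=75.0):
    """Smallest number of components whose Cumulative % reaches `target`."""
    _, cum = variance_table(eigvals)
    reached = np.nonzero(cum >= target)[0]
    return int(reached[0]) + 1 if reached.size else len(cum)


# Total column from the example table above.
total = [6.080, 4.870, .473, .274, .134, 8.244e-02, 5.734e-02,
         2.568e-02, 3.894e-03, 3.212e-17, -4.67e-17, -4.18e-16]
pct, cum = variance_table(total)
print(round(pct[0], 2), round(pct[1], 2), round(cum[1], 2))  # 50.67 40.58 91.25
print(kaiser_count(total), factors_for_target(total))        # 2 2
```

For this example both the eigenvalue-greater-than-one rule and the 75% rule point to two factors, which matches the reading of the table given in the text.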

Looking at the Total column of this table, you can see that only the first two components have eigenvalues greater than one. The % of Variance column shows that the first component explains 50.67% of the information in the original variables and the second component explains 40.58%. The Cumulative % column shows that choosing two components would explain 91.25% of the information in the original variables. Therefore, from this output you would conclude that you should use two factors, and that these two factors still explain over 91% of the information contained in the original variables.

By default, SPSS keeps as many factors as there are components with eigenvalues greater than one. In this example, SPSS would automatically choose two factors. This is important to remember for the rest of the analysis explained in this tutorial. By checking the Cumulative % column yourself, you can see that in this example the components with eigenvalues greater than one explain more than 75% of the variance. This will generally be the case, but you should still always check the Cumulative % value.

The next step of the analysis is to look at the Rotated Component Matrix table. The first column of this table lists the names of the variables you entered into the analysis. The second column is titled Component, and its sub-columns are numbered to match the components from the Total Variance Explained table that had eigenvalues greater than one. These are the factors. Each factor has a value (often called a loading) for each of your original variables, and these values represent how well each original variable fits into each new factor. The values range from -1 to 1; the closer a number is to 1 or -1, the better that variable fits into that factor. A value of 1 means that the factor explains 100% of the information in that variable. A value of -1 means that the factor also explains 100% of the information in that variable, but in the opposite direction. This will be important to remember when calculating the factor scores. A value of 0 means that the factor does not explain any of the information contained in that variable. Generally, a value bigger than 0.5 or smaller than -0.5 means that the variable fits well with that factor. You also typically want to include each variable in only one factor, although this is not always the case. Suppose you found that one variable had values of 0.9 or bigger for two different factors. You may want to include this variable in both factors, but that is up to you to decide by comparing the variable to the other variables that fit into each factor.

Example: In the factor analysis example used earlier, suppose you got the following Rotated Component Matrix.
Rotated Component Matrix(a)

              Component
              1            2
TALK          .976         2.123E-02
OUTGOING      .976         2.123E-02
INFLUEN       .960         -7.65E-03
DECISIVE      .955         -7.47E-02
BOLD          .940         -8.17E-02
INSIST        .931         -.212
CAREFREE      5.688E-02    .965
OPEN          -4.28E-02    .960
SPONT         -8.85E-02    .959
FRIENDLY      -.116        .956
HUMOR         -4.45E-02    .954
EASY          -7.70E-02    .881

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
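The rule of thumb for reading the Rotated Component Matrix (a value bigger than 0.5 or smaller than -0.5 means the variable fits that factor) can be applied mechanically. This Python sketch copies the values from the table above into a dictionary; the names `LOADINGS` and `assign_to_factors` are illustrative, and the 0.5 cutoff is this tutorial's rule of thumb, exposed as a parameter so it can be changed.

```python
# Values copied from the Rotated Component Matrix above: (Component 1, Component 2).
LOADINGS = {
    "TALK":     (0.976, 2.123e-02),
    "OUTGOING": (0.976, 2.123e-02),
    "INFLUEN":  (0.960, -7.65e-03),
    "DECISIVE": (0.955, -7.47e-02),
    "BOLD":     (0.940, -8.17e-02),
    "INSIST":   (0.931, -0.212),
    "CAREFREE": (5.688e-02, 0.965),
    "OPEN":     (-4.28e-02, 0.960),
    "SPONT":    (-8.85e-02, 0.959),
    "FRIENDLY": (-0.116, 0.956),
    "HUMOR":    (-4.45e-02, 0.954),
    "EASY":     (-7.70e-02, 0.881),
}


def assign_to_factors(loadings, cutoff=0.5):
    """Group variables by factor using the |value| > cutoff rule of thumb.

    Returns {factor_number: [(variable, value), ...]} with factors numbered
    from 1. A variable that clears the cutoff on several factors is listed
    under each one, so you can decide by hand where it belongs. The sign is
    kept, so a negative value flags a variable to reverse-code first.
    """
    groups = {}
    for name, row in loadings.items():
        for factor, value in enumerate(row, start=1):
            if abs(value) > cutoff:
                groups.setdefault(factor, []).append((name, value))
    return groups


for factor, members in sorted(assign_to_factors(LOADINGS).items()):
    names = ", ".join(name for name, _ in members)
    print(f"Factor {factor}: {names}")
```

For this table it prints the same two groups described in the text: TALK, OUTGOING, INFLUEN, DECISIVE, BOLD, and INSIST for factor 1, and CAREFREE, OPEN, SPONT, FRIENDLY, HUMOR, and EASY for factor 2.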

Looking back at the Total Variance Explained table, there were two components with eigenvalues greater than one. As a result, the Rotated Component Matrix also contains two components, or factors. Looking at the results, you can see that the variables TALK, OUTGOING, INFLUEN, DECISIVE, BOLD, and INSIST all have values greater than 0.5 for the first factor (in fact, they are all greater than 0.9). Therefore, these variables all fit into the first factor. The remaining six variables (CAREFREE, OPEN, SPONT, FRIENDLY, HUMOR, and EASY) have large values for the second component, indicating that they fit into the second factor.

Calculating Factor Scores

Now that you have determined how many factors you need and which variables fit into each factor, the final step of factor analysis is to calculate factor scores. In this tutorial, a respondent's factor score is calculated by taking the average of their scores on the separate variables that make up the factor. The following steps demonstrate how to calculate factor scores using SPSS.

1. Click Transform in the menu bar at the top of the Data Editor.
2. Click Compute at the top of the drop-down list that appears. A new window appears that allows you to compute the values of new variables.
3. In the field labeled Target Variable, type a factor name. Choose a name that describes all of the variables that make up the factor. Remember that variable names in SPSS can be only eight characters long.
4. In the field labeled Numeric Expression, enter the formula that computes the factor score as the average of the scores for the separate variables:
   a. On the buttons that look like a calculator keypad, click the parentheses button ( ).
   b. In the list of variables, highlight one of the original variables that fits into this factor and click the arrow button. The variable name now appears between the parentheses in the Numeric Expression field.
   c. Click the + button.
   d. Repeat steps b and c until all of the variables that make up the factor are between the parentheses, but do not include a + sign after the last variable.
   e. Move the cursor outside the parentheses and click the divide (/) button.
   f. Count the number of variables in the factor and type this number after the / symbol in the Numeric Expression field.
5. Click OK. A new variable with the name of your factor appears in the Data Editor, and the factor score for each respondent is calculated automatically.

Repeat these steps to calculate the factor score for each factor that you have.

Example: In the factor analysis example discussed above, one factor was made up of the variables TALK, OUTGOING, INFLUEN, DECISIVE, BOLD, and INSIST. A good name for a factor that describes someone who is talkative, outgoing, influential, decisive, bold, and insistent might be Forceful. Suppose that your survey asked respondents to rate themselves on a scale from one to five (five being the highest) for each of these variables. One of your respondents gave the following answers:

Talkative:    4
Outgoing:     4
Influential:  5
Decisive:     5
Bold:         5
Insistent:    5

After completing the steps for calculating a factor score, you would get the following factor score for this respondent:

Forceful: 4.67
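For this factor, the Numeric Expression built in step 4 is (TALK + OUTGOING + INFLUEN + DECISIVE + BOLD + INSIST) / 6, an unweighted average. The short Python sketch below performs the same calculation for the example respondent; `factor_score` and `FORCEFUL_VARS` are illustrative names, and each respondent is assumed to be stored as a dictionary mapping variable names to responses.

```python
from statistics import mean

# Variables that make up the Forceful factor (from the Rotated Component Matrix).
FORCEFUL_VARS = ["TALK", "OUTGOING", "INFLUEN", "DECISIVE", "BOLD", "INSIST"]


def factor_score(responses, variables):
    """Unweighted average of one respondent's answers on the factor's variables.

    Mirrors the SPSS Compute expression (V1 + V2 + ... + Vn) / n.
    """
    return mean(responses[v] for v in variables)


# The respondent from the example above (1-to-5 scale, 5 = highest).
respondent = {"TALK": 4, "OUTGOING": 4, "INFLUEN": 5,
              "DECISIVE": 5, "BOLD": 5, "INSIST": 5}
print(round(factor_score(respondent, FORCEFUL_VARS), 2))  # 4.67
```

Looping `factor_score` over every respondent's dictionary gives the same column of scores that SPSS adds to the Data Editor.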

You can calculate this yourself by averaging the responses for the six variables: (4 + 4 + 5 + 5 + 5 + 5) / 6 = 4.67. Another way to think of this is that if this respondent were rated on a scale from one to five (five being the highest) on how forceful they are, they would have a score of 4.67. Using factor analysis, you now have only one new variable (Forceful) that represents most of the information previously contained in the six original variables.

If the Rotated Component Matrix contains a variable that is negatively related to a factor, there is one additional step you need to perform before following the steps above. For example, suppose the value for the variable TALK in the Rotated Component Matrix above had been -.976 instead of .976 for the first component. The variable would still fit in with this factor, but before calculating the factor scores you would need to create a new variable that contains the opposite information of the TALK variable. This can be done with the following steps:

1. Click Transform in the menu bar at the top of the Data Editor.
2. Click Compute at the top of the drop-down list that appears. A new window appears that allows you to compute the values of new variables.
3. In the field labeled Target Variable, type a new variable name. For example, if you wanted a variable that contained the opposite information from the TALK variable, you might call the new variable QUIET.
4. In the Numeric Expression field, type one plus the maximum possible score for the original variable, minus the name of the original variable. For example, for the TALK variable (with a maximum score of 5 for very talkative), you would enter 1 + 5 - TALK.
5. Click OK at the bottom of the window. A new variable with the name of your Target Variable will appear in the Data Editor. This new variable contains the exact opposite score of the original variable.

For instance, if someone had responded to the TALK question with a five, indicating they are very talkative, the new QUIET variable would have a score of one, indicating that the respondent is not very quiet. If you were to re-run your factor analysis with the new variable in place of the original variable, you should get the same results, except that the new variable would have a positive value in the Rotated Component Matrix instead of a negative value. At this point you are ready to calculate your factor scores using the procedure described above.

This tutorial may make factor analysis seem like more trouble than it is worth. However, once you have performed a factor analysis a few times, you will find that it is much easier to do than it is to explain in writing. In some cases, factor analysis may not produce any clear factors for your data (unlike the example used in this tutorial). In these cases you may have to rely on your original variables. If your factor analysis does produce useful factors, the rest of your data analysis will be much easier. As the example in this tutorial demonstrates, you may be able to reduce six (or more) variables to a single factor, which makes other types of analysis, such as crosstabs and cluster analysis, much easier to perform.
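The reverse-coding step (one plus the maximum possible score, minus the original score) and the averaging step can be combined in a short Python sketch. It uses the hypothetical case described in this tutorial, in which TALK loads negatively and is replaced by a reversed QUIET variable. The respondent's answers and the function names `reverse_code` and `factor_score` are made up for illustration, and a 1-to-5 scale is assumed.

```python
from statistics import mean


def reverse_code(value, scale_max=5, scale_min=1):
    """Flip a response so the top of the scale becomes the bottom.

    With scale_min = 1 this is the tutorial's formula: 1 + maximum - original.
    """
    return scale_min + scale_max - value


def factor_score(responses, variables):
    """Unweighted average of one respondent's answers on the listed variables."""
    return mean(responses[v] for v in variables)


# Hypothetical respondent; TALK is assumed to load negatively on the factor.
respondent = {"TALK": 5, "OUTGOING": 4, "INFLUEN": 5,
              "DECISIVE": 5, "BOLD": 5, "INSIST": 5}
respondent["QUIET"] = reverse_code(respondent["TALK"])  # 5 becomes 1

# Average the reversed variable in place of the original TALK.
score = factor_score(
    respondent, ["QUIET", "OUTGOING", "INFLUEN", "DECISIVE", "BOLD", "INSIST"])
print(respondent["QUIET"], round(score, 2))  # 1 4.17
```

Reverse-coding before averaging keeps every variable in the factor pointing in the same direction, so a high factor score always means more of the trait the factor describes.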