
# SW388R7 Data Analysis & Computers II Slide 1

Principal Component Analysis: Additional Topics

**Split Sample Validation**

Detecting Outliers

Reliability of Summated Scales

Sample Problems

SW388R7 Data Analysis & Computers II Slide 2

**Split Sample Validation**

To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see if our findings are verified. A less costly alternative is to split the sample randomly into two halves, do the principal component analysis on each half, and compare the results. If the communalities and the factor loadings are the same in the analysis of each half and of the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.

SW388R7 Data Analysis & Computers II Slide 3

**Misleading Results to Watch Out For**

When we examine the communalities and factor loadings, we are matching up overall patterns, not exact results: the communalities should all be greater than 0.50 and the pattern of the factor loadings should be the same. Sometimes the variables will switch their components (variables loading on the first component now load on the second and vice versa), but this does not invalidate our findings. Sometimes all of the signs of the factor loadings will reverse (the pluses become minuses and the minuses become pluses), but this does not invalidate our findings either, because we interpret the size, not the sign, of the loadings.
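One way to make "same pattern" concrete is to group the variables by the component on which each loads most strongly, and then compare the groups while ignoring component order and sign. The sketch below is only an illustration under that assumption: the function name `loading_pattern` and the loadings are hypothetical and are not taken from the SPSS output in these slides.

```python
def loading_pattern(loadings):
    """Group variables by the component on which each has its largest
    absolute loading, ignoring component order and loading sign."""
    groups = {}
    for variable, row in loadings.items():
        # Index of the component with the largest absolute loading.
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        groups.setdefault(best, set()).add(variable)
    # A set of frozensets does not depend on which number each component got.
    return {frozenset(members) for members in groups.values()}


# Hypothetical loadings for the full sample.
full_sample = {
    "degree": (0.81, 0.15), "padeg": (0.85, -0.03), "madeg": (0.85, 0.17),
    "happy": (0.14, 0.73), "hapmar": (-0.20, 0.87),
}
# Hypothetical half sample: components switched places and signs reversed.
half_sample = {
    "degree": (-0.12, -0.79), "padeg": (0.05, -0.88), "madeg": (-0.20, -0.82),
    "happy": (-0.76, -0.10), "hapmar": (-0.89, 0.18),
}

same_pattern = loading_pattern(full_sample) == loading_pattern(half_sample)
print(same_pattern)  # True
```

The comparison deliberately says nothing about how close the individual loadings are; it only checks that the same variables cluster together.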

SW388R7 Data Analysis & Computers II Slide 4

**When validation fails**

If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as valid findings.

We do have some options when validation fails:

If the problem is limited to one or two variables, we can remove those variables and redo the analysis.

Randomly selected samples are not always representative. We might try some different random number seeds and see if our negative finding was a fluke. If we choose this option, we should do a large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations to negate the failed validation and support our findings is not sufficient.

SW388R7 Data Analysis & Computers II Slide 5

**Outliers**

SPSS calculates factor scores as standard scores. SPSS suggests that one way to identify outliers is to compute the factor scores and identify those that have a value greater than ±3.0 as outliers.

If we find outliers in our analysis, we redo the analysis, omitting the cases that were outliers. If there is no change in communality or factor structure in the solution, it implies that the outliers do not have an impact. If our factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them.

After testing outliers, restore the full data set before any further calculations.
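The ±3.0 rule can be sketched in a few lines once the factor scores have been saved. This is a minimal illustration, assuming the scores are already standard scores; the function name `find_outliers`, the case ids, and the score values are hypothetical.

```python
def find_outliers(factor_scores, cutoff=3.0):
    """Return the case ids whose score on any component is beyond +/-cutoff."""
    return [
        case_id
        for case_id, scores in factor_scores.items()
        if any(abs(score) > cutoff for score in scores)
    ]


# Hypothetical factor scores (standard scores) for five cases on two components.
factor_scores = {
    101: (0.42, -1.10),
    102: (-3.25, 0.30),
    103: (1.95, 2.80),
    104: (0.10, 3.40),
    105: (-0.75, -0.20),
}

outlier_ids = find_outliers(factor_scores)
print(outlier_ids)  # [102, 104]
```

The analysis would then be repeated without the flagged cases and the communalities and loading pattern compared with the original solution.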

SW388R7 Data Analysis & Computers II Slide 6

**Reliability of Summated Scales**

One of the common uses of factor analysis is the formation of summated scales, where we add the scores on all the variables loading on a component to create the score for the component.

To verify that the variables for a component are measuring similar entities that are legitimate to add together, we compute Cronbach's alpha.

If Cronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support for the internal consistency of the items, justifying their use in a summated scale.
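For readers who want to see what the statistic does, Cronbach's alpha can be computed from the item variances and the variance of the summed scale: alpha = k/(k-1) × (1 − Σ item variances / variance of totals). The sketch below uses that standard formula with hypothetical item scores; it is not the SPSS procedure itself, and the function name is an illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    # Summated scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of five respondents on three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(items)
supports_scale = alpha >= 0.70  # 0.60 would be the cutoff for exploratory research
print(round(alpha, 3), supports_scale)
```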

SW388R7 Data Analysis & Computers II Slide 7

**Problem 1**

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05. Validate the results of your principal component analysis by splitting the sample in two, using 519447 as the random number seed.

Based on the results of a principal component analysis of the 8 variables "highest academic degree" [degree], "father's highest academic degree" [padeg], "mother's highest academic degree" [madeg], "spouse's highest academic degree" [spdeg], "general happiness" [happy], "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life], the information in these variables can be represented with 2 components and 3 individual variables. Cases that might be considered to be outliers do not have an impact on the factor solution. The internal consistency of the variables included in the components is sufficient to support the creation of a summated scale.

Component 1 includes the variables "highest academic degree" [degree], "father's highest academic degree" [padeg], and "mother's highest academic degree" [madeg]. Component 2 includes the variables "general happiness" [happy] and "happiness of marriage" [hapmar]. The variables "attitude toward life" [life], "condition of health" [health], and "spouse's highest academic degree" [spdeg] were not included on the components and are retained as individual variables.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The bold text indicates the parts of the problem that have been added this week.

SW388R7 Data Analysis & Computers II Slide 8

**Computing a principal component analysis**

To compute a principal component analysis in SPSS, select the Data Reduction | Factor… command from the Analyze menu.

SW388R7 Data Analysis & Computers II Slide 9

**Add the variables to the analysis**

First, move the variables listed in the problem to the Variables list box.

Second, click on the Descriptives… button to specify statistics to include in the output.

SW388R7 Data Analysis & Computers II Slide 10

**Complete the descriptives dialog box**

First, mark the Univariate descriptives checkbox to get a tally of valid cases.

Second, keep the Initial solution checkbox to get the statistics needed to determine the number of factors to extract.

Third, mark the Coefficients checkbox to get a correlation matrix, one of the outputs needed to assess the appropriateness of factor analysis for the variables.

Fourth, mark the KMO and Bartlett's test of sphericity checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables.

Fifth, mark the Anti-image checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables.

Sixth, click on the Continue button.

SW388R7 Data Analysis & Computers II Slide 11

**Select the extraction method**

First, click on the Extraction… button to specify statistics to include in the output.

The extraction method refers to the mathematical method that SPSS uses to compute the factors or components.

SW388R7 Data Analysis & Computers II Slide 12

**Complete the extraction dialog box**

First, retain the default method, Principal components.

Second, click on the Continue button.

SW388R7 Data Analysis & Computers II Slide 13

**Select the rotation method**

First, click on the Rotation… button to specify statistics to include in the output.

The rotation method refers to the mathematical method that SPSS uses to rotate the axes in geometric space. This makes it easier to determine which variables are loaded on which components.

SW388R7 Data Analysis & Computers II Slide 14

**Complete the rotation dialog box**

First, mark the Varimax method as the type of rotation to be used in the analysis.

Second, click on the Continue button.

SW388R7 Data Analysis & Computers II Slide 15

**Complete the request for the analysis**

First, click on the OK button to request the output.

SW388R7 Data Analysis & Computers II Slide 16

**Level of measurement requirement**

"Highest academic degree" [degree], "father's highest academic degree" [padeg], "mother's highest academic degree" [madeg], "spouse's highest academic degree" [spdeg], "general happiness" [happy], "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for principal component analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

SW388R7 Data Analysis & Computers II Slide 17

**Sample size requirement: minimum number of cases**

[Table: Descriptive Statistics (Mean, Std. Deviation, Analysis N) for the eight variables; Analysis N is 68 for every variable.]

The number of valid cases for this set of variables is 68.

While principal component analysis can be conducted on a sample that has fewer than 100 cases, but more than 50 cases, we should be cautious about its interpretation.

SW388R7 Data Analysis & Computers II Slide 18

**Sample size requirement: ratio of cases to variables**

[Table: Descriptive Statistics (Mean, Std. Deviation, Analysis N) for the eight variables; Analysis N is 68 for every variable.]

The ratio of cases to variables in a principal component analysis should be at least 5 to 1.

With 68 cases and 8 variables, the ratio of cases to variables is 8.5 to 1, which exceeds the requirement for the ratio of cases to variables.
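The two sample size checks amount to simple arithmetic. The sketch below restates them with the thresholds given on these slides (more than 50 cases, caution below 100 cases, at least 5 cases per variable); the function name and the dictionary keys are hypothetical.

```python
def check_sample_size(n_cases, n_variables, min_cases=50, min_ratio=5.0):
    """Apply the minimum-cases and cases-to-variables checks."""
    ratio = n_cases / n_variables
    return {
        "ratio": ratio,
        "enough_cases": n_cases > min_cases,
        "ratio_ok": ratio >= min_ratio,
        # Fewer than 100 cases is usable but calls for a cautious interpretation.
        "interpret_with_caution": n_cases < 100,
    }


result = check_sample_size(n_cases=68, n_variables=8)
print(result["ratio"])  # 8.5
```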

SW388R7 Data Analysis & Computers II Slide 19

**Appropriateness of factor analysis: Presence of substantial correlations**

Principal components analysis requires that there be some correlations greater than 0.30 between the variables included in the analysis.

For this set of variables, there are 7 correlations in the matrix greater than 0.30, satisfying this requirement.

[Table: Correlation Matrix for the eight variables; the correlations greater than 0.30 are highlighted in yellow.]
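Counting the substantial correlations by hand is error-prone for larger matrices. The sketch below counts each pair of variables once and compares the absolute value of the correlation with 0.30; treating negative correlations by their magnitude is an assumption here, and the small matrix is hypothetical rather than the one in the SPSS output.

```python
def count_substantial(corr, threshold=0.30):
    """Count variable pairs whose correlation exceeds the threshold in size."""
    n = len(corr)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only, so each pair counts once
        if abs(corr[i][j]) > threshold
    )


# Hypothetical 4 x 4 correlation matrix.
corr = [
    [1.00, 0.60, 0.20, -0.35],
    [0.60, 1.00, 0.10, 0.05],
    [0.20, 0.10, 1.00, 0.31],
    [-0.35, 0.05, 0.31, 1.00],
]

n_substantial = count_substantial(corr)
print(n_substantial)  # 3
```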

SW388R7 Data Analysis & Computers II Slide 20

**Appropriateness of factor analysis: Sampling adequacy of individual variables**

[Table: Anti-image Matrices (Anti-image Covariance and Anti-image Correlation) for the eight variables. Footnote a: Measures of Sampling Adequacy (MSA).]

There are two anti-image matrices: the anti-image covariance matrix and the anti-image correlation matrix. We are interested in the anti-image correlation matrix.

Principal component analysis requires that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy be greater than 0.50 for each individual variable as well as the set of variables.

On iteration 1, the MSA for all of the individual variables included in the analysis was greater than 0.5, supporting their retention in the analysis.

SW388R7 Data Analysis & Computers II Slide 21

**Appropriateness of factor analysis: Sampling adequacy for set of variables**

| KMO and Bartlett's Test | |
|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | .640 |
| Bartlett's Test of Sphericity: Approx. Chi-Square | 137.823 |
| Bartlett's Test of Sphericity: df | 28 |
| Bartlett's Test of Sphericity: Sig. | .000 |

In addition, the overall MSA for the set of variables included in the analysis was 0.640, which exceeds the minimum requirement of 0.50 for overall MSA.
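For context, the Kaiser-Meyer-Olkin measure compares the squared correlations with the squared partial correlations: values near 1 mean the correlations are largely shared among several variables. The sketch below uses the standard formula with NumPy, taking the partial correlations from the inverse of the correlation matrix. The function name and the example matrix are hypothetical, and this is not a reproduction of the SPSS output.

```python
import numpy as np


def kmo(corr):
    """Return (overall MSA, per-variable MSA) for a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlation of each pair, controlling for all other variables.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.where(off_diag, corr ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_variable


# Hypothetical correlation matrix: three variables that all correlate 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

overall_msa, variable_msa = kmo(corr)
print(round(overall_msa, 3))  # 0.692
print(all(value > 0.50 for value in variable_msa))  # True
```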

SW388R7 Data Analysis & Computers II Slide 22

**Appropriateness of factor analysis: Bartlett test of sphericity**

| KMO and Bartlett's Test | |
|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | .640 |
| Bartlett's Test of Sphericity: Approx. Chi-Square | 137.823 |
| Bartlett's Test of Sphericity: df | 28 |
| Bartlett's Test of Sphericity: Sig. | .000 |

Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity be less than the level of significance.

The probability associated with the Bartlett test is <0.001, which satisfies this requirement.
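The df of 28 in the table follows from the number of variables: with p = 8 variables there are p(p − 1)/2 = 28 distinct correlations. The sketch below computes the usual Bartlett chi-square approximation, −(n − 1 − (2p + 5)/6) · ln|R|, and its degrees of freedom. It stops short of the p-value, which would need a chi-square distribution function; the function name and the example matrix are hypothetical.

```python
import numpy as np


def bartlett_sphericity(corr, n_cases):
    """Return (approximate chi-square, degrees of freedom)."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    # slogdet returns the log of the determinant without overflow problems.
    _, log_det = np.linalg.slogdet(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return chi_square, df


# With 8 variables the test has 28 degrees of freedom, as in the SPSS table.
# An identity matrix (no correlations at all) gives a chi-square of zero.
chi_identity, df_eight = bartlett_sphericity(np.eye(8), n_cases=68)
print(df_eight)  # 28

# Hypothetical matrix of three variables that all correlate 0.5.
equal_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
chi_square, df = bartlett_sphericity(equal_corr, n_cases=68)
print(round(chi_square, 2), df)
```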

SW388R7 Data Analysis & Computers II Slide 23

**Number of factors to extract: Latent root criterion**

Total Variance Explained

| Component | Initial Eigenvalues: Total | % of Variance | Cumulative % | Extraction Sums of Squared Loadings: Total | % of Variance | Cumulative % |
|---|---|---|---|---|---|---|
| 1 | 2.600 | 32.502 | 32.502 | 2.600 | 32.502 | 32.502 |
| 2 | 1.772 | 22.149 | 54.651 | 1.772 | 22.149 | 54.651 |
| 3 | 1.079 | 13.486 | 68.137 | 1.079 | 13.486 | 68.137 |
| 4 | .827 | 10.332 | 78.469 | | | |
| 5 | .631 | 7.888 | 86.358 | | | |
| 6 | .487 | 6.087 | 92.445 | | | |
| 7 | .333 | 4.161 | 96.606 | | | |
| 8 | .272 | 3.394 | 100.000 | | | |

Extraction Method: Principal Component Analysis.

Using the output from iteration 1, there were 3 eigenvalues greater than 1.0. The latent root criterion for number of factors to derive would indicate that there were 3 components to be extracted for these variables.

SW388R7 Data Analysis & Computers II Slide 24

**Number of factors to extract: Percentage of variance criterion**

[Table: Total Variance Explained (Initial Eigenvalues and Extraction Sums of Squared Loadings) for components 1 to 8; the cumulative percentage of variance reaches 68.137% at component 3. Extraction Method: Principal Component Analysis.]

In addition, the cumulative proportion of variance criterion can be met with 3 components to satisfy the criterion of explaining 60% or more of the total variance.

A 3-component solution would explain 68.137% of the total variance.

Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution was based on the extraction of 3 components.
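Both criteria can be checked directly from the eigenvalues. The sketch below uses the eight initial eigenvalues from the Total Variance Explained output for iteration 1 and expresses each as a percentage of the number of variables; the variable names are illustrative, and small rounding differences from the SPSS percentages are expected because the eigenvalues are rounded to three decimals.

```python
# Initial eigenvalues from iteration 1 of the principal component analysis.
eigenvalues = [2.600, 1.772, 1.079, 0.827, 0.631, 0.487, 0.333, 0.272]
n_variables = len(eigenvalues)

# Latent root criterion: keep components with an eigenvalue greater than 1.0.
latent_root_count = sum(1 for value in eigenvalues if value > 1.0)

# Percentage of variance criterion: the smallest number of components whose
# cumulative percentage of variance reaches 60%.
cumulative = []
running = 0.0
for value in eigenvalues:
    running += 100.0 * value / n_variables
    cumulative.append(running)
variance_count = next(
    k for k, pct in enumerate(cumulative, start=1) if pct >= 60.0
)

print(latent_root_count, variance_count)  # 3 3
```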

SW388R7 Data Analysis & Computers II Slide 25

**Evaluating communalities**

[Table: Communalities (Initial and Extraction) for the eight variables; every Initial value is 1.000. Extraction Method: Principal Component Analysis.]

Communalities represent the proportion of the variance in the original variables that is accounted for by the factor solution.

The factor solution should explain at least half of each original variable's variance, so the communality value for each variable should be 0.50 or higher.

SW388R7 Data Analysis & Computers II Slide 26

**Communality requiring variable removal**

[Table: Communalities (Initial and Extraction) for the eight variables; the Extraction value for IS LIFE EXCITING OR DULL is .415. Extraction Method: Principal Component Analysis.]

On iteration 1, the communality for the variable "attitude toward life" [life] was 0.415. Since this is less than 0.50, the variable should be removed from the next iteration of the principal component analysis.

The variable was removed and the principal component analysis was computed again.
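The remove-and-recompute cycle used in this problem can be sketched as a loop. The code below is an illustration under stated assumptions, not the SPSS procedure: it works from a correlation matrix, keeps every component with an eigenvalue above 1, computes communalities as the row sums of squared unrotated loadings (an orthogonal rotation such as varimax leaves communalities unchanged), and drops the single lowest variable below 0.50 on each pass. The function names and the four-variable matrix are hypothetical.

```python
import numpy as np


def communalities(corr):
    """Communalities from a principal component solution that keeps
    every component with an eigenvalue greater than 1."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    keep = eigenvalues > 1.0
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return (loadings ** 2).sum(axis=1)


def prune_low_communalities(corr, names, minimum=0.50):
    """Drop the lowest-communality variable until all reach the minimum."""
    corr = np.asarray(corr, dtype=float)
    names = list(names)
    removed = []
    while len(names) > 1:
        h2 = communalities(corr)
        worst = int(np.argmin(h2))
        if h2[worst] >= minimum:
            break
        removed.append(names.pop(worst))
        # Delete the variable's row and column before the next iteration.
        corr = np.delete(np.delete(corr, worst, axis=0), worst, axis=1)
    return names, removed


# Hypothetical matrix: a, b, c correlate 0.7 with each other, d only 0.1.
corr = [
    [1.0, 0.7, 0.7, 0.1],
    [0.7, 1.0, 0.7, 0.1],
    [0.7, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]

kept, removed = prune_low_communalities(corr, ["a", "b", "c", "d"])
print(kept, removed)  # ['a', 'b', 'c'] ['d']
```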

SW388R7 Data Analysis & Computers II Slide 27

**Repeating the factor analysis**

In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box.

SW388R7 Data Analysis & Computers II Slide 28

**Removing the variable from the list of variables**

First, highlight the life variable.

Second, click on the left arrow button to remove the variable from the Variables list box.

SW388R7 Data Analysis & Computers II Slide 29

**Replicating the factor analysis**

The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.

To replicate the analysis without the variable that we just removed, click on the OK button.

SW388R7 Data Analysis & Computers II Slide 30

**Communality requiring variable removal**

[Table: Communalities (Initial and Extraction) for the seven remaining variables; the Extraction value for CONDITION OF HEALTH is .477. Extraction Method: Principal Component Analysis.]

On iteration 2, the communality for the variable "condition of health" [health] was 0.477. Since this is less than 0.50, the variable should be removed from the next iteration of the principal component analysis.

The variable was removed and the principal component analysis was computed again.

SW388R7 Data Analysis & Computers II Slide 31

**Repeating the factor analysis**

In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box.

SW388R7 Data Analysis & Computers II Slide 32

**Removing the variable from the list of variables**

First, highlight the health variable.

Second, click on the left arrow button to remove the variable from the Variables list box.

SW388R7 Data Analysis & Computers II Slide 33

**Replicating the factor analysis**

The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.

To replicate the analysis without the variable that we just removed, click on the OK button.

SW388R7 Data Analysis & Computers II Slide 34

**Communality requiring variable removal**

[Table: Communalities (Initial and Extraction) for the six remaining variables; the Extraction value for SPOUSES HIGHEST DEGREE is .491. Extraction Method: Principal Component Analysis.]

On iteration 3, the communality for the variable "spouse's highest academic degree" [spdeg] was 0.491. Since this is less than 0.50, the variable should be removed from the next iteration of the principal component analysis.

The variable was removed and the principal component analysis was computed again.

SW388R7 Data Analysis & Computers II Slide 35

**Repeating the factor analysis**

In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box.

SW388R7 Data Analysis & Computers II Slide 36

**Removing the variable from the list of variables**

First, highlight the spdeg variable.

Second, click on the left arrow button to remove the variable from the Variables list box.

SW388R7 Data Analysis & Computers II Slide 37

**Replicating the factor analysis**

The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.

To replicate the analysis without the variable that we just removed, click on the OK button.

SW388R7 Data Analysis & Computers II Slide 38

**Communality satisfactory for all variables**

[Table: Communalities (Initial and Extraction) for the five remaining variables: RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, HAPPINESS OF MARRIAGE. Extraction Method: Principal Component Analysis.]

Once any variables with communalities less than 0.50 have been removed from the analysis, the pattern of factor loadings should be examined to identify variables that have complex structure.

Complex structure occurs when one variable has high loadings or correlations (0.40 or greater) on more than one component. If a variable has complex structure, it should be removed from the analysis.

Variables are only checked for complex structure if there is more than one component in the solution. Variables that load on only one component are described as having simple structure.
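The complex structure check can be sketched as a count of high loadings per variable. The code below applies the 0.40 cutoff to the absolute value of each loading; the function name and the loadings are hypothetical, and "cross" is an invented variable included only to show a case that would be flagged.

```python
def complex_variables(loadings, cutoff=0.40):
    """Return the variables with a high loading on more than one component."""
    return [
        variable
        for variable, row in loadings.items()
        if sum(1 for value in row if abs(value) >= cutoff) > 1
    ]


# Hypothetical rotated loadings on two components.
loadings = {
    "degree": (0.81, 0.15),
    "padeg": (0.85, -0.03),
    "happy": (0.14, 0.73),
    "cross": (0.62, -0.45),  # high on both components: complex structure
}

flagged = complex_variables(loadings)
print(flagged)  # ['cross']
```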

SW388R7 Data Analysis & Computers II Slide 39

**Identifying complex structure**

[Table: Rotated Component Matrix (Components 1 and 2) for RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, and HAPPINESS OF MARRIAGE. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 3 iterations.]

On iteration 4, none of the variables demonstrated complex structure. It is not necessary to remove any additional variables because of complex structure.

SW388R7 Data Analysis & Computers II Slide 40

**Variable loadings on components**

[Table: Rotated Component Matrix (Components 1 and 2) for RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, and HAPPINESS OF MARRIAGE. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 3 iterations.]

On iteration 4, the 2 components in the analysis had more than one variable loading on each of them.

No variables need to be removed because they are the only variable loading on a component.

SW388R7 Data Analysis & Computers II Slide 41

**Final check of communalities**

Once we have resolved any problems with complex structure, we check the communalities one last time to make certain that we are explaining a sufficient portion of the variance of all of the original variables.

[Table: Communalities (Initial and Extraction) for RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, and HAPPINESS OF MARRIAGE. Extraction Method: Principal Component Analysis.]

The communalities for all of the variables included on the components were greater than 0.50 and all variables had simple structure.

The principal component analysis has been completed.

SW388R7 Data Analysis & Computers II Slide 42

**Interpreting the principal components**

The information in 5 of the variables can be represented by 2 components.

[Table: Rotated Component Matrix (Components 1 and 2) for RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, and HAPPINESS OF MARRIAGE. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 3 iterations.]

Component 1 includes the variables "highest academic degree" [degree], "father's highest academic degree" [padeg], and "mother's highest academic degree" [madeg].

Component 2 includes the variables "general happiness" [happy] and "happiness of marriage" [hapmar].

SW388R7 Data Analysis & Computers II Slide 43

**Total variance explained**

Total Variance Explained

| Component | Initial Eigenvalues: Total | % of Variance | Cumulative % | Extraction Sums of Squared Loadings: Total | % of Variance | Cumulative % |
|---|---|---|---|---|---|---|
| 1 | 1.953 | 39.061 | 39.061 | 1.953 | 39.061 | 39.061 |
| 2 | 1.555 | 31.109 | 70.169 | 1.555 | 31.109 | 70.169 |
| 3 | .649 | 12.989 | 83.158 | | | |
| 4 | .441 | 8.820 | 91.977 | | | |
| 5 | .401 | 8.023 | 100.000 | | | |

Extraction Method: Principal Component Analysis. [The Rotation Sums of Squared Loadings columns are cut off in the source.]

The 2 components explain 70.169% of the total variance in the variables which are included on the components.

SW388R7 Data Analysis & Computers II Slide 44

**Split-sample validation**

We validate our analysis by conducting an analysis on each half of the sample. We compare the results of these two split sample analyses with the analysis of the full data set.

To split the sample into two halves, we generate a random variable that indicates which half of the sample each case should be placed in.

To compute a random selection of cases, we need to specify the starting value, or random number seed. Otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results.

Before we do the random selection, you must make certain that your data set is sorted in the original sort order, or the cases in your two half samples will not match mine. To make certain your data set is in the same order as mine, sort your data set in ascending order by case id.

SW388R7 Data Analysis & Computers II Slide 45

**Sorting the data set in original order**

To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.

SW388R7 Data Analysis & Computers II Slide 46

**Setting the random number seed**

To set the random number seed, select the Random Number Seed… command from the Transform menu.

Second. . Third.SW388R7 Data Analysis & Computers II Slide 47 Set the random number seed First. click on the OK button to complete the dialog box. type in the random seed stated in the problem. click on the Set seed to option button to activate the text box. Note that SPSS does not provide you with any feedback about the change.

SW388R7 Data Analysis & Computers II Slide 48

**Select the compute command**

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

SW388R7 Data Analysis & Computers II Slide 49

**The formula for the split variable**

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.50. If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.50, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
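The same rule can be sketched outside SPSS to show what the split variable contains. The code below is an illustration only: the function name `make_split` is hypothetical, the case ids are made up, and Python's random number generator is not the one SPSS uses, so the seed from the problem will not reproduce the SPSS split. It does show why sorting by case id and fixing the seed make the split repeatable.

```python
import random


def make_split(case_ids, seed):
    """Assign each case 1 or 0 from a seeded uniform draw on [0, 1)."""
    rng = random.Random(seed)
    # Sort by case id first so the same seed always gives the same split.
    # The value is 1 when the draw is <= 0.50 and 0 otherwise.
    return {case_id: int(rng.random() <= 0.50) for case_id in sorted(case_ids)}


case_ids = [7, 3, 10, 1, 5, 8, 2, 9, 4, 6]  # hypothetical, in scrambled order
split = make_split(case_ids, seed=519447)

first_half = [case_id for case_id, value in split.items() if value == 0]
second_half = [case_id for case_id, value in split.items() if value == 1]
print(len(first_half) + len(second_half))  # 10
```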

SW388R7 Data Analysis & Computers II Slide 50

**The split variable in the data editor**

In the data editor, the split variable shows a random pattern of zeros and ones.

To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.

SW388R7 Data Analysis & Computers II Slide 51

**Repeating the analysis with the first validation sample**

To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

SW388R7 Data Analysis & Computers II Slide 52

**Using "split" as the selection variable**

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.

SW388R7 Data Analysis & Computers II Slide 53

**Setting the value of split to select cases**

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Value… button to enter a value for split.

SW388R7 Data Analysis & Computers II Slide 54

**Completing the value selection**

First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

SW388R7 Data Analysis & Computers II Slide 55

**Requesting output for the first validation sample**

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable.

Click on the OK button to request the output.

Since the validation analysis requires us to compare the results of the analysis using the two split samples, we will request the output for the second sample before doing any comparison.

SW388R7 Data Analysis & Computers II Slide 56

**Repeating the analysis with the second validation sample**

To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

SW388R7 Data Analysis & Computers II Slide 57

**Setting the value of split to select cases**

Since the split variable is already in the Selection Variable text box, we only need to change its value.

Click on the Value… button to enter a different value for split.

SW388R7 Data Analysis & Computers II Slide 58

**Completing the value selection**

First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

SW388R7 Data Analysis & Computers II Slide 59

**Requesting output for the second validation sample**

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.

SW388R7 Data Analysis & Computers II Slide 60

**Comparing communalities**

[Tables: Communalities (Initial and Extraction) for RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, and HAPPINESS OF MARRIAGE, one table per split sample. Extraction Method: Principal Component Analysis. Footnote a, first table: Only cases for which SPLIT = 0 are used in the analysis phase. Footnote a, second table: Only cases for which SPLIT = 1 are used in the analysis phase.]

All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50.

All of the communalities for the second split sample satisfy the minimum requirement of being larger than 0.50.

Note how SPSS identifies for us which cases we selected for the analysis.

SW388R7 Data Analysis & Computers II Slide 61

**Comparing factor loadings**

[Tables: Rotated Component Matrix (Components 1 and 2) for the five variables, one table per split sample. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 3 iterations. b. Only cases for which SPLIT = 0 (first table) or SPLIT = 1 (second table) are used in the analysis phase.]

The pattern of factor loading for both split samples shows the variables RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, and MOTHERS HIGHEST DEGREE loading on the first component, and GENERAL HAPPINESS and HAPPINESS OF MARRIAGE loading on the second component.

SW388R7 Data Analysis & Computers II Slide 62

Interpreting the validation results

All of the communalities in both validation samples met the criteria.

The pattern of loadings for both validation samples is the same, and the same as the pattern for the analysis using the full sample.

In effect, we have done the same analysis on two separate subsamples of cases and obtained the same results.

This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set.

When we are finished with this analysis, we should select all cases back into the data set and remove the variables we created.

[SPSS output: the two Rotated Component Matrix tables for SPLIT = 0 and SPLIT = 1, repeated from the previous slide.]

SW388R7 Data Analysis & Computers II Slide 63

Detecting outliers

To detect outliers, we compute the factor scores in SPSS. Select the Factor Analysis command from the Dialog Recall tool button.

SW388R7 Data Analysis & Computers II Slide 64

Access the Scores Dialog Box

Click on the Scores… button to access the factor scores dialog box.

SW388R7 Data Analysis & Computers II Slide 65

Specifications for factor scores

First, click on the Save as variables checkbox to create factor variables.

Second, accept the default method using a Regression equation to calculate the scores.

Third, click on the Continue button to complete the specifications.

SW388R7 Data Analysis & Computers II Slide 66

Compute the factor scores

Click on the Continue button to compute the factor scores.

SW388R7 Data Analysis & Computers II Slide 67

The factor scores in the data editor

SPSS creates the factor score variables in the data editor window. It names the first factor score “fac1_1” and the second factor score “fac2_1.”

We need to check whether either factor score has any values beyond ±3.0, that is, 3.0 or more in absolute value. One way to check for the presence of large values indicating outliers is to sort the factor variables and see if any fall outside the acceptable range.

SW388R7 Data Analysis & Computers II Slide 68

Sort the data to locate outliers for factor one

First, select the fac1_1 column by clicking on its header.

Second, right click on the column header and select the Sort Ascending command from the drop down menu.

SW388R7 Data Analysis & Computers II Slide 69

Negative outliers for factor one

Scroll down past the cases for whom factor scores could not be computed. We see that none of the scores for factor one are less than or equal to -3.0.

SW388R7 Data Analysis & Computers II Slide 70

Positive outliers for factor one

Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0. There are no outliers on factor one.

SW388R7 Data Analysis & Computers II Slide 71

Sort the data to locate outliers on factor two

First, select the fac2_1 column by clicking on its header.

Second, right click on the column header and select the Sort Ascending command from the drop down menu.

SW388R7 Data Analysis & Computers II Slide 72

Negative outliers for factor two

Scrolling down past the cases for whom factor scores could not be computed, we see that none of the scores for factor two are less than or equal to -3.0.

SW388R7 Data Analysis & Computers II Slide 73

Positive outliers for factor two

Scrolling down to the bottom of the sorted data set, we see that one of the scores for factor two is greater than or equal to +3.0. We will run the analysis excluding this outlier and see if it changes our interpretation of the analysis.

SW388R7 Data Analysis & Computers II Slide 74

Removing the outliers

To see whether or not outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results.

To remove the outliers, we will include the cases that are not outliers. Choose the Select Cases… command from the Data menu.

SW388R7 Data Analysis & Computers II Slide 75

Setting the If condition

Click on the If… button to enter the formula for selecting cases in or out of the analysis.

SW388R7 Data Analysis & Computers II Slide 76

Formula to select cases that are not outliers

First, type the formula as shown. The formula says: include a case if the absolute values of its first and second factor scores are both less than 3.0.

Second, click on the Continue button to complete the specification.
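The selection rule above can be sketched outside SPSS. The following Python fragment is only an illustration: the case records and score values are hypothetical, and it assumes that a case without computed factor scores is simply left out, as it would be when a selection condition cannot be evaluated.

```python
CUTOFF = 3.0  # scores at or beyond +/-3.0 are treated as outliers

# Hypothetical regression factor scores; None marks a case whose scores
# could not be computed (for example, because of missing data).
cases = [
    {"id": 1, "fac1_1": 0.42, "fac2_1": -1.10},
    {"id": 2, "fac1_1": -0.85, "fac2_1": 3.27},  # outlier on factor two
    {"id": 3, "fac1_1": None, "fac2_1": None},   # scores not computed
    {"id": 4, "fac1_1": 1.96, "fac2_1": 0.05},
]


def has_scores(case):
    """True when both factor scores were computed for the case."""
    return case["fac1_1"] is not None and case["fac2_1"] is not None


def is_not_outlier(case, cutoff=CUTOFF):
    """Keep a case only if both factor scores are below the cutoff
    in absolute value."""
    if not has_scores(case):
        return False  # assumption: cases without scores are not selected
    return abs(case["fac1_1"]) < cutoff and abs(case["fac2_1"]) < cutoff


kept = [c["id"] for c in cases if is_not_outlier(c)]
outliers = [c["id"] for c in cases if has_scores(c) and not is_not_outlier(c)]
print("kept:", kept)          # kept: [1, 4]
print("outliers:", outliers)  # outliers: [2]
```

Checking both scores in one condition matches the single formula typed into the If dialog, so a case is dropped when it is extreme on either component.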

SW388R7 Data Analysis & Computers II Slide 77

Complete the select cases command

Having entered the formula for including cases, click on the OK button to complete the selection.

SW388R7 Data Analysis & Computers II Slide 78

The outlier selected out of the analysis

When SPSS selects a case out of the data analysis, it draws a slash through the case number. The case that we identified as an outlier will be excluded.

SW388R7 Data Analysis & Computers II Slide 79

Repeating the factor analysis

To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button.

SW388R7 Data Analysis & Computers II Slide 80

Stopping SPSS from computing factor scores again

On the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification.

Click on the Scores… button to access the factor scores dialog.

SW388R7 Data Analysis & Computers II Slide 81

Clearing the command to save factor scores

First, clear the Save as variables checkbox. This will deactivate the Method options.

Second, click on the Continue button to complete the specification.

SW388R7 Data Analysis & Computers II Slide 82

Computing the factor analysis

To produce the output for the factor analysis excluding outliers, click on the OK button.

SW388R7 Data Analysis & Computers II Slide 83

Comparing communalities

All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50.

All of the communalities for the factor analysis excluding outliers satisfy the minimum requirement of being larger than 0.50.

[SPSS output: two Communalities tables (Initial and Extraction columns), one for the analysis including all cases and one for the analysis excluding outliers, for RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, MOTHERS HIGHEST DEGREE, GENERAL HAPPINESS, and HAPPINESS OF MARRIAGE. The extraction communalities range from .577 to .782. Extraction Method: Principal Component Analysis.]

SW388R7 Data Analysis & Computers II Slide 84

Comparing factor loadings

The factor loadings for the factor analysis including all cases are shown on the left. The factor loadings for the factor analysis excluding outliers are shown on the right.

The pattern of factor loadings for both analyses shows the variables RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, and MOTHERS HIGHEST DEGREE loading on the first component, and GENERAL HAPPINESS and HAPPINESS OF MARRIAGE loading on the second component.

[SPSS output: two Rotated Component Matrix tables (Components 1 and 2), one including all cases and one excluding outliers. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations.]

SW388R7 Data Analysis & Computers II Slide 85

Interpreting the outlier analysis

All of the communalities satisfy the criterion of being greater than 0.50.

The pattern of loadings for both analyses is the same.

Whether we include or exclude outliers, our interpretation is the same. The outliers do not have an effect that would support excluding them from the analysis.

The part of the problem statement that outliers do not have an impact is true.

When we are finished with this analysis, we should select all cases back into the data set and remove the variables we created.

[SPSS output: the two Rotated Component Matrix tables, including all cases and excluding outliers, repeated from the previous slide.]

SW388R7 Data Analysis & Computers II Slide 86

Computing Cronbach's Alpha

To compute Cronbach's alpha for each component in our analysis, we select Scale | Reliability Analysis… from the Analyze menu.

SW388R7 Data Analysis & Computers II Slide 87

**Selecting the variables for the first component**

First, move the three variables that loaded on the first component to the Items list box.

Second, click on the Statistics… button to select the statistics we will need.

SW388R7 Data Analysis & Computers II Slide 88

Selecting the statistics for the output

First, mark the checkboxes for Item, Scale, and Scale if item deleted.

Second, click on the Continue button.

SW388R7 Data Analysis & Computers II Slide 89

Completing the specifications

First, if Alpha is not selected as the Model in the drop down menu, select it now.

Second, click on the OK button to produce the output.

SW388R7 Data Analysis & Computers II Slide 90

Cronbach's Alpha

Cronbach's Alpha is located at the bottom of the output.

An alpha of 0.60 or higher is the minimum acceptable level. Preferably, alpha will be 0.70 or higher, as it is in this case.

SW388R7 Data Analysis & Computers II Slide 91

Cronbach's Alpha

If alpha is too small, the alpha-if-item-deleted column of the output may suggest which variable should be removed to improve the internal consistency of the scale variables. It tells us what alpha we would get if the variable listed were removed from the scale.
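For readers who want to see what the two statistics measure, here is a minimal Python sketch of Cronbach's alpha and of alpha with each item left out. It uses the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score). The item names are the variable names from the first component, but the scores are hypothetical and the function names are illustrative; this is not SPSS's own routine.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, with the same cases in each list.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case_scores) for case_scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items, names):
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return {
        name: cronbach_alpha(items[:i] + items[i + 1:])
        for i, name in enumerate(names)
    }


# Hypothetical scores for five cases on three items.
names = ["degree", "padeg", "madeg"]
items = [
    [0, 1, 1, 3, 4],
    [0, 1, 2, 3, 3],
    [1, 1, 1, 2, 4],
]

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
for name, value in alpha_if_item_deleted(items, names).items():
    print(f"alpha if {name} deleted = {value:.3f}")
```

An item whose removal raises alpha noticeably above the full-scale value is a candidate for removal from the scale.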

SW388R7 Data Analysis & Computers II Slide 92

Computing Cronbach's Alpha

To compute Cronbach's alpha for each component in our analysis, we select Scale | Reliability Analysis… from the Analyze menu.

SW388R7 Data Analysis & Computers II Slide 93

Selecting the variables for the second component

First, move the two variables that loaded on the second component to the Items list box.

Second, click on the Statistics… button to select the statistics we will need.

SW388R7 Data Analysis & Computers II Slide 94

Selecting the statistics for the output

First, mark the checkboxes for Item, Scale, and Scale if item deleted.

Second, click on the Continue button.

SW388R7 Data Analysis & Computers II Slide 95

Completing the specifications

First, if Alpha is not selected as the Model in the drop down menu, select it now.

Second, click on the OK button to produce the output.

SW388R7 Data Analysis & Computers II Slide 96

Cronbach's Alpha

Cronbach's Alpha is located at the bottom of the output.

An alpha of 0.60 or higher is the minimum acceptable level. Preferably, alpha will be 0.70 or higher, as it is in this case.

SW388R7 Data Analysis & Computers II Slide 97

Answering the problem question

The answer to the original question is true with caution.

Component 1 includes the variables "highest academic degree" [degree], "father's highest academic degree" [padeg], and "mother's highest academic degree" [madeg]. We can substitute one component variable for this combination of variables in further analyses.

Component 2 includes the variables "general happiness" [happy] and "happiness of marriage" [hapmar]. We can substitute one component variable for this combination of variables in further analyses.

The components explain at least 50% of the variance in each of the variables included in the final analysis.

The components explain 70.169% of the total variance in the variables which are included on the components.

A caution is added to our findings because of the inclusion of ordinal level variables in the analysis.

[SPSS output: Total Variance Explained table (Initial Eigenvalues and Extraction Sums of Squared Loadings, each with Total, % of Variance, and Cumulative % columns). Extraction Method: Principal Component Analysis.]

SW388R7 Data Analysis & Computers II Slide 98

Validation with small samples

In the validation example completed above, 105 cases were used in the final principal component analysis model. When we have more than 100 cases available for the validation analysis, an even split should generally result in 50+ cases per validation sample.

However, if the number of cases available for the validation is less than 100, then splitting the sample in two may result in validation samples that are smaller than the minimum of 50 cases needed to conduct a factor analysis.

When this happens, we draw two random samples of cases that are both larger than the minimum of 50. Since some of the same cases will be in both validation samples, the support for generalizability is not as strong, but it does offer some evidence, especially if we repeat the process a number of times.

SW388R7 Data Analysis & Computers II Slide 99

Validation with small samples

We randomly create two split variables, which we will call split1 and split2, using a separate random number seed for each.

In the formula for creating the split variables, we set the proportion of cases sufficient to randomly select fifty cases. To calculate the proportion that we need, we divide 50 by the number of valid cases in the analysis and round up to the next highest 10% increment.

For example, if we have 80 valid cases, the proportion we need for validation is 50 / 80 = 0.625, which we would round up to 0.70 or 70%. The formulas for the split variables would be:

split1 = uniform(1) <= 0.70
split2 = uniform(1) <= 0.70
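The rounding rule can be written as a small calculation. This Python sketch is an illustration only; the function name is hypothetical, and it uses integer ceiling division so that exact boundaries such as 50 / 100 are not affected by floating-point rounding.

```python
MIN_CASES = 50  # minimum number of cases for a factor analysis, per the text


def validation_proportion(valid_cases, min_cases=MIN_CASES):
    """Divide min_cases by valid_cases and round up to the next 10% step."""
    if valid_cases < min_cases:
        raise ValueError("fewer valid cases than the minimum sample size")
    # Ceiling of (10 * min_cases / valid_cases), computed with integers.
    tenths = -(-10 * min_cases // valid_cases)
    return tenths / 10


for n in (80, 75, 60):
    print(n, validation_proportion(n))
# 80 0.7
# 75 0.7
# 60 0.9
```

With 80 valid cases the result is the 0.70 used in the split formulas above. As the number of valid cases approaches 50, the proportion approaches 1.0 and the validation samples become nearly identical to the full sample.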

SW388R7 Data Analysis & Computers II Slide 100

Validation with very small samples

When the number of valid cases in a factor analysis gets close to the lower limit of 50, the results of the validation may appear to support the analysis, but this can be misleading because the validation samples are not really different from the analysis of the full data set.

For example, if the number of valid cases were 60, a 90% sub-sample of 54 would result in 54 cases being the same in both the full analysis and the validation analysis.

The validation may appear to support the full analysis simply because the validation had limited opportunity to be different.

SW388R7 Data Analysis & Computers II Slide 101

Problem 2

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05. Validate the results of your principal component analysis by repeating the principal component analysis on two 70% random samples of the data set, using 743911 and 747454 as the random number seeds.

Based on the results of a principal component analysis of the 7 variables "claims about environmental threats are exaggerated" [grnexagg], "danger to the environment from modifying genes in crops" [genegen], "America doing enough to protect environment" [amprogrn], "should be international agreements for environment problems" [grnintl], "poorer countries should be expected to do less for the environment" [ldcgrn], "economic progress in America will slow down without more concern for environment" [econgrn], and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc], the information in these variables can be represented with 2 components and 3 individual variables. Cases that might be considered to be outliers do not have an impact on the factor solution. The internal consistency of the variables included in the components is sufficient to support the creation of a summated scale.

Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. Component 2 includes the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be expected to do less for the environment" [ldcgrn]. The variables "economic progress in America will slow down without more concern for environment" [econgrn], "should be international agreements for environment problems" [grnintl], and "America doing enough to protect environment" [amprogrn] were not included on the components and are retained as individual variables.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

SW388R7 Data Analysis & Computers II Slide 102

The principal component solution

A principal component analysis found a two-factor solution, with four of the original seven variables loading on the components.

The communalities and factor loadings are shown below.

[SPSS output: Communalities table (Initial and Extraction columns) and Rotated Component Matrix (Components 1 and 2) for ENVIRONMENTAL THREATS EXAGGERATED, HOW DANGEROUS MODIFYING GENES IN CROPS, POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT, and LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations.]

SW388R7 Data Analysis & Computers II Slide 103

The size of the validation sample

There were 75 valid cases in the final analysis. The sample is too small to split in half and still have enough cases to meet the minimum of 50 cases for factor analysis.

We will draw two random samples that each comprise 70% of the full sample. We arrive at 70% by dividing the minimum sample size by the number of valid cases (50 ÷ 75 = 0.667) and rounding up to the next 10% increment, 70%.

[SPSS output: Descriptive Statistics table (Mean, Std. Deviation, Analysis N) for the four variables in the final analysis; Analysis N is 75 for each variable.]

SW388R7 Data Analysis & Computers II Slide 104

Split-sample validation

The first random number seed stated in the problem is 743911, so we enter this in the SPSS random number seed dialog.

To set the random number seed, select the Random Number Seed… command from the Transform menu.

SW388R7 Data Analysis & Computers II Slide 105

Set the random number seed for first sample

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.

SW388R7 Data Analysis & Computers II Slide 106

Select the compute command

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

SW388R7 Data Analysis & Computers II Slide 107

**The formula for the split1 variable**

First, type the name for the new variable, split1, into the Target Variable text box. Second, the formula for the value of split1 is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.70. If the random number is less than or equal to 0.70, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.70, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
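The logic of the split formula can be sketched in Python. This is an illustration under stated assumptions: Python's random number generator is not the SPSS generator, so the same seeds will not select the same cases as SPSS, and the helper name make_split is hypothetical. The seeds and the count of 75 valid cases are the ones given in the problem.

```python
import random


def make_split(n_cases, proportion, seed):
    """Return a 0/1 flag per case: 1 when a uniform draw is <= proportion."""
    rng = random.Random(seed)  # a separate, seeded generator per split variable
    return [1 if rng.random() <= proportion else 0 for _ in range(n_cases)]


split1 = make_split(75, 0.70, seed=743911)
split2 = make_split(75, 0.70, seed=747454)

# Count the cases that were selected into both validation samples.
in_both = sum(a and b for a, b in zip(split1, split2))
print("cases in first sample:", sum(split1))
print("cases in second sample:", sum(split2))
print("cases in both samples:", in_both)
```

Because each case is selected independently, the realized sample size varies around 70% of the cases, so it is worth confirming that each validation sample actually reaches the minimum of 50.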

SW388R7 Data Analysis & Computers II Slide 108

Set the random number seed for second sample

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.

SW388R7 Data Analysis & Computers II Slide 109

Select the compute command

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

SW388R7 Data Analysis & Computers II Slide 110

The formula for the split2 variable

First, type the name for the new variable, split2, into the Target Variable text box. Second, the formula for the value of split2 is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.70. If the random number is less than or equal to 0.70, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.70, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.

SW388R7 Data Analysis & Computers II Slide 111

Repeating the analysis with the first validation sample

To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

SW388R7 Data Analysis & Computers II Slide 112

Using split1 as the selection variable

First, scroll down the list of variables and highlight the variable split1.

Second, click on the right arrow button to move the split1 variable to the Selection Variable text box.

SW388R7 Data Analysis & Computers II Slide 113

Setting the value of split1 to select cases

When the variable named split1 is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split1.

Click on the Value… button to enter a value for split1.

SW388R7 Data Analysis & Computers II Slide 114

Completing the value selection

First, type the value for the first sample, 1, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

SW388R7 Data Analysis & Computers II Slide 115

Requesting output for the first validation sample

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split1 variable.

Click on the OK button to request the output.

Since the validation analysis requires us to compare the results for the first validation sample with the results for the second, we will request the output for the second validation sample before doing any comparison.

SW388R7 Data Analysis & Computers II Slide 116

Repeating the analysis with the second validation sample

To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

SW388R7 Data Analysis & Computers II Slide 117

Removing split1 as the selection variable

First, highlight the Selection Variable text box.

Second, click on the left arrow button to move split1 back to the list of variables.

SW388R7 Data Analysis & Computers II Slide 118

Using split2 as the selection variable

First, scroll down the list of variables and highlight the variable split2.

Second, click on the right arrow button to move the split2 variable to the Selection Variable text box.

SW388R7 Data Analysis & Computers II Slide 119

Setting the value of split2 to select cases

When the variable named split2 is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split2.

Click on the Value… button to enter a value for split2.

SW388R7 Data Analysis & Computers II Slide 120

Completing the value selection

First, type the value for the second sample, 1, into the Value for Selection Variable text box.

Second, click on the Continue button to complete the value entry.

SW388R7 Data Analysis & Computers II Slide 121

Requesting output for the second validation sample

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split2 variable.

Click on the OK button to request the output.

SW388R7 Data Analysis & Computers II Slide 122

Comparing the communalities for the validation samples

All of the communalities for the first validation sample satisfy the minimum requirement of being larger than 0.50.

All of the communalities for the second validation sample satisfy the minimum requirement of being larger than 0.50.

[SPSS output: two Communalities tables (Initial and Extraction columns) for ENVIRONMENTAL THREATS EXAGGERATED, HOW DANGEROUS MODIFYING GENES IN CROPS, POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT, and LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS, one using only cases for which SPLIT1 = 1 and one using only cases for which SPLIT2 = 1. The extraction communalities range from .631 to .773. Extraction Method: Principal Component Analysis.]

SW388R7 Data Analysis & Computers II Slide 123

Comparing the factor loadings for the validation samples

The factor loadings for the first validation analysis are shown on the left. The factor loadings for the second validation analysis are shown on the right.

The factor loadings for both validation analyses show the same pattern of variables, though the first and second components have switched places.

The communalities and factor loadings of the validation analyses support the generalizability of the factor model.

[SPSS output: two Rotated Component Matrix tables (Components 1 and 2) for the four variables, one using only cases for which SPLIT1 = 1 and one using only cases for which SPLIT2 = 1. Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations.]
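Matching "overall patterns, not exact results" can be made concrete with a short Python sketch. It is only one possible way to formalize the comparison: each variable is assigned to the component on which its absolute loading is largest, and two solutions match when they produce the same groups of variables, regardless of component order or sign. The loadings below are hypothetical, not the values from the SPSS output, and the function name is illustrative.

```python
def loading_pattern(loadings):
    """Group variables by the component on which they load most strongly.

    loadings maps variable name -> list of loadings, one per component.
    Only absolute size is used, so sign reversals are ignored; the groups
    are returned as a set, so the numbering of components is ignored too.
    """
    groups = {}
    for variable, row in loadings.items():
        strongest = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(strongest, set()).add(variable)
    return {frozenset(group) for group in groups.values()}


# Hypothetical rotated loadings for two validation samples.
sample_a = {
    "grnexagg": [0.10, 0.80],
    "genegen": [0.85, 0.05],
    "ldcgrn": [-0.15, 0.75],
    "nukeacc": [0.82, 0.20],
}
# Same grouping, but the components have switched places and signs differ.
sample_b = {
    "grnexagg": [-0.78, 0.12],
    "genegen": [0.02, -0.86],
    "ldcgrn": [-0.80, -0.10],
    "nukeacc": [0.18, -0.79],
}

print(loading_pattern(sample_a) == loading_pattern(sample_b))  # True
```

This sketch does not check for variables that load strongly on more than one component, which would still need to be inspected in the output.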

SW388R7 Data Analysis & Computers II Slide 124

Steps in validation analysis - 1

The following is a guide to the decision process for answering problems about validation analysis:

Is the number of valid cases greater than or equal to 100?

•No: Set the first random seed and compute the split1 variable. Re-run factor with split1 = 1. Set the second random seed and compute the split2 variable. Re-run factor with split2 = 1.
•Yes: Set the random seed and compute the split variable. Re-run factor with split = 0. Re-run factor with split = 1.

Are all of the communalities in the validations greater than 0.50?

•No: False
•Yes: Go on to the next question.

SW388R7 Data Analysis & Computers II Slide 125

Steps in validation analysis - 2

Does pattern of factor loadings match pattern for full data set?

•No: False
•Yes: True

SW388R7 Data Analysis & Computers II Slide 126

Steps in outlier analysis - 1

The following is a guide to the decision process for answering problems about outlier analysis:

Are any of the factor scores outliers (larger than ±3.0)?

•No: True
•Yes: Re-run factor analysis, excluding outliers.

Are all of the communalities excluding outliers greater than 0.50?

•No: False
•Yes: Go on to the next question.

SW388R7 Data Analysis & Computers II Slide 127

Steps in outlier analysis - 2

Does the pattern of factor loadings excluding outliers match the pattern for the full data set?

•No: False
•Yes: True

SW388R7 Data Analysis & Computers II Slide 128

Steps in reliability analysis

The following is a guide to the decision process for answering problems about reliability analysis:

Are Cronbach's Alpha greater than 0.60 for all factors?

•No: False
•Yes: Go on to the next question.

Are Cronbach's Alpha greater than 0.70 for all factors?

•No: True with caution
•Yes: True
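The reliability decision can be expressed as a short function. This Python sketch is illustrative only: the function name and the alpha values are hypothetical, and it treats the thresholds as inclusive, following the earlier wording "0.60 or higher" and "0.70 or higher".

```python
def reliability_answer(alphas, minimum=0.60, preferred=0.70):
    """Apply the two-threshold rule to the alpha of every component's scale."""
    if any(alpha < minimum for alpha in alphas):
        return "False"
    if any(alpha < preferred for alpha in alphas):
        return "True with caution"
    return "True"


# Hypothetical alpha values for the scales built from two components.
print(reliability_answer([0.81, 0.74]))  # True
print(reliability_answer([0.81, 0.65]))  # True with caution
print(reliability_answer([0.81, 0.52]))  # False
```

The weakest scale decides the answer, since the rule requires every component to meet the threshold.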