Introduction to Factor Analysis, Objectives and Principal Component Analysis


By A. Subrahmanyam

Factor Analysis

Factor analyses are performed by examining the pattern of correlations (or covariances) between the observed measures.

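Factor analysis is used to identify a small number of factors (which are uncorrelated) from a large set of variables (most of which are correlated with each other). Variables that are highly correlated with one another are grouped under the same factor because they likely measure similar things (conceptually).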

Uses of Factor Analysis

Factor analysis is primarily used for data reduction or structure detection.

The purpose of data reduction is to remove redundant (highly correlated) variables from the data file, perhaps replacing the entire data file with a smaller number of uncorrelated variables.

The purpose of structure detection is to examine the underlying (or latent) relationships between the variables.

It is also used to construct a questionnaire that measures an underlying variable.

Two types of factor analysis:

Exploratory

It is exploratory when you do not have a predefined idea of the structure or of how many dimensions are in a set of variables.

Confirmatory

It is confirmatory when you want to test specific hypotheses about the structure or the number of dimensions underlying a set of variables.

Important terminology

Factor loading: interpreted as the Pearson correlation between the variable and the factor (this interpretation holds when the variables are standardized and the factors are orthogonal).

Extraction: the process by which the factors are determined from a large set of variables.

Communality: the sum of the squared factor loadings across all factors for a given variable (row) is the variance in that variable accounted for by all the factors, and this is called the communality.

The communality measures the proportion of variance in a given variable explained by all the factors jointly.
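The communality definition above is a row sum of squared loadings. A minimal sketch, assuming a made-up loading matrix for 4 standardized variables and 2 orthogonal factors (the numbers and the names `loadings`, `communalities`, `uniqueness` are illustrative only):

```python
import numpy as np

# Hypothetical loading matrix (made-up numbers): 4 variables (rows) x 2 factors (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.85],
    [0.05, 0.70],
])

# Communality of each variable: sum of its squared loadings across all factors (row sums).
communalities = (loadings ** 2).sum(axis=1)

# For standardized variables (variance 1), the remainder is the variable's unique variance.
uniqueness = 1.0 - communalities

for i, (h2, u2) in enumerate(zip(communalities, uniqueness), start=1):
    print(f"variable {i}: communality {h2:.4f}, unique variance {u2:.4f}")
```

For variable 1 the communality is 0.8² + 0.1² = 0.65, i.e. the two factors jointly explain 65% of its variance.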


Eigenvalues: the eigenvalue of a given factor measures the variance in all the variables which is accounted for by that factor.

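Eigenvalues are often used to determine how many factors to retain.

Retain as many factors as there are eigenvalues greater than 1 (the Kaiser criterion).

The variance of each standardized variable is 1, so a factor with an eigenvalue greater than 1 accounts for more variance than a single original variable contributes.

The sum of the eigenvalues of the retained factors, divided by the number of variables, is the proportion of total variance accounted for by those factors.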
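The eigenvalue rules above can be checked numerically. This is a sketch under stated assumptions: the 4×4 correlation matrix `R` is invented for illustration, and the eigenvalues are taken directly from the correlation matrix (as in principal components extraction).

```python
import numpy as np

# Hypothetical correlation matrix for 4 standardized variables (illustrative only).
R = np.array([
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.2, 0.1],
    [0.2, 0.2, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

# eigvalsh is for symmetric matrices and returns eigenvalues in ascending order;
# reverse so the largest comes first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Each standardized variable contributes variance 1, so the eigenvalues sum to p.
p = R.shape[0]
assert np.isclose(eigenvalues.sum(), p)

# Kaiser criterion: retain the factors whose eigenvalue exceeds 1.
n_factors = int(np.sum(eigenvalues > 1))

# Proportion of total variance accounted for by the retained factors.
explained = eigenvalues[:n_factors].sum() / p

print("eigenvalues:", np.round(eigenvalues, 3))
print("factors retained:", n_factors)
print(f"variance accounted for: {explained:.1%}")
```

With this matrix two eigenvalues exceed 1, so two factors are retained.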

Principal Component Analysis

Principal components: one of the extraction methods.

A principal component is a linear combination of the observed variables that is independent (orthogonal) of the other components.

The first component accounts for the largest amount of variance in the input data; the second component accounts for the largest amount of the remaining variance, and so on.
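The properties listed above (linear combinations, orthogonality, decreasing variance) can be seen in a small sketch. It assumes simulated data and extracts components by eigendecomposition of the correlation matrix; the data-generating setup and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): 500 observations of 3 variables, the first two correlated.
n = 500
base = rng.normal(size=n)
X = np.column_stack([
    base + 0.5 * rng.normal(size=n),
    base + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
])

# Standardize so that PCA works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

# Eigendecomposition of the symmetric correlation matrix, sorted largest first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each principal component is a linear combination of the standardized variables.
scores = Z @ eigenvectors

# The variance of each component equals its eigenvalue (the first is the largest) ...
print("component variances:", np.round(scores.var(axis=0), 3))
# ... and the components are uncorrelated (off-diagonal correlations are ~0).
print(np.round(np.corrcoef(scores, rowvar=False), 3))

# Loadings = eigenvector * sqrt(eigenvalue): the correlation between each
# standardized variable (row) and each component (column).
loadings = eigenvectors * np.sqrt(eigenvalues)
print(np.round(loadings, 3))
```

The signs of eigenvectors are arbitrary, so a component and its loadings may come out with flipped signs; this does not change the interpretation.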


Components being orthogonal means they are uncorrelated.

Principal components is one of the most commonly used extraction methods.


Possible application of principal components:

For example, in survey research it is common to have many questions that address one issue (e.g. customer service).

These questions are likely to be highly correlated. Using such variables together is problematic in some statistical procedures (e.g. regression), because highly correlated predictors (multicollinearity) make the estimated coefficients unstable.

Instead, one can use factor scores, computed from the factor loadings on each orthogonal component, in place of the original questions.
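The survey scenario above can be sketched as follows, under loudly stated assumptions: the four "customer service" items, the latent attitude and the outcome `satisfaction` are all simulated, and because the items share one underlying dimension only the first principal component score is used as the predictor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical survey: 4 items all driven by one latent "customer service" attitude.
n = 300
attitude = rng.normal(size=n)
items = np.column_stack([attitude + 0.4 * rng.normal(size=n) for _ in range(4)])
satisfaction = 2.0 * attitude + rng.normal(size=n)  # outcome to be predicted

# The items are strongly intercorrelated (multicollinear).
print(np.round(np.corrcoef(items, rowvar=False), 2))

# Standardize the items and take the first principal component.
Z = (items - items.mean(axis=0)) / items.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
first = eigenvectors[:, np.argmax(eigenvalues)]

# Component score: one number per respondent, scaled to unit variance.
score = Z @ first
score = score / score.std()

# Regress satisfaction on the single score instead of the 4 correlated items.
design = np.column_stack([np.ones(n), score])
coef, *_ = np.linalg.lstsq(design, satisfaction, rcond=None)
print("intercept, slope:", np.round(coef, 3))
```

If several components were retained, their scores would be mutually uncorrelated predictors. The sign of the slope depends on the arbitrary sign of the eigenvector.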

What does principal components analysis do?

It takes a set of correlated variables and creates a smaller set of uncorrelated variables.

These newly created variables are called principal components.

There are two main objectives for using PCA:

1. Reduce the dimensionality of the data.

In simple English: turn p variables into fewer than p variables.

While reducing the number of variables, we attempt to keep as much of the information in the original variables as possible.

Thus we try to reduce the number of variables with as little loss of information as possible.

2. Identify new meaningful underlying variables.

This is often not possible.

The principal components created are linear combinations of the original variables and often do not lend themselves to any meaning beyond that.
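Objective 1 (fewer variables, little information lost) can be made concrete with a sketch. Assumptions: "information" is measured as variance, the 90% threshold is an arbitrary illustrative choice, the data are simulated, and `reduce_dimensions` is a hypothetical helper, not a library function.

```python
import numpy as np

def reduce_dimensions(X, threshold=0.90):
    """Project standardized data onto the fewest principal components whose
    eigenvalues explain at least `threshold` of the total variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # Smallest k whose cumulative share reaches the threshold (capped at p).
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigenvalues))

    V = eigenvectors[:, :k]
    scores = Z @ V                   # n x k: the reduced data
    reconstructed = scores @ V.T     # mapped back to n x p (standardized units)
    lost = np.mean((Z - reconstructed) ** 2)  # share of variance not kept
    return scores, k, cumulative[k - 1], lost

# Usage on simulated data: 6 variables generated from 2 underlying dimensions plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(400, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.2 * rng.normal(size=(400, 6))

scores, k, kept, lost = reduce_dimensions(X)
print(f"{X.shape[1]} variables -> {k} components, {kept:.1%} of variance kept, {lost:.1%} lost")
```

The lost share equals the sum of the discarded eigenvalues divided by p, which is why a few components can stand in for many correlated variables.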

Rotation

Objective: to facilitate interpretation.

Orthogonal rotation: done when data reduction is the objective and the factors need to be orthogonal.

It maintains the independence of the factors.

It is the more commonly seen type.

Examples: varimax, quartimax, equamax, parsimax, etc.

Oblique rotation: allows the factors to be correlated.

It allows dependence between the factors.

It can be harder to interpret once you lose the independence of the factors.

Examples: oblimin and promax.
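An orthogonal rotation changes the individual loadings but not how much of each variable the factors explain. A minimal sketch, assuming a made-up unrotated loading matrix and an arbitrary 30-degree rotation matrix `T` (any `T` with `T.T @ T = I` would do):

```python
import numpy as np

# Hypothetical unrotated loading matrix (5 variables x 2 factors), illustrative only.
L = np.array([
    [0.70, 0.40],
    [0.65, 0.45],
    [0.60, -0.50],
    [0.55, -0.55],
    [0.50, 0.05],
])

# A 2x2 orthogonal matrix (here a 30-degree rotation) defines an orthogonal rotation.
theta = np.deg2rad(30)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ T

# The individual loadings change ...
print(np.round(L_rot, 3))
# ... but each variable's communality and the reproduced correlations L @ L.T do not.
print(np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1)))  # True
print(np.allclose(L @ L.T, L_rot @ L_rot.T))                       # True
```

Rotation therefore only redistributes the explained variance among the factors to make them easier to interpret.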

Varimax Rotation

Varimax seeks the rotated loadings that maximize the variance of the squared loadings for each factor; the goal is to make some of these loadings as large as possible and the rest as small as possible in absolute value.

The varimax method encourages the detection of factors each of which is related to few variables.

In some statistical software the default rotation is varimax, which produces orthogonal factors.

This means that the factors are not correlated with each other.

This setting is recommended when you want to identify variables to create indexes or new variables without intercorrelated components.
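The varimax criterion described above can be maximized iteratively. The sketch below uses one commonly circulated SVD-based iteration (assumed here, not taken from any particular package); `varimax`, `varimax_criterion` and the loading matrix are hypothetical names and numbers for illustration.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation maximizing the variance of the squared loadings per factor.
    gamma=1.0 gives varimax (gamma=0.0 would give quartimax)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = loadings @ R
        # Gradient-like matrix of the criterion with respect to the rotation.
        G = loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt  # nearest orthogonal matrix, so the rotation stays orthogonal
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ R, R

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return np.sum(np.var(L ** 2, axis=0))

# Hypothetical unrotated loadings: every variable loads on both factors.
L = np.array([
    [0.70, 0.40],
    [0.65, 0.45],
    [0.60, -0.50],
    [0.55, -0.55],
    [0.50, 0.05],
])
L_rot, T = varimax(L)
print("criterion before:", round(varimax_criterion(L), 4))
print("criterion after: ", round(varimax_criterion(L_rot), 4))
print(np.round(L_rot, 3))
```

After rotation, the first two variables load mainly on one factor and the next two on the other, which is the "few variables per factor" pattern the text describes.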

A number of extraction methods are available:

Principal components: the extraction method described above.

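Principal axis factoring: accounts for the correlations between the variables by analyzing only the variance they share (communality estimates replace the 1s on the diagonal of the correlation matrix).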

Unweighted least squares: minimizes the residuals between the observed and the reproduced correlation matrices.

Generalized least squares: similar to unweighted least squares, but gives more weight to the variables with stronger correlations (those with less unique variance).

Maximum likelihood: generates the solution that is most likely to have produced the observed correlation matrix (assuming the variables are multivariate normal).

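Alpha factoring: considers the variables in the analysis to be a sample from a larger universe of potential variables and maximizes the reliability (coefficient alpha) of the factors.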

Image factoring: decomposes each variable into a common part (its "image", predicted from the other variables) and a unique part, then works with the common part.
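Of the methods listed above, principal axis factoring is simple enough to sketch. This is an illustration under stated assumptions, not a reference implementation: `principal_axis` is a hypothetical helper, the starting communalities are squared multiple correlations (a common choice), and the correlation matrix is built from invented loadings. The off-diagonal residual printed at the end is the quantity unweighted least squares minimizes. The other methods are not implemented here.

```python
import numpy as np

def principal_axis(R, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring on a correlation matrix R (sketch).
    The diagonal of R is replaced by communality estimates so that only
    shared variance is factored."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        eigenvalues, eigenvectors = np.linalg.eigh(R_reduced)
        order = np.argsort(eigenvalues)[::-1][:n_factors]
        # Clip tiny negative eigenvalues before taking square roots.
        loadings = eigenvectors[:, order] * np.sqrt(np.clip(eigenvalues[order], 0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Hypothetical two-factor structure used to build an exact correlation matrix.
L_true = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.75], [0, 0.65], [0, 0.55]])
R = L_true @ L_true.T
np.fill_diagonal(R, 1.0)

loadings, h2 = principal_axis(R, n_factors=2)
residual = R - loadings @ loadings.T
np.fill_diagonal(residual, 0.0)  # off-diagonal residuals between observed and reproduced R
print("communalities:", np.round(h2, 3))
print("max |residual|:", np.abs(residual).max())
```

Because the matrix was constructed from an exact two-factor model, the estimated communalities approach the true values (0.64, 0.49, 0.36, 0.5625, 0.4225, 0.3025) and the residuals approach zero.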

