
Interpreting score plots

Before summarizing some points about how to interpret a score plot, let's quickly
review what a score value is. There is one score value for each observation (row) in
the data set, so there are N score values for the first component, another N for
the second component, and so on.

The score value for an observation, say for the first component, is the distance from
the origin, along the direction (loading vector) of the first component, up to the point
where that observation projects onto that direction vector. We repeat an earlier figure
here, which shows the projected values for two of the observations.

We used geometric concepts in another section to show that we can write T = XP to
get all the score values in one go. In this section we are plotting values from the
columns of T. In particular, for a single observation i, the score for the ath component is:

$$t_{i,a} = x_{i,1} p_{1,a} + x_{i,2} p_{2,a} + \ldots + x_{i,k} p_{k,a} + \ldots + x_{i,K} p_{K,a}$$
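To make the T = XP calculation concrete, here is a minimal sketch in Python (numpy, the SVD route, and the synthetic X_raw matrix are our own assumptions; the text does not prescribe any particular tooling):

```python
import numpy as np

# Minimal sketch: compute the scores T = XP using numpy's SVD.
# X_raw is a made-up (N x K) data matrix standing in for real data.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 5))

# PCA assumes the columns have been centered and scaled.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# The columns of P (= V from the SVD) are the loading vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt.T

# All score values in one go: one row of T per observation,
# one column per component.
T = X @ P

# The same score, written out term by term as in the equation above:
i, a = 0, 0
t_ia = sum(X[i, k] * P[k, a] for k in range(X.shape[1]))
assert np.isclose(t_ia, T[i, a])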
The first score vector, $t_1$, explains the greatest variation in the data; it is considered
the most important score from that point of view, at least when we look at a data set
for the first time. (After that we may find other scores that are more interesting.) Then
we look at the second score, which explains the next greatest amount of variation in
the data, then the third score, and so on. Most often we will plot:

- time-series plots of the scores, or sequence order plots, depending on how the rows of X are ordered

- scatter plots of one score against another score


An important point with PCA is that because the matrix P is orthonormal (see
the later section on PCA properties), any relationships that were present in X are still
present in T. We can see this quite easily using the previous equation. Imagine two
observations taken from a process at different points in time. It would be quite hard
to identify those similar points by looking at the K columns of raw data, especially
when the two rows are not close to each other. But with PCA, these two similar rows
are multiplied by the same coefficients in P and will therefore give similar values of t.
So score plots allow us to rapidly locate similar observations.
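A small synthetic illustration of this point (the data and the 0.01 perturbation are invented for the sketch):

```python
import numpy as np

# Two nearly identical rows of X receive nearly identical score vectors,
# because both are multiplied by the same loading matrix P.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
X[7] = X[2] + 0.01 * rng.normal(size=8)   # row 7 is a near-copy of row 2

X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = X @ Vt.T

# The two similar observations sit side by side in score space, which is
# much easier to spot than scanning the 8 raw columns by eye.
print(np.linalg.norm(T[2] - T[7]))   # a small distance
```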

When investigating score plots we look for clustering, outliers, and time-based patterns.
We can also colour-code our plots to make them more informative. Let's take a look at each
of these.

Clustering

We usually start by looking at the $(t_1, t_2)$ scatterplot of the scores, the two directions
of greatest variation in the data. As just explained, observations in the
rows of X that are similar will fall close to each other, i.e. they cluster together, in
these score plots. Here is an example of a score plot, calculated from data from a
fluidized catalytic cracking (FCC) process [taken from the Master's thesis of Carol
Slama (McMaster University, p. 78, 1991)].
It shows how the process was operating in region A, then moved to region B and
finally region C. This provides a 2-dimensional window into the movements from
the K=147 original variables.
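The FCC data are not reproduced here, but the sketch below shows how such a $(t_1, t_2)$ plot can be drawn, with three synthetic clusters standing in for regions A, B and C:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch of a (t1, t2) score plot. Three synthetic clusters stand in for
# operating regions A, B and C; this is not the FCC data itself.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, size=(40, 6)) for m in (0.0, 2.0, 5.0)])

X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = X @ Vt.T

plt.scatter(T[:, 0], T[:, 1])
plt.xlabel("$t_1$")
plt.ylabel("$t_2$")
plt.title("Similar observations cluster together in the score plot")
plt.show()
```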

Outliers

Outliers are readily detected in a score plot, and using the equation below we can
see why. Recall that the data in X have been centered and scaled, so the x-value for
a variable that is operating at its mean level will be roughly zero. An observation
that is at the mean value for all K variables will have a score vector of $t_i = [0, 0, \ldots, 0]$.
An observation where many of the variables have values far from their average level
is called a multivariate outlier. It will have one or more score values that are far from
zero, and will show up on the outer edges of the score scatterplots.
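A brief sketch of this idea on synthetic data (the +6 shift is an invented fault):

```python
import numpy as np

# An observation far from the mean on several variables produces a score
# vector far from the origin.
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 5))
X[10] += 6.0            # push one observation away from the mean

X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = X @ Vt.T

dist = np.linalg.norm(T, axis=1)   # distance from the origin in score space
print(dist.argmax())               # 10: the planted outlier sits on the edge
```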

Sometimes all it takes is for one variable, $x_{i,k}$, to be far away from its average to
cause $t_{i,a}$ to be large:

$$t_{i,a} = x_{i,1} p_{1,a} + x_{i,2} p_{2,a} + \ldots + x_{i,k} p_{k,a} + \ldots + x_{i,K} p_{K,a}$$
But usually it is a combination of more than one x-variable. There are K terms in this
equation, each of which contributes to the score value. A bar plot of each of
these K terms, $x_{i,k} p_{k,a}$, is called a contribution plot. It shows which variable(s)
contribute most to the large score value.

As an example from the food texture data from earlier, we saw that observation 33
had a large negative $t_1$ value. From that prior equation:

$$t_{33,1} = 0.46 x_{\text{oil}} - 0.47 x_{\text{density}} + 0.53 x_{\text{crispy}} - 0.50 x_{\text{fracture}} + 0.15 x_{\text{hardness}}$$
$$t_{33,1} = 0.46 \times (-1.069) - 0.47 \times (2.148) + 0.53 \times (-2.546) - 0.50 \times (2.221) + 0.15 \times (-1.162)$$
$$t_{33,1} = -4.2$$
The $K = 5$ terms that contribute to this value can be illustrated as a bar plot, where
the bar heights sum to $-4.2$.
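A sketch of how this bar plot can be reproduced from the quoted numbers (note that with these rounded inputs the sum works out to about $-4.14$; the $-4.2$ above presumably reflects the unrounded values):

```python
import numpy as np
import matplotlib.pyplot as plt

# The K = 5 contribution terms x_{33,k} * p_{k,1} from the worked example.
names = ["oil", "density", "crispy", "fracture", "hardness"]
p_1 = np.array([0.46, -0.47, 0.53, -0.50, 0.15])          # loadings
x_33 = np.array([-1.069, 2.148, -2.546, 2.221, -1.162])   # scaled row 33

contributions = x_33 * p_1   # one bar per original variable
print(contributions.sum())   # about -4.14 with these rounded inputs

plt.bar(names, contributions)
plt.axhline(0, color="black", linewidth=0.8)
plt.ylabel("Contribution to $t_{33,1}$")
plt.show()
```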

This gives a more accurate indication of exactly how the low $t_{33,1}$ value was achieved.
Previously we had said that pastry 33 was denser than the other pastries and had a
higher fracture angle; now we can see the relative contribution from each variable
more clearly.

In the figure from the FCC process (in the preceding subsection on clustering), the
cluster marked C was far from the origin, relative to the other observations. This
indicates problematic process behaviour around that time, since normal process operation
is expected to be in the center of the score plot. We can investigate why these
outlying observations are unusual by constructing contribution bar plots for a
few of the points in cluster C.

Time-based or sequence-based trends

Any strong and consistent time-based or sequence-order trends in the raw data will
be reflected in the scores as well. Visual inspection of each score vector may show
interesting phenomena such as oscillations, spikes or other patterns of interest. As
just described, contribution plots can be used to see which of the original variables
in X are most related to these phenomena.
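A hedged sketch: a drift planted in a few synthetic variables shows up when $t_1$ is plotted in observation order:

```python
import numpy as np
import matplotlib.pyplot as plt

# A slow drift shared by three (synthetic) variables is recovered in t1
# when the score vector is plotted in time (row) order.
rng = np.random.default_rng(3)
N, K = 200, 6
X = rng.normal(size=(N, K))
X[:, :3] += np.linspace(0, 3, N)[:, None]   # common drift in 3 variables

X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = X @ Vt.T

plt.plot(T[:, 0])
plt.xlabel("Observation order")
plt.ylabel("$t_1$")
plt.show()
```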

Colour-coding

Plotting any two score variables on a scatter plot provides good insight into the
relationship between those two scores. Additional information can be
provided by colour-coding the points on the plot by some other, 3rd variable of
interest. For example, a binary colour scheme could denote success or failure of
each observation.

A continuous 3rd variable can be shown using a varying colour scheme, going from
reds to oranges to yellows to greens and then blues, together with an accompanying
legend: for example, the profitability of operation at that point, or some other process
variable. A 4th dimension could be conveyed by plotting smaller or larger points. We
saw an example of these high-density visualizations earlier.
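A sketch of such a colour- and size-coded score plot (the profitability and point-size variables are invented placeholders, as is the choice of colormap):

```python
import numpy as np
import matplotlib.pyplot as plt

# Score plot with a 3rd variable as colour and a 4th as marker size.
rng = np.random.default_rng(4)
t1, t2 = rng.normal(size=100), rng.normal(size=100)
profit = t1 + rng.normal(scale=0.5, size=100)   # hypothetical 3rd variable
size = rng.uniform(10, 80, size=100)            # hypothetical 4th variable

sc = plt.scatter(t1, t2, c=profit, s=size, cmap="RdYlBu")
plt.colorbar(sc, label="Profitability (3rd variable)")
plt.xlabel("$t_1$")
plt.ylabel("$t_2$")
plt.show()
```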

Summary

- Points close to the average appear at the origin of the score plot.

- Scores further out are either outliers or naturally extreme observations.

- We can infer, in general, why a point is at the outer edge of a score plot by cross-referencing with the loadings. This is because the scores are a linear combination of the data in X, as given by the coefficients in P.

- We can determine exactly why a point is at the outer edge of a score plot by constructing a contribution plot to see which of the original variables in X are most related to a particular score. This provides a more precise indication of exactly why a score is at its given position.

- Original observations in X that are similar to each other will be close together in the score plot, while observations much further apart are dissimilar. This comes from the way the scores are computed: they are found so that they span the greatest variance possible. But it is much easier to detect this similarity in an A-dimensional space than in the original K-dimensional space.
