# NOMINATE AND DIMENSION REDUCTION ALGORITHMS

NOMINATE is a data dimension reduction algorithm which purports to best preserve the differences between the voting records of a set of L legislators on a large number N of bills among the set of all possible two-dimensional projections of such data. The output of the algorithm is a set of two coordinates $(x_i, y_i)$ for each of the legislators i = 1, 2, ..., L which, according to its authors, correlate with the legislators' ideological preferences (with negative values of $x_i$ representing more liberal views and positive values more conservative ones) and preferences on a set of contemporary issues (with negative $y_i$ being more cautious about change and positive more progressive).
The algorithm is a type of dimension reduction algorithm, the most basic
of which is called a principal component, or principal coordinate, analysis
(PCA), described below.

## 1. PRINCIPAL COMPONENT ANALYSIS

In a set of L data points in N-dimensional space, the variance is properly an N-dimensional quantity, since each of the coordinates is in principle an independent source of variance. (In the example of legislators, there may be independent sets of differences among colleagues on each and every different vote that is taken.) In practice, however, some directions in N-dimensional space capture more of the variance than others (a few votes, or combinations of votes, are enough to differentiate the legislators' records from one another). Finding these most important directions is the goal of dimension reduction algorithms.
A principal component analysis (PCA) of a set of data points is a method of ordering the principal directions (eigenvectors) of the covariance matrix of the L data points, from the one which explains the largest amount of variance on its own (the largest eigenvalue) to the most inconsequential. Together, all N directions are needed to completely explain the variance in the data, but in practice the 1 or 2 most important directions alone are often enough to explain a significant proportion of the variance. Geometrically, imagine a fixed amount of variance distributed in N-dimensional space in the same way as the variance in the data; the result is an ellipsoid, and the direction that captures the largest proportion of variance is the one along which this ellipsoid has its largest diameter.

Definition. Let M be a square, symmetric, positive definite matrix. Then the principal component of M is an eigenvector of M whose eigenvalue $\lambda_1$ is greatest.
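A short worked argument may help show why the largest eigenvalue is the one singled out. It assumes, as in the example of the next section, that $M = AA^T$ for a matrix $A$ of centered data, and it introduces the notation $e_1, \dots, e_N$ for orthonormal eigenvectors of $M$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_N$ (this notation is an assumption of the sketch). For any unit vector $u$, the spread of the data along $u$ is

$$\|A^T u\|^2 = u^T A A^T u = u^T M u = \sum_{i=1}^{N} \lambda_i \,(u \cdot e_i)^2 \;\le\; \lambda_1 \sum_{i=1}^{N} (u \cdot e_i)^2 = \lambda_1,$$

with equality when $u = e_1$. So no direction captures more spread than the eigenvector with the greatest eigenvalue.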
## 2. A TWO-DIMENSIONAL EXAMPLE

Suppose we have L = 6 legislators, from Kansas, Louisiana, Maryland, New York, Ohio, and Pennsylvania respectively. We wish to classify them based on their votes on N = 2 bills: the first being a bloated money-for-Kansas bill and the second a contentious abortion-rights bill. Each legislator's up-or-down (1-or-0) vote on these bills can be recorded as a vector in two dimensions. Let's say these are

$$x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad x_4 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad x_5 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad x_6 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
In other words, the legislator from Kansas voted yes on the first bill and no
on the second; the legislator from Louisiana no on the first and no on the
second; and so on.
The variation between legislators can be quantified by a covariance matrix which shows how much variation on each bill there was from person to person, as well as how much covariance there was between bills (i.e., how correlated people's votes were on both bills). This 2 × 2 matrix has as its entries the dot products of each pair of the rows consisting of the (centered) vote tallies on each bill, i.e., if
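$$v_1 = [1, 0, 0, 0, 0, 0] \quad \text{and} \quad v_2 = [0, 0, 1, 1, 0, 1]$$

are the voting records on bills one and two, then the relevant matrix is found by first centering each vector around its mean by subtracting the average:

$$w_1 = [1, 0, 0, 0, 0, 0] - \left[\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\right] = \left[\tfrac{5}{6}, -\tfrac{1}{6}, -\tfrac{1}{6}, -\tfrac{1}{6}, -\tfrac{1}{6}, -\tfrac{1}{6}\right]$$

$$\text{and} \quad w_2 = [0, 0, 1, 1, 0, 1] - \left[\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}\right] = \left[-\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}\right]$$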
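The centering step can be reproduced with exact fractions. The following is a minimal sketch; the helper name `center` is an illustrative choice, not something from the text.

```python
from fractions import Fraction

v1 = [1, 0, 0, 0, 0, 0]  # votes on the Kansas bill
v2 = [0, 0, 1, 1, 0, 1]  # votes on the abortion-rights bill

def center(votes):
    """Subtract the mean vote from every entry, using exact fractions."""
    mean = Fraction(sum(votes), len(votes))
    return [Fraction(v) - mean for v in votes]

w1 = center(v1)
w2 = center(v2)
print([str(x) for x in w1])  # ['5/6', '-1/6', '-1/6', '-1/6', '-1/6', '-1/6']
print([str(x) for x in w2])  # ['-1/2', '-1/2', '1/2', '1/2', '-1/2', '1/2']
```

Each centered vector sums to zero, which is a quick sanity check on the subtraction.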
The covariance matrix is obtained by multiplying the matrix A whose rows are $w_1$ and $w_2$ by its transpose (i.e. the matrix whose columns are $w_1$ and $w_2$). Here we obtain

$$M = AA^T = \begin{bmatrix} \tfrac{5}{6} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{3}{2} \end{bmatrix}.$$
We interpret the diagonal entries as the variance of votes on each bill: there was much more variance, i.e. less agreement, on the second bill (3/2) than on the first (5/6). The off-diagonal entry is the covariance of these two bills: roughly, whether voting for one was correlated more with voting for the other, or against the other. Here the negative value (−1/2) indicates that voting for the Kansas-loot bill was correlated with a likelihood of voting against abortion rights, which makes sense in the data since the only legislator to vote for the Kansas bill also voted against abortion rights.
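The entries of M can be checked as dot products of the centered rows. This is a minimal sketch using exact fractions; the helper `dot` is an illustrative name.

```python
from fractions import Fraction

# Centered vote vectors from the text (the rows of the matrix A).
w1 = [Fraction(5, 6)] + [Fraction(-1, 6)] * 5
w2 = [Fraction(s, 2) for s in (-1, -1, 1, 1, -1, 1)]
A = [w1, w2]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# M = A A^T: entry (i, j) is the dot product of rows i and j of A.
M = [[dot(ri, rj) for rj in A] for ri in A]
for row in M:
    print([str(x) for x in row])
# ['5/6', '-1/2']
# ['-1/2', '3/2']
```

The text works with the unnormalized product $AA^T$. Dividing by L or L − 1, as many covariance conventions do, would rescale the eigenvalues by a common factor but leave the eigenvectors, and hence the principal direction, unchanged.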
Intuitively, there was little disagreement over the Kansas bloat bill (five of six legislators were against it), while there was a lot of disagreement on the abortion rights bill, a 50/50 split. So if we want to explain the differences between legislators, their vote on abortion rights should be a much more significant indicator than their vote on the Kansas bill, on which they mostly agreed.
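To quantify this, let's find the principal component of M. Its eigenvalue/eigenvector pairs are (using technology)

$$\lambda_1 \approx 1.77, \quad e_1 \approx \begin{bmatrix} -0.535 \\ 1 \end{bmatrix}, \qquad \lambda_2 \approx 0.57, \quad e_2 \approx \begin{bmatrix} 1.87 \\ 1 \end{bmatrix}.$$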
Since the former has the larger eigenvalue, we say that $e_1 = (-0.535, 1)$ is the principal component of the covariance. Projecting each legislator's (centered) data point onto the $e_1$ direction will then give us a one-dimensional viewpoint of their differences.
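For a symmetric 2 × 2 matrix, the eigenvalue/eigenvector pairs can also be found in closed form rather than with a numerical package. The sketch below assumes the matrix M = [[5/6, −1/2], [−1/2, 3/2]] from the covariance computation and scales each eigenvector so that its second entry is 1; the function name `eigenvector` is illustrative.

```python
import math

# Matrix M from the text: [[5/6, -1/2], [-1/2, 3/2]].
a, b, d = 5 / 6, -1 / 2, 3 / 2

# For a symmetric 2x2 matrix [[a, b], [b, d]], the eigenvalues are the
# roots of t^2 - (a + d) t + (a d - b^2) = 0.
trace, det = a + d, a * d - b * b
root = math.sqrt(trace * trace - 4 * det)
lam1, lam2 = (trace + root) / 2, (trace - root) / 2

def eigenvector(lam):
    """Eigenvector scaled so that its second entry is 1.
    Solves (a - lam) x + b y = 0 with y = 1 (valid here since a != lam)."""
    return (-b / (a - lam), 1.0)

e1, e2 = eigenvector(lam1), eigenvector(lam2)
print(round(lam1, 2), tuple(round(c, 3) for c in e1))  # 1.77 (-0.535, 1.0)
print(round(lam2, 2), tuple(round(c, 3) for c in e2))  # 0.57 (1.869, 1.0)

# Share of the total variance explained by the principal component.
print(round(lam1 / (lam1 + lam2), 3))  # 0.758
```

The last line indicates that the single direction $e_1$ accounts for roughly three quarters of the total variance in this small example.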
Graphically: if each legislator's votes on these two bills (after subtracting the average vote to center them) were plotted as a point in the xy-plane, they would appear as the dark blue points in Figure 1.
If $(x', y')$ are the coordinates of each point with respect to the basis $\{e_1, e_2\}$ of eigenvectors of M (with each eigenvector scaled to have length equal to the square root of its eigenvalue), then any data set whose total variance is equal to 1 (for example) and whose covariances behave identically to those of our data set will lie within the circle $x'^2 + y'^2 = 1$. Using a change of coordinates, this can be plotted in the xy-plane, where it has the equation

$$\begin{bmatrix} x & y \end{bmatrix} M^{-1} \begin{bmatrix} x \\ y \end{bmatrix} = 1 \tag{1}$$

$$\tfrac{3}{2}x^2 + xy + \tfrac{5}{6}y^2 = 1. \tag{2}$$
This ellipse is plotted in blue in Figure 1, and represents the maximum
amount of variability in each direction, given a data set with the same covariance matrix as ours. In other words, if these 6 legislators are representative of all 535 legislators in the U.S. Congress, and we plotted the data
points for all 535, the image we get would be the same as Figure 1.
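The step from Equation (1) to Equation (2) can be worked out by hand. The sketch below assumes that Equation (1) involves the inverse $M^{-1}$ of the covariance matrix computed earlier, with diagonal entries 5/6 and 3/2 and off-diagonal entry −1/2. Since

$$\det M = \tfrac{5}{6} \cdot \tfrac{3}{2} - \left(-\tfrac{1}{2}\right)^2 = \tfrac{5}{4} - \tfrac{1}{4} = 1, \qquad M^{-1} = \frac{1}{\det M} \begin{bmatrix} \tfrac{3}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{5}{6} \end{bmatrix} = \begin{bmatrix} \tfrac{3}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{5}{6} \end{bmatrix},$$

the quadratic form expands to

$$\begin{bmatrix} x & y \end{bmatrix} M^{-1} \begin{bmatrix} x \\ y \end{bmatrix} = \tfrac{3}{2}x^2 + 2 \cdot \tfrac{1}{2}\,xy + \tfrac{5}{6}y^2 = \tfrac{3}{2}x^2 + xy + \tfrac{5}{6}y^2.$$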
Meanwhile, the eigenvectors $e_1, e_2$ of M are the principal axes of the quadratic form in Equation (2), with the $e_1$ axis being the principal component of M since its eigenvalue is larger. Geometrically, this corresponds to the semimajor axis, or the largest diameter of the ellipse. Intuitively, if we view the ellipse from a viewing angle that is perpendicular to this direction, the ellipse will appear as wide as possible.
But viewing the ellipse and the data points in this fashion also visualizes the projection of each data point onto this principal axis. The coordinate obtained in this projection is called the principal factor score of each data point, and these may be arranged along the principal axis as shown in Figure 1. We have therefore reduced the dimension of our data set from 2 to 1, replacing each legislator's actual vote total by their factor score (coordinate along the principal axis).
Because we have selected the maximal diameter of the ellipse to visualize, we realize the maximal spread along the axis between distinct data points in the set. We know, for instance, that in the original 2-dimensional data, Kansas and New York, whose votes differed on both bills, are much further apart than Kansas and Louisiana, who agreed on the abortion rights bill. This difference from the original data set is maximally preserved in the one-dimensional factor score.
We then interpret this factor score as follows. The principal direction $e_1$ is more closely aligned (makes a narrower angle) with the y-axis than with the x-axis, meaning that the vote on the abortion-rights bill weighs more heavily in the factor score, and hence explains more of the variance between legislators, than the vote on the Kansas bill. But the vote on the Kansas bill is not completely omitted, since the principal direction still contains a nonzero component in the x-direction. This allows the factor score to differentiate the legislators from Louisiana and Ohio, who both voted with Kansas on the abortion-rights bill, from the legislator from Kansas, who departed from them on the Kansas bill.
The factor score for each legislator is then a single number that we may interpret as the degree to which this legislator combines opposition to abortion rights (in large measure) with support for Kansas (in smaller measure), with the principal axis oriented as in Figure 1 (an eigenvector is determined only up to sign, so either orientation is valid). The Kansas legislator has the highest combination of these characteristics and so the highest factor score of all six; Louisiana and Ohio score lower because they did not go along on the Kansas bill; and the other three states score much lower (negative, in fact), since they voted in opposition to Kansas and in support of abortion rights.
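The factor scores can be computed by projecting each centered data point onto a unit vector along the principal direction. This is a minimal sketch: it uses the rounded direction (0.535, −1), and the choice of that sign, which makes support for the Kansas bill and opposition to the abortion-rights bill score positive, is an assumption made to match the orientation described above.

```python
import math

states = ["KS", "LA", "MD", "NY", "OH", "PA"]
kansas = [1, 0, 0, 0, 0, 0]    # votes on the Kansas bill
abortion = [0, 0, 1, 1, 0, 1]  # votes on the abortion-rights bill

# Center each bill's votes around its mean.
mx = sum(kansas) / len(kansas)
my = sum(abortion) / len(abortion)
points = [(x - mx, y - my) for x, y in zip(kansas, abortion)]

# Principal direction (rounded), with the sign chosen so that positive
# scores mean support for the Kansas bill and opposition to the
# abortion-rights bill. The sign is an assumption of this sketch.
ex, ey = 0.535, -1.0
norm = math.hypot(ex, ey)
ux, uy = ex / norm, ey / norm

# Factor score = projection of the centered point onto the unit direction.
scores = {s: px * ux + py * uy for s, (px, py) in zip(states, points)}
for s in states:
    print(s, round(scores[s], 2))
```

With this orientation Kansas receives the largest score, Louisiana and Ohio share a smaller positive score, and Maryland, New York, and Pennsylvania share a negative one.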
## 3. NOMINATE
Scaling this method up, NOMINATE: (a) combines votes on a much
larger number of issues, so uses a much larger N than we did here; and
(b) instead of selecting only the one principal component, selects the top
two principal components so as to plot each legislator on a two-dimensional
plane using two factor scores instead of one.
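The two-score version can be sketched on a small hypothetical data set. The vote table below (three bills, four legislators) is invented for illustration, and the code is plain PCA computed by power iteration with deflation, following the description above; it is not the NOMINATE software or its actual estimation procedure.

```python
import math

# Hypothetical yes/no votes: rows are bills, columns are legislators.
votes = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]

def center(row):
    """Subtract the row mean from every entry."""
    mean = sum(row) / len(row)
    return [v - mean for v in row]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenpair(M, iterations=500):
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    v = [1.0] + [0.0] * (len(M) - 1)  # assumed not orthogonal to the answer
    for _ in range(iterations):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(M, v)))
    return lam, v

A = [center(row) for row in votes]
M = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]

lam1, u1 = top_eigenpair(M)
# Deflation: subtract the first component so the second becomes dominant.
n = len(M)
M2 = [[M[i][j] - lam1 * u1[i] * u1[j] for j in range(n)] for i in range(n)]
lam2, u2 = top_eigenpair(M2)

# Two factor scores per legislator: projections of that legislator's
# centered votes (a column of A) onto u1 and u2.
coords = [
    (sum(u1[i] * A[i][j] for i in range(n)),
     sum(u2[i] * A[i][j] for i in range(n)))
    for j in range(len(votes[0]))
]
for j, (x, y) in enumerate(coords):
    print(f"legislator {j + 1}: ({x:.3f}, {y:.3f})")
```

Each legislator ends up with a pair of coordinates that can be plotted in the plane. Power iteration is used here only to keep the sketch dependency-free; a linear algebra library's symmetric eigensolver would be the more usual choice for a large N.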

[Figure 1. Top panel: the centered data points (KS, LA/OH, MD/NY/PA) plotted with "(Centered) Vote on Kansas Bill" on the horizontal axis and "(Centered) Vote on Abortion Rights Bill" on the vertical axis, together with the ellipse (one unit of variance, distributed equally to the data set) and the principal direction, the direction chosen to maximize variance. Bottom panel: the factor scores of MD/NY/PA, LA/OH, and KS along the principal axis, running from "More opposition to Kansas bill & support for abortion rights bill" (negative) to "More support for Kansas bill & opposition to abortion rights bill" (positive).]

FIGURE 1. Principal component analysis on the data set in this problem selects a viewing angle on the data set which provides for the widest diameter of the ellipse defined by the covariance matrix, i.e., the maximal amount of variation between different data points. The red principal direction line in the top diagram is exactly the red axis in the bottom diagram.