Fleming, 2014, plant ecology research lesson, S11 Lab 7

Lab 7: Direct Ordination

In this exercise you will learn how to use a powerful ordination tool called canonical correspondence
analysis (CCA). In math and statistics, CCA is used to simultaneously analyze two or more related data
matrices. This method allows the user to assess which environmental factors within sampled plots
explain species patterns in those same plots. You will learn how to interpret the meaning of correlations
and bi-plots, and how to determine if the results of CCA analysis are meaningful. Canonical, by the way,
means "in its simplest and most comprehensive form."

NOTE: As with the indirect ordination lab, a helpful website with a glossary of terms, examples, and other
useful conceptual information can be found here: (last accessed 3/31/15).
As before, your instructor will walk you through some examples before you explore data on your own.

CCA has proven to be an effective, though far from perfect, method for direct gradient analysis using
multivariate methods, meaning it examines the behavior of multiple dependent variables simultaneously,
as opposed to one at a time. CCA can help resolve the major question that indirect ordination leaves us
with: what do the ordination axes mean and how do we interpret them? CCA differs from DCA by
incorporating correlation and regression between multiple factors and entities, such as floristic data and
environmental factors WITHIN the analysis, not as an after-thought. The goal is to relate plant
community data directly to environmental data, and the results express these relationships quantitatively!
An additional advantage with CCA is that both quantitative and categorical data can be used together.

Since CCA ignores community structure in forming a sample ordination, it is wise to still conduct an
indirect ordination followed by attempts to relate to environmental variables. The two approaches are, or
should be, complementary.

Graphical output facilitates interpretation and discussion of CCA results. Species and samples are
represented as points in the ordination space, while environmental factors are shown as vectors (arrows).
Because vectors often point into a quadrant of the ordination graph rather than along a single axis, an
ordination axis has direct environmental meaning, but often more than one.

The input of CCA consists of two matrices, not one. The first is the species data (as you used in DCA),
but a second matrix of environmental data is also included. Direct ordination essentially works by having
the software look at one data matrix, overlay the other matrix on top of the first, and examine
all columns and rows across the two matrices for correlations with each other.
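
As a minimal sketch of the two-matrix input, the Python below sets up a toy species matrix and a toy environmental matrix that share the same plots as rows. All plot names, species counts, and environmental values are hypothetical, and check_matched is an illustrative helper, not part of PC-ORD.

```python
# Hypothetical toy data: 4 plots, 3 species, 2 environmental variables.

# Main matrix: rows = plots, columns = species abundances.
species = {
    "plot1": [12, 0, 3],
    "plot2": [8, 1, 5],
    "plot3": [0, 9, 2],
    "plot4": [1, 7, 0],
}

# Second matrix: rows = the SAME plots, columns = environmental variables
# (here an assumed elevation in metres and an assumed soil pH).
environment = {
    "plot1": [350.0, 6.1],
    "plot2": [600.0, 5.8],
    "plot3": [1200.0, 5.1],
    "plot4": [1500.0, 4.9],
}


def check_matched(main, second):
    """Return True when both matrices list exactly the same plots in the same order."""
    return list(main) == list(second)


print(check_matched(species, environment))
```

Checking that the rows match before running an ordination is a cheap safeguard, because the analysis pairs row 1 of one matrix with row 1 of the other.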

CCA is a method for finding a message concealed in a noisy, multi-dimensional message. It is as if your
e-mail contained all the right words in order, but with nonsense intermingled.



CCA should be used if you are primarily concerned with species responses to the environmental variables
you have measured and if you think that a uni-modal response of species to the environment describes the
situation reasonably well. CCA uses multiple regressions of gradients on environmental variables, and
this is where some danger lies. If you have many variables, results may be dubious since you can explain
anything with enough variables. The process uses multiple linear regression to select the best linear
combinations of environmental variables. BEST means that taken together, they explain the most
variation in the species scores on each axis. As in last week's lab activity, it is important to note that CCA
is appropriate for beginning ecologists; studies intended to be published should use another direct
ordination method called redundancy analysis (RDA). RDA is more involved than CCA, so is beyond the
scope of our class. Of course, if you wish to talk with me and learn more about RDA I would be happy to
discuss it with you outside of class.


Correlation among environmental variables. Simple correlations among environmental variables can help
to understand the results. You may wish to eliminate one of two closely correlated variables. For
example, as one increases in elevation, precipitation usually also increases. Which of these two
environmental variables has more explanatory power in explaining species relationships? Try eliminating
elevation and precipitation one at a time, run CCA in each case, and examine the bi-plot scores.
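
One way to screen for closely correlated environmental variables before running CCA is sketched below in Python. The data values are hypothetical, strongly_correlated is an illustrative helper, and the 0.8 cutoff is only a rule of thumb.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical environmental data for five plots.
env = {
    "elevation": [350, 600, 900, 1200, 1500],
    "precipitation": [80, 95, 120, 140, 170],
    "soil_ph": [6.1, 5.2, 6.4, 5.0, 6.0],
}


def strongly_correlated(variables, threshold=0.8):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for a, b in combinations(variables, 2):
        r = pearson(variables[a], variables[b])
        if abs(r) > threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


print(strongly_correlated(env))
```

Any pair this flags is a candidate for the "eliminate one at a time and rerun" comparison described above.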

Iteration report. One way CCA works is by randomizing values within both matrices and noting whether the
results of these iterations are more or less likely to produce the result you get with your original,
unmodified data. Iterations stop when a stable solution is reached, based on tolerance. You can specify up to
999 iterations, but a couple hundred would probably suffice. Reasonable results are still likely if you do
999 iterations without reaching the tolerance.

Total variance in the species data (inertia). This is the total variation in the species data that could be
explained, that is, that could be associated with the environmental variables. Do not expect to get high
values for explained variation. On Mount St. Helens, 20% is a good result.

Axis summary statistics. Eigenvalues represent the community variation on an axis.

Variance explained in species data is 100 * (eigenvalue / total variance), and it descends as the ordination
axis number increases. Said another way, the first ordination axis will explain the most variance in the
community data, the second axis explains the second-most variance, and so on.
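
The variance-explained arithmetic can be sketched in a few lines of Python. The eigenvalues and total variance below are made-up numbers chosen for illustration, not output from any real data set.

```python
def percent_variance(eigenvalues, total_variance):
    """Percent of total variance (inertia) explained by each axis."""
    return [100.0 * ev / total_variance for ev in eigenvalues]


# Hypothetical values for three ordination axes.
eigenvalues = [0.30, 0.12, 0.05]
total_variance = 2.35  # hypothetical total inertia in the species data

per_axis = percent_variance(eigenvalues, total_variance)
for axis, pct in enumerate(per_axis, start=1):
    print(f"Axis {axis}: {pct:.1f}% of variance explained")
print(f"Cumulative: {sum(per_axis):.1f}%")
```

With these assumed numbers the three axes together explain about 20% of the variance.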

Pearson correlations are standard correlation coefficients between species and environment: each is the
correlation coefficient between sample scores derived from species data (WA scores) and those that are
derived from the environment (LC scores). LC scores are the scores that result from the linear
combination of environmental data. Thus, LC scores are in environmental space. WA scores are derived
from the species scores, so they are similar to an indirect gradient analysis in that the samples are in
species space. LC scores are preferred if you are looking for environmental relationships.
The Kendall (rank) correlations are a non-parametric version of the Pearson correlation. Use these scores
in your reports.
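
To make the two correlation types concrete, here is a small Python sketch that computes a Pearson correlation and a Kendall rank correlation between WA and LC site scores on one axis. The scores are invented for illustration, and the Kendall function is the simple tau-a form (tied pairs count as neither concordant nor discordant), which may differ from the variant your software reports.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical axis-1 scores for five plots.
wa_scores = [-1.2, -0.4, 0.1, 0.9, 1.6]   # derived from species data
lc_scores = [-1.0, -0.6, 0.3, 0.2, 1.5]   # derived from environmental data

print(round(pearson(wa_scores, lc_scores), 3))
print(kendall_tau_a(wa_scores, lc_scores))
```

The rank-based measure only cares about the ordering of the plots, so it is less sensitive to a few extreme scores than the Pearson coefficient.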

Multiple regression results. This lists the effectiveness of the environmental variables in structuring the
ordination and describes the relationship of environmental variables to ordination axes.

Canonical coefficients are regression coefficients derived from the iterative method. They have larger
variance than a regression coefficient, so you can't use standard significance tests. These represent the
contribution of the variable to the regression solution. There are two forms: standardized coefficients,
which relate to the centered, standardized variables, and coefficients given for the variables in their
original units.
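
The link between the two forms of coefficient can be sketched with ordinary linear-combination arithmetic. The Python below assumes a plain (unweighted, population) standard deviation and a made-up coefficient; ordination software may standardize with site weights, so treat this only as an illustration of the idea.

```python
import statistics

# Hypothetical elevation values (metres) for five plots.
elevation = [350.0, 600.0, 900.0, 1200.0, 1500.0]

mean = statistics.mean(elevation)
sd = statistics.pstdev(elevation)  # assumption: unweighted population SD

# Centered, standardized version of the variable (mean 0, unit variance).
z = [(x - mean) / sd for x in elevation]

coef_original = 0.002                    # hypothetical coefficient per metre
coef_standardized = coef_original * sd   # same effect per standard deviation

# Both forms give the same contribution to the axis score for every plot.
for x, zi in zip(elevation, z):
    print(round(coef_original * (x - mean), 6), round(coef_standardized * zi, 6))
```

Standardized coefficients are the ones to compare across variables, since they do not depend on the units in which each variable was measured.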

The R-squared (R2) value for an axis provides an estimate of how well the variables predict the location
of a sample in the ordination space.

Final scores. These are your values for each site and each species. You could superimpose them on the
same graph. You can also superimpose the vectors representing the environmental scores on the same
(bi-plot) graph.

Weights for sites and species. The weights of species on the edge of an ordination have little value and
could be ignored.

Correlation of environmental variable with ordination axes. There are two sets of site scores (LC and
WA). You want to use the intra-set correlations, derived from LC. These will be higher and will
indicate the variables most influential in determining environmental relationships.

Bi-plot scores. Significant environmental variables can be graphed as vectors radiating from the centroid
of the ordination. The tip of the vector arrow is the coordinate for the variable in the two dimensions
shown. The longer the arrow, the stronger the relationship. These are based on the intra-set correlations.
Graphs are available using the graph option in the menu. It is recommended to use Linear Combination
scores for graphing. This is the best fit of species abundances to the environmental data. It is the default
option. Ignore the inter-set correlations for now.

Monte Carlo tests. Because CCA does not generate parameters that meet normal statistical assumptions,
the only way to determine if there is anything but noise in the relationship is to run simulations with the
data scrambled. There are two null hypotheses:

H0a: No relationship between matrices (e.g. environment does not control communities). The rows of the
environmental data matrix are assigned at random within this matrix.

H0b: No structure in the main matrix, thus no relationship between matrices. The elements of the main
matrix are randomly reassigned within columns, thus destroying the relationship among species.

For our purpose, the first null hypothesis is the norm. Several tables show how the observed eigenvalues
and species-environmental correlations compare to the random result, and p-values are calculated.

Specify at least 30 runs and compare the random eigenvalues to the actual eigenvalue. Hopefully, the random
eigenvalues will be low relative to the actual eigenvalue, which would suggest true structure.
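
The logic of the first null hypothesis (H0a) can be sketched as a simple randomization test in Python. This is not a CCA: the test statistic here is a stand-in (the absolute correlation between a set of site scores and the first environmental variable), and all data, the function name monte_carlo_p, and the default of 99 runs are assumptions for illustration. The point is only the shuffle-and-compare procedure.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def monte_carlo_p(site_scores, env_rows, runs=99, seed=0):
    """Shuffle whole environmental rows among plots and compare the statistic."""
    rng = random.Random(seed)

    def statistic(rows):
        # Stand-in statistic, NOT a CCA eigenvalue.
        first_column = [row[0] for row in rows]
        return abs(pearson(site_scores, first_column))

    observed = statistic(env_rows)
    shuffled = list(env_rows)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)  # H0a: environmental rows assigned to plots at random
        if statistic(shuffled) >= observed:
            hits += 1
    # Proportion of results (including the observed one) at least as extreme.
    return observed, (1 + hits) / (1 + runs)


# Hypothetical site scores and environmental rows (elevation, soil pH) for 8 plots.
site_scores = [-1.4, -0.9, -0.3, 0.2, 0.8, 1.1, 1.6, 2.0]
env_rows = [(310, 6.1), (420, 5.9), (500, 6.3), (640, 5.5),
            (700, 5.8), (850, 5.2), (930, 5.4), (1100, 5.0)]

observed, p_value = monte_carlo_p(site_scores, env_rows)
print(round(observed, 3), p_value)
```

A small p-value means that randomly reassigned environmental rows rarely produce a relationship as strong as the one in the real data.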

What to report

The nature of the matrices (species, environmental variables, etc.); how many axes used; scaling option;
Monte Carlo method; whether LC or WA scores used; bi-plots and other interpretive aids; eigenvalues;
canonical coefficients; intra-set correlations.

From the PC-ORD menu, select the main file (species in plots) and the secondary file (environmental data
in plots). Then select CCA from the Ordination menu. You will be asked some questions.

1. Rows and Column scores standardized by:

a. Centering & normalizing (default) provides bi-plot scaling to a mean of zero and unit
variance. This is the normal method.
b. Hill's scaling preserves aspects of DCA scores along the axes.

2. Scaling of Ordination Scores. The choices to control scaling of species versus sites:
a. Optimize rows (sites); then the site scores are weighted mean species scores.
b. Optimize columns (species); then the species scores are weighted mean site scores. This choice
allows a direct interpretation of the relationship between species and environment.
c. The compromise option can be used.
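
The idea of site scores as weighted means of species scores can be sketched in Python as follows. The abundances and species scores are hypothetical, and the sketch leaves out any rescaling the software applies, so the numbers will not match PC-ORD output.

```python
def weighted_average_site_scores(abundances, species_scores):
    """Score each site as the abundance-weighted mean of the species scores.

    Assumes every site contains at least one individual (row total > 0).
    """
    scores = []
    for row in abundances:
        total = sum(row)
        weighted_sum = sum(a * s for a, s in zip(row, species_scores))
        scores.append(weighted_sum / total)
    return scores


# Hypothetical data: 3 sites (rows) by 3 species (columns).
abundances = [
    [10, 0, 0],
    [5, 5, 0],
    [0, 2, 8],
]
species_scores = [-1.0, 0.0, 2.0]  # hypothetical axis-1 species scores

print(weighted_average_site_scores(abundances, species_scores))
```

A site dominated by one species lands near that species' score, while a site with a mix of species lands between them.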

Working in small groups, use computers with PC-ORD installed.

We will use foods.xls that we used for the DCA activity last week, and a second file called
foods_second_matrix.xls which describes certain characteristics of the same countries from the food
survey file.

You will have an opportunity to run several analyses, create graphs and interpret results. The menus are
fairly simple and will be explained during an example lab.

Practice running and interpreting CCA ordinations. In all cases, print a graph of your ordination and
report eigenvalues and variance explained for ordination axes 1 and 2. Try the following different options
to see the effects of modifying the original data set:

1. Run CCA on unaltered data. I will walk you through this example.
2. Try transforming the food data by using the square root of all raw values. Run CCA and graph the
output. Have relationships or variance explained changed? If so, how? Now use the square root
of all the country data and rerun. What changed?
3. Use the unaltered data and remove one country (you choose, but remember to do it on both the food
and country files!). Run CCA and graph the output. Have relationships or variance explained
changed? If so, how?
4. Use the unaltered data and remove three countries (you choose, but remember to do it on both the food
and country files!). Run CCA and graph the output. Have relationships or variance explained
changed? If so, how?
5. Try a transformation of raw values AND removing one or three countries. Run CCA and graph the
output. Have relationships or variance explained changed? If so, how?
6. Imagine you are a country of one. Add a row in the food file that represents you and fill in each cell
with a value from 0-100 that represents your preference for that food. Assume 0 = "I hate it" and
100 = "if I could only ever eat this for the rest of my life I'd step over my own mother to do it."
Be sure to add a row in the country file and add in some values for your new country's
characteristics. IMPORTANT: remember to change the number of columns in your Excel file
before importing to PC-ORD! Run CCA and graph the output. Have relationships or variance
explained changed? If so, how?
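
The data modifications in the exercises above (square-root transformation and removing the same rows from both files) can be sketched in Python. The country names and values here are placeholders, not the contents of foods.xls or foods_second_matrix.xls, and both helper functions are illustrative.

```python
import math


def sqrt_transform(matrix):
    """Return a copy of the matrix with every raw value replaced by its square root."""
    return {name: [math.sqrt(v) for v in row] for name, row in matrix.items()}


def drop_rows(main, second, names):
    """Remove the named rows from BOTH matrices so the two files stay matched."""
    keep = [n for n in main if n not in names]
    return ({n: main[n] for n in keep}, {n: second[n] for n in keep})


# Placeholder data: 3 countries, 2 foods, 2 country characteristics.
food = {"A": [16, 25], "B": [4, 0], "C": [9, 81]}
country = {"A": [1.0, 30.0], "B": [2.5, 12.0], "C": [0.7, 45.0]}

food_sqrt = sqrt_transform(food)
food_small, country_small = drop_rows(food, country, {"B"})

print(food_sqrt)
print(list(food_small), list(country_small))
```

Doing the row removal in one step for both matrices avoids the easy mistake of deleting a country from one file but not the other.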

Now that you have some familiarity with CCA, use the files sample_speciesv2.xls and
sample_environmental_characteristics.xls to determine which (if any) environmental factors strongly
explain the plant community makeup. Note any strongly covarying environmental factors (correlations below
-0.8 or above 0.8), omit each one at a time, and rerun to see if results change. Use different numbers of
runs for the Monte Carlo test and see if your final solution changes. Try modifying the data, for example by
square-root-transforming, and see if your results change. Most of all, just like learning how to drive a car,
practice a