September 2016
General
Hierarchical cluster analysis is a multivariate procedure that can be used to detect patterns in ecological
data. It can work with abundance data or with binary presence-absence data.
Software
The material in this guide is based on the procedures carried out using PAST (Hammer, Harper, and Ryan,
2001). This is free statistical software (http://folk.uio.no/ohammer/past/) that was originally designed for
paleontological analyses but that is also very well-suited to ecological applications. At the time of writing,
the current version was 3.13 (August 2016). The software is updated frequently.
Procedure
Preparing the data
(1) Prepare a data matrix with sites/locations in rows and species in columns. In the case
of presence/absence data (as in BIO3060 Practical 1), the occurrence of a species should be
indicated by a ‘1’ and its absence by a ‘0’. The matrix in Figure 1 records the occurrence of
15 species of endemic plants in 20 locations around the Maltese Islands (Galea, 2016). The matrix
was originally edited in Microsoft Excel and subsequently copied into PAST.
(2) To enter data in PAST, please tick the ‘Row attributes’ and ‘Column attributes’ boxes on the
left-hand side of the top ribbon and paste the copied data into the cell labelled ‘Name’ in both
rows and columns (Figure 2). Once the data have been entered, untick the ‘Row attributes’ and
‘Column attributes’ boxes.
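The layout described in step (1) can be sketched in a few lines of Python. This is only an illustration: the site names, species names, and records below are hypothetical placeholders rather than the Galea (2016) data, and the tab-separated output is simply the form in which spreadsheet cells are usually copied.

```python
# Hypothetical site and species names, used only for illustration.
species = ["Species A", "Species B", "Species C"]
records = {
    "Site 1": {"Species A", "Species C"},
    "Site 2": {"Species B"},
    "Site 3": {"Species A", "Species B", "Species C"},
}


def presence_absence_matrix(records, species):
    """Return one row per site, with 1 where a species occurs and 0 otherwise."""
    return {
        site: [1 if name in found else 0 for name in species]
        for site, found in records.items()
    }


def to_tab_separated(matrix, species):
    """Format the matrix as text: a header row of species, then one row per site."""
    lines = ["\t".join([""] + species)]
    for site, row in matrix.items():
        lines.append("\t".join([site] + [str(value) for value in row]))
    return "\n".join(lines)


matrix = presence_absence_matrix(records, species)
table_text = to_tab_separated(matrix, species)
print(table_text)
```

The empty first cell of the header row leaves room for the site names, so that species names head the columns and site names label the rows.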
Figure 1: Presence-absence data for endemic plants of the Maltese Islands. Species names are in columns and site names are in
rows. Matrix compiled by Christine M Galea (2016).
BIO3060 2016: Practical 1
Figure 2: Data screen in PAST v.3.13. The boxes that need to be ticked, and the cell where data should be pasted from another
editor are indicated.
Figure 4: Dendrogram showing results of cluster analysis (Paired-Group linkage; Jaccard similarity) on the data in the matrix in
Figure 1.
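For readers who want to see what the Jaccard similarity and paired-group linkage named in the Figure 4 caption measure, the following is a minimal sketch. It assumes the standard definitions (Jaccard similarity as shared species divided by species found at either site; paired-group linkage as the average similarity between the members of two groups). The function names and example rows are hypothetical, and the code is not PAST's own implementation.

```python
def jaccard_similarity(row_a, row_b):
    """Jaccard similarity of two presence-absence rows (lists of 0s and 1s)."""
    shared = sum(1 for x, y in zip(row_a, row_b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(row_a, row_b) if x == 1 or y == 1)
    if either == 0:
        # Assumption: two sites with no species at all are given similarity 0.
        return 0.0
    return shared / either


def average_linkage_similarity(group_a, group_b):
    """Mean Jaccard similarity over every pair of sites drawn from two groups."""
    values = [jaccard_similarity(a, b) for a in group_a for b in group_b]
    return sum(values) / len(values)


# Hypothetical sites scored for five species.
site_x = [1, 1, 0, 1, 0]
site_y = [1, 0, 0, 1, 1]
site_z = [1, 1, 0, 1, 0]

print(jaccard_similarity(site_x, site_y))                      # 2 shared of 4 found
print(average_linkage_similarity([site_x], [site_y, site_z]))  # mean of 0.5 and 1.0
```

Joint absences (a 0 at both sites) do not enter the calculation, which is one reason the index is commonly used with presence-absence data.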
(1) In the PAST data matrix, please tick the ‘Row attributes’ and ‘Column attributes’ boxes
and insert a new column before the first species column. Use the ‘Edit>Insert more columns ...’
command from the top menu to do this. The new column will be labelled ‘c1’ by default.
(2) With the newly added column ‘c1’ selected, click on the cell in the ‘Type’ row. This should
bring up a drop-down menu with the options ‘Group’, ‘Ordinal’, ‘Nominal’, and ‘Binary’. Please
select the ‘Group’ option, which marks the values in this column as grouping variables (that is,
membership of a cluster).
(3) In column c1, enter the cluster to which each site has been assigned (Figure 5).
(4) Select all the data by clicking on the cell in the top left-hand corner of the spreadsheet.
(5) Select the ANOSIM procedure by selecting ‘Multivariate>Tests>One-way ANOSIM’ from the
top menu.
(6) Set the ‘Similarity Index’ to ‘Jaccard’ in the pop-up window that appears. Leave the number
of permutations at the default value. Click the ‘Recompute’ button.
(7) The results will appear in a window as shown in Figure 6.
(8) The important results are the R value and the p value. The R value can range from -1 to +1,
although in practice it usually falls between 0 and +1. Values closer to +1 indicate higher
dissimilarity between the clusters being compared, and values closer to 0 indicate greater
similarity. The p value indicates whether this separation is statistically significant. In the case
of this analysis, the R value (0.63) suggests a distinct dissimilarity between the species
composition of the sites in Clusters 1, 2, and 3. The p value of 0.0002 indicates that this
dissimilarity is also statistically significant.
(9) Clicking on the ‘pairwise’ tab shows a matrix comparing each pair of clusters with each other.
The R values and p values associated with each comparison can be shown (Figure 7, Figure 8). In
this case the very high R values of the comparisons between Cluster 1 and Cluster 3 (R=0.9949)
and between Cluster 2 and Cluster 3 (R=0.8386) suggest that the species composition for the sites
in Cluster 3 is very different when compared to those in Clusters 1 and 2.
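To make the R value and p value more concrete, the sketch below computes them by hand for a small made-up data set. It assumes the usual ANOSIM definition, R = (mean rank of between-group distances - mean rank of within-group distances) / (N(N-1)/4), with Jaccard distance taken as 1 minus Jaccard similarity. The permutation p value uses one common convention and may not match PAST's calculation exactly; all names and data are hypothetical.

```python
import itertools
import random


def jaccard_distance(a, b):
    """1 minus Jaccard similarity for two presence-absence rows."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two empty sites get similarity 0, hence distance 1.
    return 1.0 - (shared / either if either else 0.0)


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(rows, groups):
    """ANOSIM R statistic for site rows and their group labels."""
    pairs = list(itertools.combinations(range(len(rows)), 2))
    ranks = average_ranks([jaccard_distance(rows[i], rows[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    n = len(rows)
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim_p(rows, groups, permutations=999, seed=0):
    """Share of shuffled labelings whose R is at least the observed R."""
    rng = random.Random(seed)
    observed = anosim_r(rows, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(rows, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Hypothetical data: six sites, four species, two clusters with no shared species.
rows = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
        [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
groups = [1, 1, 1, 2, 2, 2]

r_value = anosim_r(rows, groups)
p_value = anosim_p(rows, groups)
print(r_value, p_value)
```

Because the two made-up clusters share no species, every between-cluster distance outranks every within-cluster distance and R reaches its maximum. With only six sites there are few distinct labelings, so the p value cannot become very small, which illustrates why the number of sites matters when interpreting significance.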
Figure 5: Data matrix in PAST showing the addition of a grouping variable indicating cluster membership for each site.
(1) Select all the data in the matrix by clicking on the cell in the top left-hand corner of the
PAST spreadsheet.
(2) Select the ‘Multivariate>Tests>SIMPER’ command from the top menu. A results window
similar to the one shown in Figure 9 should appear.
(3) Set the ‘Distance/similarity measure’ to ‘Euclidean’ from the drop-down menu, and
select the two clusters to be compared (1 v 2, 1 v 3, or 2 v 3, in this case). Click on the
‘Recompute’ button when the required options have been selected. The results will change
depending on the new options selected.
(4) The SIMPER results (Figure 9) list the taxa (in this case, species) that account for the
difference between the selected clusters.
(5) The third column (‘Contrib.%’) shows the contribution of each species listed to the
differences between the selected clusters. In the example shown in Figure 9, where Cluster 1
and Cluster 2 are being compared, the species Chiliadenus bocconei contributes 26.96% of the
difference between the two clusters.
(6) The fifth and sixth columns (‘Mean 1’ and ‘Mean 2’) show the proportion of sites in
which a species is present for a given cluster. For the data shown in Figure 9, Chiliadenus
bocconei was not present in any of the five sites in Cluster 1 (‘Mean 1’ =0) but was recorded from
11 out of the 12 sites in Cluster 2 (‘Mean 2’ =0.917). Similarly, Anthemis urvilleana was not
present in any of the Cluster 1 sites and was recorded from half the Cluster 2 sites (‘Mean 2’ =0.5).
(7) From the ‘Contrib.%’ column, it can be seen that A. urvilleana contributes 14.71% of the
difference between Cluster 1 sites and Cluster 2 sites. The cumulative contribution of the first
two species to the difference between Clusters 1 and 2 (fourth column: ‘Cumulative %’) is
therefore 41.67%.
(8) Taken together, these results suggest that the principal differences between Cluster 1 and
Cluster 2 sites arise from the presence of C. bocconei and A. urvilleana in the Cluster 2 sites,
whereas both species were absent from all of the Cluster 1 sites.
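The arithmetic behind the ‘Mean 1’, ‘Mean 2’, and ‘Cumulative %’ columns can be checked with a short sketch. It reuses only the figures quoted in the steps above. The order of the 0s and 1s within each cluster column is arbitrary, the function names are hypothetical, and the code does not reproduce the SIMPER calculation of the contributions themselves.

```python
def proportion_present(column):
    """Proportion of sites in a cluster where the species was recorded."""
    return sum(column) / len(column)


def cumulative_percentages(contributions):
    """Running total of the contribution percentages, in the order listed."""
    totals = []
    running = 0.0
    for value in contributions:
        running += value
        totals.append(round(running, 2))
    return totals


# Chiliadenus bocconei: absent from all 5 Cluster 1 sites,
# present in 11 of the 12 Cluster 2 sites (site order is arbitrary).
cluster_1 = [0] * 5
cluster_2 = [1] * 11 + [0]

mean_1 = proportion_present(cluster_1)
mean_2 = proportion_present(cluster_2)
print(round(mean_1, 3), round(mean_2, 3))

# Contributions quoted for C. bocconei and A. urvilleana.
cumulative = cumulative_percentages([26.96, 14.71])
print(cumulative)
```

For presence-absence data the mean of a 0/1 column is the fraction of sites occupied, which is why ‘Mean 2’ can be read directly as 11 out of 12.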
References
Galea, C.M. (2016). Trait characteristics of endemic plants of the Maltese Islands. Unpublished Bachelor
of Science dissertation. Faculty of Science, University of Malta: xiv+66pp.
Hammer, Ø., Harper, D. A. T., & Ryan, P. D. (2001). Paleontological Statistics Software: Package for
Education and Data Analysis. Palaeontologia Electronica 4(1): 9 pp.