
Variability of Northwest Florida Soils by Principal Component Analysis

F. A. OVALLES AND M. E. COLLINS*

ABSTRACT

Twenty soil properties from 151 pedons were evaluated (i) to select the statistically-important ones influencing soil variability and (ii) to test if these were actually differentiating properties. Approximately 83% of the pedons studied were Ultisols. Three sets of data were used for the statistical analysis of soil properties (i) weighted-averages in individual pedons, (ii) data from the first A (A, Ap or A1) horizon, and (iii) data from Albany, Dothan, and Orangeburg soil series. The latter data were tested by a nested analysis of variance to find out if the properties selected were actually differentiating properties. Principal component analysis (PCA) was used as an unbiased method to make the selection. The first five principal components (PCs) explained more than 73% of the total variance. Analyses of eigenvectors, collinearity, and correlation coefficients between soil properties and PCs were also employed. Total sand, fine sand, clay, and organic carbon contents were selected by the PCA as the important properties. These properties except fine sand were validated by a nested analysis of variance. The analysis of variance indicated these properties had a large variation among soil series and/or among horizons within the same soil series.

M.E. Collins, Soil Science Dep., G-159 McCarty Hall, Univ. of Florida, Gainesville, FL 32611. F.A. Ovalles, FONAIAP-CENIAP, Seccion de Suelos, Aptdo. 4653, El Limon, Maracay 2101, Aragua, Venezuela. Contribution of the Soil Genesis and Characterization Program, Inst. of Food and Agric. Sciences, Univ. of Florida, Gainesville, FL 32611. Florida Agric. Exp. Stn. Journal Series no. 8402. Received 4 Nov. 1987. *Corresponding author.

Published in Soil Sci. Soc. Am. J. 52:1430-1435 (1988).

THE MULTIVARIATE CHARACTER of soil is well recognized in that a large set of measurements of soil properties (morphological, chemical, biological, physical, and mineralogical) can be derived from a single sample. The complete set of available data is not always used for analyses because the selection of soil properties depends on the objectives of the study and also reflects the constraints imposed by cost, time, effort, and access.

There is no doubt that logically-correlated variables are generally so highly covariant that one or the other should not be included in the analysis. Consequently, in the process of selecting soil properties for study, there is an important question to answer: are the soil properties selected the most important to represent the variability of the complete set of data?

When a soil property is measured for a set of individual sampling units, the measured values can be represented by their positions along a single line. The relationship between any pair of individuals can be represented by the distance between them, and among other individuals, by their relative positions on the line. Therefore, a way of dealing with multivariate data is to arrange the individuals along one or more axes (Webster, 1977). This can be done using PCA.

The PCA is a method used to reduce the number of soil properties without losing important information. In general, such analysis determines the principal axes of a multidimensional configuration as well as the coordinates of each individual in the population relative to those axes. Subsequently, data can be represented in a few dimensions by projecting the point orthogonally onto the principal axes. The basic idea of PCA is to create new variables called PCs (Afifi and Clark, 1984). Each new variable is a linear combination of the X variables and can be written as

$PC_1 = A_{11}X_1 + A_{12}X_2 + \dots + A_{1j}X_j$ [1]

where $PC$ = principal component, $A_{1j}$ = coefficient (eigenvector), and $X_j$ = variable.

Coefficients of these linear combinations are chosen to satisfy the following requirements

1. Variance $PC_1$ > variance $PC_2$ > ... > variance $PC_j$.
2. The values of any two PCs are uncorrelated.
3. For any PC the sum of the squares of the coefficients is one.

The PCA derives a small number of PCs that explain as much of the total variance as possible. The data selected by PCA can be used with a numerical classification to decide if such a classification grouped data satisfactorily (Cuanalo and Webster, 1970; Burrough and Webster, 1976; Williams and Rayner, 1977). Richardson and Bigler (1984) applied PCA to relate soil properties to soil development and plant growth in wetlands of North Dakota. Norris (1972) used PCA to study variation in soil morphological properties. He concluded that PCA served as a summary of the soil variation because PCs accounted for a known percentage of the soil variation and were defined in terms of the properties used to describe the soil. PCA also has been used to grade the chemical potentiality of soils and, thus, the fertility status among soils (Kyuma and Kawaguchi, 1973). Edmonds et al. (1985) employed PCA as a first step in using cluster and discriminant analyses to study taxonomic variation within three map units. Webster and Burrough (1972) used PCA to establish that the soil properties determined in the field (e.g. soil color, total penetrable soil
OVALLES & COLLINS: VARIABILITY OF NORTHWEST FLORIDA SOILS 1431

depth) were closely correlated and were represented in the first PC. Properties measured in the laboratory [e.g. cation exchange capacity (CEC), pH] contributed most to the second PC.

A large soil data base is available in Florida, but limited research has been conducted to study statistically the effects of individual soil properties on soil variability. For this reason 151 pedons located in northwest Florida were selected to (i) determine which properties most strongly influenced soil variability and (ii) test if the properties selected were actually differentiating properties in the study area.

Table 1. Order, great group, and number of pedons studied within each great group.

Order                   Great group         No. of pedons studied
Alfisols                Hapludalfs            2
                        Ochraqualfs           2
Entisols                Quartzipsamments      5
                        Fluvaquents           2
Inceptisols             Dystrochrepts         1
                        Humaquepts            1
Spodosols               Haplaquods            2
Ultisols                Hapludults           10
                        Paleudults           97
                        Paleaquults          15
                        Albaquults            1
                        Ochraquults           2
Nondesignated series†                        11
Total                                       151

† These pedons have not been classified.

MATERIALS AND METHODS

Description of Study Area

The area studied lies in the Coastal Plain Province in northwest Florida (Fig. 1). Topography varies from nearly level to 35% slopes. Elevations range from 16 to 114 m above sea level.

The climate of the area is characterized by long, warm summers and short, mild winters (Bradley, 1974). The average annual temperature is approximately 21 °C. Maxima of about 38 °C occur in June to August and minima of about -10 °C occur in January and February. The average annual rainfall ranges between 1400 and 1660 mm with most of the rainfall occurring during the summer months.

Soils are mainly underlain by the Citronelle Formation, the Crystal River Formation, and by undifferentiated Miocene and Oligocene sediments (Fernald, 1981). The soils have a low level of natural fertility (Duffee et al., 1979; 1984; Sanders, 1981; Sullivan et al., 1975; Weeks et al., 1980). Approximately 83% of the soils used in this study were classified in subgroups of Ultisols (Table 1).

Fig. 1. Location of counties from which pedons were selected.

Data Source

Data from 151 pedons (Calhoun et al., 1974; Carlisle et al., 1978; 1981; 1985; Institute of Food and Agricultural Sciences, Soil Characterization Lab., unpublished data) were used for the study. In total, 20 soil properties [horizon thickness; very coarse, coarse, medium, fine, and very fine sand fractions; total sand, silt, and clay contents; pH in water (pH1) and in KCl (pH2); organic carbon (OC) content; extractable Ca, Mg, Na, and K; total bases; extractable acidity; CEC; and base saturation] were selected. The criterion for selection was that these soil properties had been measured for each horizon of the 151 pedons. The number of horizons per pedon varied between 4 and 7. There were a total of 19 820 observations.

Pedon location, description, and sampling were performed by soil scientists from the USDA-SCS and the Soil Characterization Lab., Soil Science Dep., Univ. of Florida. Physical and chemical analyses of the soils were made by personnel of the University's Soil Characterization Lab. Procedures used for the chemical and physical analyses are outlined by Calhoun et al. (1974) and by Carlisle et al. (1978; 1981; 1985).

Data Sets

Two sets of data were used for the PCA. One set included the weighted-average of selected soil properties in the individual pedons. Horizon thickness was the weighting criterion. The second set of data was composed of the selected soil properties from the first A (A, Ap, or A1) horizon.

Only data from the Albany (loamy, siliceous, thermic Grossarenic Paleudults), Dothan (fine-loamy, siliceous, thermic Plinthic Paleudults), and Orangeburg (fine-loamy, siliceous, thermic Typic Paleudults) soil series were used in the analysis of variance. Eight pedons were selected for each series.

Statistical Analyses

Statistical Analysis System software (SAS Institute, Inc., 1982a,b) was used. The PCA was employed for the selection of soil properties important for explaining the soil variability. Because the soil properties studied had different measurement units, there was a risk of having heterogeneous variances. An important assumption in this analysis involves the homogeneity of variances (Afifi and Clark, 1984). Therefore, soil property values were standardized with mean equal to 0 and variance equal to 1. As a result, all PCs were derived from the correlation matrix instead of from the covariance matrix. Eigenvalues (variances) and eigenvectors (coefficients) of PCs were obtained using the PRINCOMP procedure.

The number of PCs was determined by using a rule of thumb (Afifi and Clark, 1984) that the PCs selected are those that explain at least 100/P percent of the total variance, where P is the number of variables. Therefore, each selected PC had an eigenvalue that represented at least 5% of the total variance. Eigenvectors for each PC were selected on the basis of having a value larger than the value calculated using the relationship

$SC = 0.5/(\text{PC eigenvalue})^{1/2}$ [2]

where SC = selection criterion.

The PLOT procedure was used to graphically display the eigenvectors. A varimax rotation by the FACTOR procedure (orthogonal rotation of axes) was performed because some of the eigenvectors did not show a clear contribution to a particular PC. With a rotation of axes some of the eigenvectors may show a clearer contribution to one of the PCs.

Each PC is a linear combination of standard variables having the eigenvectors as coefficients. Therefore, collinearity between variables can be a problem. The use of highly correlated variables produces estimates with high standard errors (SAS Institute Inc., 1982b). These estimates are very sensitive to slight changes in the data. The REG procedure (least-squares estimation of linear regression models) with the COLLIN option was used for the analysis of collinearity.
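The standardization step and the three requirements placed on the PC coefficients can be illustrated with a minimal sketch. It uses NumPy rather than the SAS PRINCOMP procedure used in the study, and the data are randomly generated stand-ins, not the pedon data; the function name `pca_from_correlation` is hypothetical.

```python
import numpy as np

def pca_from_correlation(data):
    """Return eigenvalues, eigenvectors (as columns), and PC scores."""
    # Standardize every variable to mean 0 and variance 1.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    corr = np.corrcoef(data, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]      # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = z @ eigenvectors                  # one column per PC
    return eigenvalues, eigenvectors, scores

# Assumption: synthetic data, 151 rows by 4 made-up variables,
# two of which are deliberately correlated.
rng = np.random.default_rng(0)
data = rng.normal(size=(151, 4))
data[:, 1] = 0.8 * data[:, 0] + 0.2 * data[:, 1]

eigenvalues, eigenvectors, scores = pca_from_correlation(data)

# Requirement 1: PC variances (the eigenvalues) are in decreasing order.
score_variances = scores.var(axis=0, ddof=1)
# Requirement 2: scores of different PCs are uncorrelated.
score_cov = np.cov(scores, rowvar=False)
# Requirement 3: squared coefficients of each PC sum to one.
coefficient_norms = (eigenvectors ** 2).sum(axis=0)
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why a PC's share of the total variance is its eigenvalue divided by P.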

[Fig. 2, panels a and b: plot area not recoverable. Axis label: Principal Component 2 (Eigenvectors). Labeled points include CEC, TB, EXT, Ca, BS, and pH2. See Table 3 for abbreviations.]
Fig. 2. Location of standardized weighted-average values of soil properties of pedons a) in the plane of the first two principal components and
b) in the plane of the first two PCs following orthogonal axis rotation. Legend for the abbreviations is in Table 3.
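An orthogonal (varimax) rotation of the kind shown in Fig. 2b can be sketched as follows. This is a commonly used SVD-based iteration written with NumPy, not the SAS FACTOR procedure used in the study; the loading matrix is a made-up example, and the function name `varimax` and its parameters are assumptions for illustration.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        column_ss = (rotated ** 2).sum(axis=0)
        gradient = loadings.T @ (rotated ** 3 - rotated * column_ss / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt              # orthogonal factor of the gradient
        current = s.sum()
        if previous != 0.0 and current < previous * (1 + tol):
            break
        previous = current
    return loadings @ rotation, rotation

# Assumption: hypothetical loadings with a simple structure (each variable
# loads on one axis), deliberately blurred by a 30 degree rotation.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
```

The rotation leaves each variable's sum of squared loadings unchanged; it only redistributes the loadings so that each variable sits closer to a single axis, which is the effect described for the coarse and medium sand fractions and extractable Mg.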

Also, variables with a tolerance lower than 0.01 were not considered in the analysis (Afifi and Clark, 1984). Tolerance is defined as

$T = 1 - R$ [3]

where T = tolerance, and R = coefficient of multiple correlation.

Finally, the correlation coefficient between the PCs and the soil properties was computed using the equation

$r_{ij} = a_{ij}(\text{VAR PC})^{1/2}$ [4]

where $r_{ij}$ = correlation coefficient, $a_{ij}$ = eigenvector, and VAR PC = PC eigenvalue.

A nested analysis of variance (Freund and Littell, 1981; Montgomery, 1976) was also utilized to create a clearer understanding of the variation among series, pedons within series, and horizons within pedons in the Albany, Dothan, and Orangeburg soil series.

Table 2. Proportion of total variance explained by each principal component for standardized weighted-average pedon and A horizon data.

Principal      Eigenvalue             Proportion, %          Cumulative proportion, %
component      Pedon     A horizon    Pedon     A horizon    Pedon     A horizon
1              5.9119    5.5308       29.56     27.65        29.56     27.65
2              3.0450    3.8348       15.23     19.17        44.79     46.82
3              2.5310    2.6524       12.65     13.26        57.44     60.08
4              1.9153    1.6807        9.57      8.40        67.01     68.48
5              1.2385    1.0856        6.19      5.43        73.20     73.91
6              0.8040    0.9913        4.02      4.96        77.22     78.87
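The 100/P rule, Eq. [2], and Eq. [4] can be checked by hand against the printed pedon values. The sketch below uses the pedon eigenvalues from Table 2 and four pedon eigenvectors from Table 3. The assignment of each eigenvector to a PC number is an assumption inferred from Table 4, because the column positions of Table 3 are not stated explicitly here.

```python
import math

# Pedon eigenvalues of PCs 1 to 6 as printed in Table 2.
pedon_eigenvalues = [5.9119, 3.0450, 2.5310, 1.9153, 1.2385, 0.8040]
P = 20                       # number of soil properties
threshold = 100 / P          # rule of thumb: at least 5% of total variance

# With standardized variables the total variance equals P.
proportions = [100 * ev / P for ev in pedon_eigenvalues]
selected = [i + 1 for i, pct in enumerate(proportions) if pct >= threshold]
cumulative = sum(proportions[i - 1] for i in selected)

# Eq. [2]: selection criterion for the eigenvectors of each selected PC.
sc = [0.5 / math.sqrt(pedon_eigenvalues[i - 1]) for i in selected]

# Eq. [4]: r = eigenvector * sqrt(eigenvalue).
# Assumption: (PC number, pedon eigenvector) pairs; the PC numbers are
# inferred from Table 4, the eigenvectors are the Table 3 pedon values.
eigenvectors = {
    "TS": (1, -0.3437),
    "Clay": (1, 0.3109),
    "F": (3, 0.4836),
    "OC": (4, 0.5700),
}
r = {
    name: a * math.sqrt(pedon_eigenvalues[pc - 1])
    for name, (pc, a) in eigenvectors.items()
}

for name, value in r.items():
    print(f"{name}: r = {value:.2f}")
```

Under these assumptions the first five PCs pass the 5% threshold, the SC values agree with the pedon SC row of Table 3, and the four correlation coefficients round to the values reported in the Results for fine sand, total sand, clay, and OC.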
RESULTS AND DISCUSSION

One measure of the amount of information conveyed by each PC is its variance. For this reason, the PCs are commonly arranged in order of decreasing variance. The most informative PC is the first, whereas the last is the least informative. The first five PCs were selected for further analysis (Table 2). Each of them explained more than 5% of the total variance and in aggregate they explained 73% of the total variance.

Plots effectively display the relationships between soil properties and PCs. The associations between the first two PCs and the 20 soil properties are shown in Fig. 2. These plots help to identify the important soil properties [large absolute values located close to the axis of the PC, e.g., CEC and fine sand (F) with principal component 1 (PC1); and base saturation and very fine sand (VF) with principal component 2 (PC2)] to be evaluated further in other statistical analyses.

Some properties did not have a clear contribution to an individual PC, such as coarse (C) and medium (M) sand fractions and extractable Mg content (Fig. 2a). A varimax rotation was performed to show that those soil properties with initially no clear contribution to an individual PC were closer to the axis of PC1 (Fig. 2b).

Eigenvectors were also calculated for each PC (Table 3). Selected eigenvectors for PC1 had an absolute value larger than the SC value. Soil properties selected as important constituents of PC1, based on the SC value for the individual pedons, included medium and total sand contents; clay content; extractable Ca, Mg, Na, and K contents; total bases; extractable acidity; and CEC. Eigenvectors for PC2, principal component 3 (PC3), principal component 4 (PC4), and principal component 5 (PC5) and soil properties selected by this analysis for pedons and for all PCs for the A-horizon data are listed in Table 3.

Table 3. Eigenvectors of the correlation matrix for standardized weighted soil properties of pedons and A horizons.

Soil property†   Principal component‡ (1 to 5; values in column order)
HT               -0.4341
VC               -0.3695
                 -0.3346
C                -0.4635
                 -0.2945  -0.3872
M                 0.2394  -0.3233
                 -0.2206
F                 0.4836
                  0.4110
VF                0.4017
                 -0.2672  -0.5020
TS               -0.3437
                 -0.3654
Silt              0.4111
                 -0.3093
Clay              0.3109
                  0.3280
pH1               0.3867
                  0.3058   0.3290
pH2               0.4156
                  0.3251
OC                0.5700
                  0.2990
Ca                0.2654   0.3372
                  0.2397   0.2909
Mg                0.2552
                  0.2157   0.2739
Na                0.2590
                  0.6454
K                 0.2577
                  0.2722   0.4142
TB                0.3007   0.3353
                  0.4085
Ext               0.3131
                  0.3312
CEC               0.3755
                  0.3238
BS                0.4214
                  0.2971
SC§               0.2056   0.2865   0.3143   0.3613   0.4493
                  0.2126   0.2553   0.3070   0.3857   0.4605

† HT = horizon thickness; VC = very coarse sand; C = coarse sand; M = medium sand; F = fine sand; VF = very fine sand; TS = total sand; pH1 = pH in water; pH2 = pH in KCl; OC = organic carbon; Ca, Mg, Na, and K = extractable Ca, Mg, Na, and K, respectively; TB = total bases; Ext = extractable acidity; CEC = cation exchange capacity; BS = base saturation; SC = selection criterion. HT is expressed in cm; VC, C, M, F, VF, TS, Silt, Clay, OC, and BS are expressed as percent; Ca, Mg, Na, K, TB, Ext, and CEC are expressed as cmol/kg.
‡ Upper number refers to pedons. Lower number refers to A horizons.
§ $SC = 0.5/(\text{principal component eigenvalue})^{1/2}$. All values shown have an absolute value larger than the corresponding SC value.

Collinearity among variables may be a problem for those soil properties previously selected by the eigenvectors. Soil properties selected using eigenvectors but with a tolerance (T) < 0.01 were considered to be highly intercorrelated and were also excluded (Tables 4 and 5). PC5 was not included in the analysis of collinearity for the individual pedons because all eigenvectors had an absolute value smaller than SC, but PC5 was included for the A horizon analyses.
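The tolerance screen can be sketched with NumPy as below. This is not the SAS REG procedure with the COLLIN option; it obtains the squared multiple correlation of each variable on all the others from the inverse of the correlation matrix, a standard identity. The data are synthetic. Eq. [3] is applied as printed (T = 1 - R), and the widely used convention T = 1 - R^2 is computed alongside it for comparison; which of the two the original software reported is not established here.

```python
import numpy as np

def squared_multiple_correlation(data):
    """R^2 of each column regressed on all the other columns."""
    corr = np.corrcoef(data, rowvar=False)
    # Identity: R_j^2 = 1 - 1 / (inverse correlation matrix)_jj.
    inverse_diagonal = np.diag(np.linalg.inv(corr))
    return 1.0 - 1.0 / inverse_diagonal

# Assumption: made-up variables; x3 is nearly the sum of x1 and x2,
# while x4 is unrelated to the others.
rng = np.random.default_rng(1)
x1, x2, x4 = rng.normal(size=(3, 200))
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
data = np.column_stack([x1, x2, x3, x4])

r_squared = np.clip(squared_multiple_correlation(data), 0.0, 1.0)
multiple_r = np.sqrt(r_squared)

tolerance_eq3 = 1.0 - multiple_r           # Eq. [3] as printed
tolerance_conventional = 1.0 - r_squared   # common convention, 1 - R^2

for name, t in zip(["x1", "x2", "x3", "x4"], tolerance_eq3):
    print(f"{name}: T = {t:.3f}")
```

In this made-up example the three interdependent variables receive tolerances near zero and the unrelated one stays near one, which is the pattern the screen relies on to drop highly intercorrelated properties.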

According to the T criterion, extractable Ca, Mg, Na, and K contents; total bases; extractable acidity; and CEC were all highly intercorrelated for PC1 pedons and therefore excluded from further analyses. Similar reduction of variables was applied to other PCs (Table 4). Only A horizon total sand was highly intercorrelated for all PCs (Table 5) and, therefore, eliminated from further analysis.

A final reduction was made by calculating correlation coefficients between soil properties and PCs. A large correlation coefficient (|r| > 0.75) was selected arbitrarily as the criterion to reduce even further the number of soil properties. Based on the correlation coefficients, fine sand (r = 0.77), total sand (-0.84), clay (0.76), and OC (0.79) contents were selected. Other soil properties also had large coefficients, but had been previously eliminated because of small eigenvectors or low Ts. Only OC (0.76) and clay (0.77) contents were selected as important soil properties for explaining the variability of A horizons.

Analysis of Variance

Several soil properties were statistically selected as important to explain soil variability in the area studied. But we were also interested in determining if the properties selected are actually differentiating properties. Theoretically, a large variation may occur among soil series and among horizons within pedons belonging to the same soil series. For this reason a nested analysis of variance was employed using data from the Albany, Dothan, and Orangeburg series where soil series, pedons within soil series, and horizons within each pedon were considered as sources of variation (Table 6).

Table 4. Tolerance (T) of standardized weighted soil properties by principal component (pedons).

Principal component
1                   2                   3                   4
Soil                Soil                Soil                Soil
property†  T‡       property   T        property   T        property   T
M          0.69     pH1        0.61     VC         0.45     HT         0.90
TS         0.11     pH2        0.54     C          0.22     Silt       0.73
Clay       0.14     Ca         0.08     M          0.36     OC         0.73
                    TB         0.08     F          0.78
                    BS         0.58

† See abbreviations Table 3.
‡ T = 1 - R (R = coefficient of multiple correlation).

Table 5. Tolerance (T) of standardized weighted soil properties by principal component (A horizons).

Principal component
1                   2                   3                   4                   5
Soil                Soil                Soil                Soil                Soil
property†  T‡       property   T        property   T        property   T        property   T
M          0.69     C          0.61     VC         0.27     K          0.64     VF         0.99
Silt       0.09     pH1        0.38     C          0.24     TB         0.64     Na         0.99
Clay       0.03     pH2        0.36     F          0.73
OC         0.23     Ca         0.23     pH1        0.96
Ca         0.28     Mg         0.39
Mg         0.35     Ext        0.36
K          0.56
CEC        0.14
BS         0.32

† See abbreviations Table 3.
‡ T = 1 - R (R = coefficient of multiple correlation).

A large proportion of the properties had a large variation among pedons belonging to the same soil series (pedon source of variation) and/or a large unexplained variability (error source of variation). Total sand and clay contents had large variability because of differences among soil series (e.g., Grossarenic and Typic subgroups) and among horizons (e.g., E and Bt horizons) within pedons belonging to the same soil series. Organic carbon content variability was explained by the differences among horizons (e.g., A and E horizons). Other soil properties had a large variation between soil series and/or between horizons (e.g., extractable Na and Mg contents) but, according to the PCA, they were not important properties for explaining the total variability of the study area.

Fine sand was selected by the PCA, but according to the nested analysis of variance, a large part of its variability was explained by the differences among pedons within soil series. Therefore, fine sand was not considered an important differentiating property.

CONCLUSIONS

Total sand, fine sand, clay, and OC contents were selected by the PCA as properties to explain total variance in selected northwest Florida soils. The nested analysis of variance demonstrated that, for both pedon and A-horizon data, most of the soil properties selected by the PCA (except fine sand) were important as differentiating properties among soil series and/or horizons. In addition, the nested analysis of variance results also validated the use of a large correlation coefficient between PCs and soil properties. Large coefficients allowed the selection of those properties with large variability among soil series and among horizons within the same soil series.

The statistically-selected soil properties are important to determine specific soil potentials, for example fertility and irrigation. Thus, the variability of the selected soil properties affects the accuracy of the predictions for these specific performances.

Table 6. Nested analysis of variance of studied soil properties within and among the Albany, Dothan, and Orangeburg series and among horizons within the same series.

                 Source of variation, %
Soil property†   Soil series   Pedon‡   Horizon   Error
HT                4.1           0.0      3.5      92.4
VC                0.6          64.2      3.1      32.1
C                 2.3          79.5      0.2      18.0
M                 0.0          83.7      8.3       8.0
F                 8.7          57.3     18.2      15.8
VF                0.0          91.1      3.1       5.8
TS               51.8           0.0     34.0      14.2
Silt              7.0          34.6     21.5      36.9
Clay             36.7           0.0     43.1      20.2
pH1               7.2          36.9      4.5      51.4
pH2               0.0          26.0      9.2      64.8
OC                1.4           0.0     95.0       3.6
Ca               12.7           0.0     77.5       9.7
Mg               26.7           0.0     61.5      11.8
Na                7.0           0.0     88.8       4.2
K                 0.0          30.2     25.6      44.2
TB                5.7          23.9     27.9      42.4
Ext               0.0           0.0     81.0      19.0
CEC               0.0          37.3     48.5      14.2
BS                9.8          22.4     23.4      44.3

† See abbreviations Table 3.
‡ Pedon within soil series.
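The contrast drawn between fine sand and the other PCA-selected properties can be read directly from Table 6. The short sketch below sums the soil series and horizon shares for the four selected properties, using the percentages as printed; the variable names are illustrative and no threshold is implied.

```python
# Percent of variance by source for the four PCA-selected properties,
# copied from Table 6: (soil series, pedon within series, horizon, error).
table6 = {
    "TS": (51.8, 0.0, 34.0, 14.2),
    "Clay": (36.7, 0.0, 43.1, 20.2),
    "OC": (1.4, 0.0, 95.0, 3.6),
    "F": (8.7, 57.3, 18.2, 15.8),
}

row_totals = {name: sum(values) for name, values in table6.items()}

# Share attributed to differences among series and among horizons.
series_plus_horizon = {
    name: series + horizon
    for name, (series, pedon, horizon, error) in table6.items()
}

for name, share in series_plus_horizon.items():
    pedon_share = table6[name][1]
    print(f"{name}: series + horizon = {share:.1f}%, pedon = {pedon_share:.1f}%")
```

For total sand, clay, and OC most of the variance falls in the series and horizon sources, whereas for fine sand the pedon-within-series source is the largest single share, which is the basis for not treating fine sand as a differentiating property.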

This study showed that, because of the multivariate
character of soils, the selection of variables must be
based on some quantitative method. Otherwise, a
biased selection of variables can introduce a large
source of error in the results. Conversely, the use of
the complete set of data would add more complexity
to the analysis.
One use of the information obtained in this inves-
tigation would be in a geostatistical study. Semivari-
ances could be determined and weighted-average
kriged data and standard errors (reliability) of kriged
data could be mapped.
