
Leaf traits differentiating between evergreen and deciduous plants

Ecological data analysis 2020


Final assignment
Analysis of “evergreen”

xxxxxxxxxxx
Student no.: xxxxxxxxx
1. Choice of statistical analysis
The aim of this analysis was to assess which leaf trait (or combination of traits) best differentiates between
evergreen and deciduous plants. The response variable is binary: the plant is either evergreen or deciduous.
Therefore, I used logistic regression to model the probability of being an evergreen plant based on several
(continuous) leaf traits.
Because several leaf traits could be important, I performed model selection to evaluate which of the measured
traits, nitrogen content (%), phosphorus content (%), specific leaf area (cm²/g) and lignin content (%), contribute
to the model for predicting whether a plant is evergreen or deciduous that is best supported by these data.

2. Statistical hypotheses
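H0: The probability of being evergreen is not associated with any of the leaf traits (or combination of traits); the
slope of each regression line, b1, equals 0.
H1: The probability of being evergreen is associated with at least one of the leaf traits (or combination of traits);
at least one slope differs from 0.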
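
Written out for a model with several leaf traits, the null hypothesis above can be stated as follows. This is a
sketch in generic notation: the symbols p (probability of being evergreen), x_j (leaf trait j), b_j (its slope) and
k (number of traits in the model) are introduced here for illustration and are not taken from the accompanying
script.

```latex
\[
\ln\!\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + \dots + b_k x_k,
\qquad
H_0:\; b_1 = \dots = b_k = 0
\]
```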

3. Analysis
Before the analysis I verified that the response variable was indeed binary; evergreen plants were coded as 1
and deciduous plants as 0. I also checked whether all possible predictor variables were numerical, which was the
case.
For a logistic regression to be reliable, strong collinearity between the predictor variables should be avoided.
However, nitrogen content (N) and phosphorus content (P) were rather strongly correlated (r = 0.74), which could
make a model including both unreliable. None of the other pairs of predictor variables showed a strong
correlation (all r < 0.5).
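
The collinearity check described above can be sketched in plain Python. This is only an illustration, not the
accompanying R script: the function names, the cut-off of 0.7 and the toy trait values are all assumptions made
for this example and are not the measured data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_collinear_pairs(traits, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is an assumed cut-off, not one stated in the report.
    """
    names = list(traits)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(traits[a], traits[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Hypothetical toy values, invented purely to exercise the functions.
toy_traits = {
    "N": [1.0, 2.0, 3.0, 4.0],
    "P": [0.10, 0.20, 0.30, 0.45],
    "lignin": [3.0, 1.0, 4.0, 2.0],
}
flagged_pairs = flag_collinear_pairs(toy_traits)
```

With the toy values, only the N and P pair is flagged, which mirrors the situation described in the text.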
I started by fitting a full model with all predictor variables; there were no convergence issues. For the model
selection procedure, I employed forward selection and compared the models using the Akaike information criterion
(AIC), which takes both the fit and the complexity of the model into account. First, I fitted the four simplest
models, each with only one of the predictors. The model with P had the lowest AIC score, followed by the model with
N. However, when both were included in a slightly more complex model, their respective coefficients changed by more
than 10%, which indicates confounding. As mentioned, N and P were strongly correlated, so I decided against using
both. I also concluded that the full model was not to be trusted.
The one-predictor model with P had the lowest AIC score, so I continued without adding N to the model. I tried
adding lignin content (lignin), which did not improve the model fit according to the AIC score. Next, I added specific
leaf area (SLA), which improved the model fit. Adding lignin to this model again did not lower the AIC score.
Lastly, I added the interaction between P and SLA, which reduced the AIC score further. I therefore took this
model (with P, SLA and their interaction as predictor variables) as my optimal model. I used ‘summary’ to look at
the estimates and p-values of the predictors and the interaction. The significance level was set at 0.05.
Visual inspection of diagnostic plots showed there were no concerning outliers.
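
The AIC-based forward selection described above can be sketched as follows. This is a generic greedy version in
Python, not the accompanying R script, and it leaves out the manual decisions made in the analysis (excluding N
because of collinearity, adding the interaction term). The function names are assumptions, and the log-likelihood
values in the lookup table are invented for illustration only; they are not results from the data.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def forward_select(candidates, aic_of):
    """Greedy forward selection: add the predictor that lowers AIC most,
    stop as soon as no addition lowers the AIC any further."""
    current = frozenset()
    remaining = set(candidates)
    while remaining:
        best = min(remaining, key=lambda c: aic_of(current | {c}))
        if aic_of(current | {best}) >= aic_of(current):
            break
        current = current | {best}
        remaining.remove(best)
    return current

# Invented log-likelihoods for illustration only (NOT fitted values).
toy_loglik = {
    frozenset(): -60.0,
    frozenset({"P"}): -45.0,
    frozenset({"SLA"}): -55.0,
    frozenset({"lignin"}): -58.0,
    frozenset({"P", "SLA"}): -41.0,
    frozenset({"P", "lignin"}): -44.8,
    frozenset({"P", "SLA", "lignin"}): -40.9,
}

def toy_aic(predictors):
    # Number of parameters = one slope per predictor plus the intercept.
    return aic(toy_loglik[frozenset(predictors)], len(predictors) + 1)

selected = forward_select(["P", "SLA", "lignin"], toy_aic)
```

In practice `aic_of` would refit the logistic model for each candidate set and return its AIC; the lookup table
only stands in for that step.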

4. R script
See additional file Script_evergreen_xxxxxxxxx.R.
5. Result section
Forward model selection pointed towards a generalized linear model including the plant traits phosphorus
content (P) and specific leaf area (SLA) and their interaction as the optimal model to differentiate between
evergreen and deciduous plants.
The logistic model revealed a significant interaction effect between phosphorus content (P) and specific leaf
area (SLA) on the probability of being an evergreen plant (b = 2.072, SE = 0.809, p = 0.010), see Table 1. Therefore,
the effect of P on the probability of being an evergreen plant depends on the level of SLA. As depicted in Figure 1,
the interaction shows that the negative effect of P is especially strong for low SLA, but becomes less pronounced for
higher SLA.

Table 1: Model output for the generalized linear model with phosphorus content (P) and specific leaf area (SLA)
predicting being evergreen.

            Estimate   Std. error   z-value   p-value
Intercept      7.188        1.792     4.010   < 0.001
P            -67.760       15.322    -4.423   < 0.001
SLA           -0.130        0.115    -1.136     0.256
P*SLA          2.072        0.809     2.561     0.010
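
The interaction in Table 1 can be read as a slope of P that changes with SLA. The sketch below uses the estimates
from Table 1 to compute that conditional slope and a predicted probability. It is written in Python for
illustration rather than taken from the R script, the function names are assumptions, and the SLA and P values
passed in are hypothetical, since the observed ranges are not reported here.

```python
import math

# Estimates taken from Table 1.
B0, B_P, B_SLA, B_INT = 7.188, -67.760, -0.130, 2.072

def p_slope(sla):
    """Slope of P on the logit scale at a given SLA: b_P + b_int * SLA."""
    return B_P + B_INT * sla

def prob_evergreen(p, sla):
    """Predicted probability of being evergreen from the fitted logistic model."""
    eta = B0 + B_P * p + B_SLA * sla + B_INT * p * sla
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical SLA values chosen only to show the direction of the interaction.
slope_low_sla = p_slope(10)    # -67.760 + 2.072 * 10 = -47.04
slope_high_sla = p_slope(20)   # -67.760 + 2.072 * 20 = -26.32
```

Both slopes are negative, but the slope at the higher SLA is closer to zero, which matches the pattern described
for Figure 1.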

Figure 1: The effect of green leaf phosphorus content on the probability of being evergreen, modulated by the
specific leaf area. The y-axis shows the probability of being an evergreen plant, where 1 is evergreen and 0
deciduous. The x-axis shows the green leaf phosphorus content in percentage. The blue dots and lines (with standard
errors) represent different levels of specific leaf area in cm²/g.
6. Conclusion
The present analysis revealed that green leaf phosphorus content and specific leaf area, and in particular their
interaction, are traits that can be used to differentiate between evergreen and deciduous plants. The negative effect
of increasing phosphorus content on the probability of being evergreen depends on the thickness of the leaf. The
association between higher phosphorus content and a higher probability of being deciduous appears to be especially
strong for thick leaves (low specific leaf area). For thinner leaves, on the other hand, the probability of being
evergreen is only moderately negatively influenced by the green leaf phosphorus content.
The combination of green leaf phosphorus content and specific leaf area can be used to predict whether a plant
is evergreen or deciduous.

7. Issues for interpreting results


The estimates of the model are hard to interpret, since they describe the change per one unit of x, while the
measured phosphorus content was always well below one. I therefore only used the directions of the effects for the
interpretation. In addition, Figure 1 makes clear that predictions for thinner leaves (high SLA) with high
phosphorus content are not reliable, since the standard errors are extremely large.
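
One possible way to make the estimates easier to read is to express the effect of P per small step instead of per
whole unit. The sketch below converts the Table 1 estimates into an odds ratio for an increase of 0.01 units of P
(0.01 percentage points) at a given SLA. The step size, the function name and the SLA value are assumptions chosen
for illustration; this is a Python sketch, not part of the R script.

```python
import math

# Estimates for P and the P*SLA interaction, taken from Table 1.
B_P, B_INT = -67.760, 2.072

def odds_ratio_per_step(sla, step=0.01):
    """Multiplicative change in the odds of being evergreen when P
    increases by `step` units, at a fixed (hypothetical) SLA."""
    return math.exp(step * (B_P + B_INT * sla))

# At a hypothetical SLA of 10, the logit changes by 0.01 * -47.04 = -0.4704.
or_at_sla_10 = odds_ratio_per_step(10)
```

An odds ratio below 1 means that the odds of being evergreen decrease with each step in P, so the direction agrees
with the interpretation given above.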
