
Published March 20, 2015

Research

Analysis of Series of Cultivar Trials with Perennial Grasses for Subdivided Target Regions
Thomas Eckl and Hans-Peter Piepho*

T. Eckl, Bavarian State Research Center for Agriculture, Freising, Germany. H-P. Piepho, Biostatistics Unit, Institute of Crop Science, Univ. of Hohenheim, Germany. Received 28 Apr. 2014. *Corresponding author (piepho@uni-hohenheim.de).

Abbreviations: AIC, Akaike Information Criterion; BLUE, best linear unbiased estimation; BLUP, best linear unbiased prediction; CS, compound symmetry.

Abstract
Field trials with perennial grasses may often be conducted at several locations with different starting years. A key issue in the analysis of such trials is the distinction between effects of calendar years, which are associated with external environmental variation, and harvest years, which represent internal yield formation processes of the perennial crop. Furthermore, analysis of field trials with perennial grasses needs to account for serial correlation of observations on the same plot from consecutive harvest years. Such analyses are conveniently implemented using mixed models. Here, we consider series of trials when the target region is subdivided into several zones. We show how cultivar yield means per zone can be estimated borrowing strength across zones. The proposed mixed models are illustrated using simulated data generated by employing variance component estimates from real experiments. It is shown in simulations that best linear unbiased prediction (BLUP) can provide more precise zone-specific mean estimates than alternative methods.

The Bavarian State Research Center for Agriculture (Freising, Germany) routinely conducts cultivar trials to provide crop production advice and support to farmers. Field trials with perennial crops such as ryegrass (Lolium perenne L.) involve repeated measurements taken on the same plot on several occasions. Such yield trials are frequently conducted at multiple locations to evaluate newly released ryegrass cultivars with annual yields recorded per plot for three consecutive years. In Germany, each federal state such as Bavaria used to present results of its conducted cultivar trials separately. Thus, yield estimates were limited to trials within administrative borders. To increase the precision and efficiency of these trials, agroecological zones have recently been defined across Germany, which differ in their habitat features and their relevance for cultivar-specific reactions.
Trial locations are chosen to be representative of target zones
and thus are assigned to exactly one zone. The associated analysis
method allows combining yield trials of neighboring zones (Atlin
et al., 2000; Michel et al., 2007; Piepho and Möhring, 2005) to
provide yield estimates with high precision on a zonal basis. Up
to now, these types of multienvironment trials have only been
analyzed across zones for annual crops. The implementation and

Published in Crop Sci. 55:597–609 (2015).


doi: 10.2135/cropsci2014.04.0327
© Crop Science Society of America | 5585 Guilford Rd., Madison, WI 53711 USA
All rights reserved. No part of this periodical may be reproduced or transmitted in any
form or by any means, electronic or mechanical, including photocopying, recording,
or any information storage and retrieval system, without permission in writing from
the publisher. Permission for printing and for reprinting the material contained herein
has been obtained by the publisher.

crop science, vol. 55, march– april 2015  www.crops.org 597


adaptation of this method has not yet been performed for perennial forage crops. The statistical demands for perennial crops are more intricate than for annual crops due to the additional complexities arising from the repeated measurements and the partial temporal overlap of trials with different starting years.
A crucial point in the statistical analysis of trials with perennial crops is that environmental influences have a more permanent effect on perennial crops than they have on annual crops. Crop yields from consecutive harvest years are not independent from the conditions of the previous years. One further characteristic of perennial crops is that a trial comprises multiple calendar years as well as multiple harvest years but calendar year and harvest year are not identical factors. Effects of calendar years represent the interaction of weather conditions and agricultural management whereas effects of harvest years are based on factors like aging of the sward and gaps in the plots.
The main goal of the analysis is to obtain the best estimate of differences among cultivars for each given zone and as a result gain well-founded information to support crop production advice without having to increase trial costs. Particular emphasis needs to be given to modeling the serial correlation among consecutive measurements on the same experimental unit as well as to separating the effect of calendar year from harvest year in case of staggered starts.
This paper presents a general method to analyze trials adapted for the division of a target region into zones (Hartmann, 2010; Kleinknecht et al., 2013; Michel et al., 2007), considering the repeated-measures structure arising in trials with perennial crops such as grasses. For more complex data, we describe a three-stage analysis in which cultivar means per trial and harvest year computed in the first stage are analyzed across environments in two further stages. Our approach to separating harvest year from calendar year effects is similar in spirit to that used by Casler (1999) and Conaghan et al. (2008). To exemplify the models and methods, we created simulated data using variance component estimates from a dataset on series of regional yield trials with ryegrass.
Using these simulated data, the developed mixed models for two-stage and three-stage analysis are compared according to the resulting adjusted means and their pairwise correlation between analyses. At the end of our model development, we show how to modify the models, by modeling within-zone cultivar effects as random, to increase the precision of the estimated means by combining information of the target zone and the neighboring zones. Supplemental Appendix S1 contains the SAS (Littell et al., 1998) program code for implementation of our proposed models.

MATERIALS AND METHODS
The derivation of the simulated dataset is described in the following section. Thereafter, mixed models for unbiased estimation of cultivar means across zones are developed. We show that the complexity of a single-stage model can be reduced by a stagewise approach to the analysis. For this purpose, we present appropriate two-stage and three-stage models, at first for a simpler case with only one trial, followed by the more complex case with several trials. At the end of our model development, we show how to modify the models to increase the precision of the estimated means by combining information of the target zone and the neighboring zones.

The Data
Empirical Data
We used data from ryegrass cultivar trials conducted at 16 locations divided into six zones in southern Germany.
In total, 48 cultivars were tested between 2001 and 2011, with first harvest years in 2001 to 2009. All trials were laid out as randomized complete block designs. Depending on location and growing conditions, the number of cuts ranged from three to seven per year. The yields of all cuts for a given trial and harvest year were summed by plot and used as the response variable.
The resulting dataset was heavily unbalanced (i.e., each trial series contained cultivars that were tested in only a few trial years). The same also applied to trial locations. Furthermore, some cultivars were targeted for specific zones and hence were grown in only a subset of the trials. For a single-stage analysis including every starting year, location, and cultivar as shown in Model [24] on page 6 of Supplemental Appendix S1, we ran into computational problems regarding computation time, memory space, and convergence.

Simulated Data
Our main goal was to develop methods of analysis for which complex data do not cause any of these problems. Single-stage analysis can be considered as a gold standard against which to compare stagewise methods (Piepho et al., 2012). The newly developed methods should produce results similar to a full single-stage analysis. Because of the extreme imbalances and the resulting computational difficulties with the real data, we decided to use simulated data for evaluation of alternative approaches. Thus, we used variance component estimates from the real data to create a simulated dataset that was smaller and more balanced than the real dataset at hand. This gave us the possibility to obtain results for a single-stage analysis, to compare the developed two-stage and three-stage methods with single-stage analysis, and to try different covariance structures.
To determine parameter values for the data simulation, we had to reduce the original dataset. A core set of four cultivars which were tested in most of the trials was included, yielding a dataset of cultivar trials conducted within 10 yr at seven locations split into three zones, with four cultivars laid out as a randomized complete block design with three replicates. Using the reduced data, the covariance parameter estimates needed for the simulation were obtained with the single-stage mixed model shown in Model [24] in Supplemental Appendix S1.



With the covariance parameter estimates obtained from the reduced, original dataset, we simulated a new, more balanced dataset for further analysis. This simulated dataset was generated applying the method described below. An Excel file with the simulated data is provided as Supplemental Appendix S2. The dataset simulates trials conducted within 5 yr in four zones, each containing four locations with 10 cultivars laid out as a randomized complete block design with three replicates. We simulated a response for each plot. Observations are correlated due to the random effects of the assumed linear mixed models. The general form of the model is given by

$y = Xb + Zu + e$,

where $X$ is the design matrix for the fixed effects, $Z$ is the design matrix for the random effects, $b$ is the vector of fixed effects (associated with factors GEN, HAR, and ZON), $u$ is the vector of random effects, and $e$ is a residual error term. Random effects ($u$, associated with factors YR, LOC, TRL, and BLK) and errors ($e$) are assumed to be independent with zero mean vectors and variance–covariance matrices $G$ and $R$, respectively. Hence, the data $y$ follow a multivariate normal distribution with mean $Xb$ and variance–covariance matrix $V = ZGZ^{T} + R$.
To create correlated data $y$ for the assumed design following the single-stage mixed model shown in the Supplemental Appendix S1 in Model [24], we made use of the Cholesky decomposition of the variance–covariance matrix of the data ($V$). For this procedure, we first estimated the variance components from the reduced dataset described above. These estimates were then used as parameter values (PARMS statement in SAS) in a single-stage mixed model to calculate a variance–covariance matrix $V$ of the data for the assumed design of the simulated trials. The computed matrix $V$ was then factored uniquely into a product $V = CC^{T}$ with the Cholesky decomposition of $V$, where $C$, an upper triangular matrix, is called the Cholesky root and $C^{T}$ is the transpose of $C$.
A random vector $z^{T} = \{z_1, \ldots, z_n\}$, $z \sim N(0, I)$, of independent standard normal deviates was generated to calculate the correlated vector of normal deviates $x = Cz$ so that $x \sim N(0, V)$ with $\mathrm{Var}(Cz) = CIC^{T}$. Consequently, the simulated data $y = x + 1\mu$, where $\mu$ is the intercept, are normally distributed with the mean $E(y) = 1\mu$, where $1$ is a vector of ones.

Definition of Variables for Analysis
The simulated dataset contains the variables listed below in Table 1, which will be used to formulate mixed models in the subsequent sections. We classified the variables as treatment, block, and repeated factors (Piepho and Eckl, 2014). Treatment factors are selected by the experimenter to study and compare their effect on given response variables. Their levels are randomly assigned to experimental units. Block factors, on the other hand, are used to identify experimental units and arrange them into homogeneous groups or blocks (Milliken and Johnson, 1992, Section 4.2). The block factors can be used to set up a model that holds when there are no treatment effects. This model may also be referred to as block model (Piepho et al., 2003) or design structure (Milliken et al., 2010). Repeated measurements taken on the same experimental unit are characterized by a repeated factor such as harvest year (Piepho et al., 2004).

Table 1. Structure of factors zone (ZON) and location (LOC).

ID     Explanation
Response variable
  YLD  Cumulative yield per plot and year (sum of all cuts in a given harvest year)
Treatment factor
  GEN  Cultivar
Blocking factors
  ZON  Zone; see Fig. 1
  LOC  Location; nested within ZON
  TRL  Trial; this identifies an experiment running up to three consecutive years nested within LOC
  YR   Calendar year (i.e., year a given trial was established)
  BLK  Block of randomized complete block design; nested within TRL
  PLT  Plot, nested within BLK
Repeated factor
  HAR  Harvest year (i.e., current year a trial is harvested)

For clarity, a brief description of the factors is given here. A trial (TRL) with a perennial crop corresponds to a single experiment at a single location, which, in our case, is conducted during three harvest years. Thus, a trial with a perennial crop is introduced as a multi–calendar year (YR) entity consisting of consecutive harvest years (HAR) and is nested within locations (LOC). A harvest year typically comprises three cuts within the current calendar year. The number of cuts sometimes goes up to seven per year. The yields of all cuts in a year are summed per plot and used as response variable (YLD) in our models. At a location, several trials may be run simultaneously in the same calendar year. These trials usually are staggered in the sense that they have different start years. Trial locations are chosen to be representative of target zones, assigned to exactly one zone and thus nested within zones (ZON).

Model Derivation for Unbiased Estimation of Cultivar Means per Zone
In this section we develop models that allow estimation of cultivar means per zone, taking the cultivar effects as fixed. These models essentially exploit only the information per zone for estimating cultivar yields per zone. The method of estimation is best linear unbiased estimation (BLUE). In the next section, we consider a modification of this approach, based on random cultivar effects, that allows borrowing strength from other zones. The method of estimation used for that approach will be BLUP.
Here, we first consider the case of a single trial conducted at several locations and then extend it to the case of several trials with different start years. In either of these cases, we first present a two-stage approach and then a three-stage approach.

Several Locations, One Trial
For the development of a suitable model, we first consider the simple case, where trials are started in the same year and thus also have the same first and subsequent harvest years. For now, we consider only one trial per location. A separation of the effects of calendar year and harvest year is not yet necessary in this case. When the trials are completed, in our case after three calendar years, a new set of trials with new cultivars could begin.
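As an illustration of the Cholesky-based simulation described under Simulated Data, the following is a minimal sketch in Python rather than the SAS code used by the authors. The 3 × 3 compound-symmetry matrix and the intercept are hypothetical toy values, not the variance component estimates from the trials, and the function names are assumptions made for this example. The sketch uses the lower-triangular factor L with V = L Lᵀ in the role of the Cholesky root C, so that y = 1μ + Lz has variance–covariance matrix V.

```python
import math
import random


def cholesky_lower(v):
    """Return lower-triangular L with L L^T = v (v symmetric positive definite)."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low


def simulate(v, mu, rng):
    """Draw one vector y = 1*mu + L z with z ~ N(0, I), so that Var(y) = v."""
    low = cholesky_lower(v)
    z = [rng.gauss(0.0, 1.0) for _ in v]
    return [mu + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(v))]


# Hypothetical toy case: three harvest years on one plot with a
# compound-symmetry structure (plot variance 2.0, residual variance 1.0).
sigma_plot, sigma_err = 2.0, 1.0
V = [[sigma_plot + (sigma_err if i == j else 0.0) for j in range(3)] for i in range(3)]
L = cholesky_lower(V)
y = simulate(V, mu=100.0, rng=random.Random(1))
```

For a real design, V would be assembled from the design matrices and the estimated covariance parameters, as the text describes; the factorization and the draw stay the same.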



Figure 1. Division of the target region, Deutschland-Mitte-Süd, into zones: 6, sommertrockene Lagen (dry summer zone); 7, günstige
Übergangslagen (favorable transition zone); 8, Hügelländer Süd (hilly country–South); 9, Mittelgebirgslagen West (uplands–West); 10,
Mittelgebirgslagen Ost (uplands–East); and 11, Voralpengebiet (prealpine region).

A straightforward analysis may be performed when all trials have the same start year and the same number of harvest years. The total yield across years may then be fitted by a mixed model analyzing the sum of all yields per plot, avoiding any modeling of serial correlation of repeated measures. However, when the number of harvest years is not constant in all trials and some trials have staggered starts, as will be considered in more detail later, more complex modeling is needed. The simulation assumes trials conducted in four zones each comprising four trial locations. All trials were simulated as a randomized complete block design with three complete blocks (BLK). We assumed the first harvest year to be 2008, and there were three consecutive harvest years (Table 2). A total of 10 cultivars (GEN) were simulated and the simulated yield per plot was used as response variable.

Table 2. Structure of factors zone (ZON), location (LOC) nested within zone as indicated by consecutive numbering, calendar year (YR), and harvest year (HAR) when trials with start year 2008 are considered.

            Calendar year (YR)
ZON  LOC    2008     2009     2010
1    1      HAR = 1  HAR = 2  HAR = 3
1    2      HAR = 1  HAR = 2  HAR = 3
4    15     HAR = 1  HAR = 2  HAR = 3
4    16     HAR = 1  HAR = 2  HAR = 3

To formulate a mixed model, we used the framework described by Piepho et al. (2003, 2004). We find this model syntax helpful because mixed models are described with a notation that is close to that used with linear mixed model statistical software packages. Moreover, the structure of random effects with a repeated-measures correlation structure can be specified more clearly.
As an example, a linear two-way factorial mixed model with yield coded as YLD, cultivar coded by GEN, fertilizer coded by FER, and random block coded by BLK is written as

YLD = GEN + FER + GEN·FER: BLK + BLK·GEN·FER,

where the fixed effects are stated before the colon and random effects after the colon. The effect BLK·GEN·FER is underscored to indicate that it is the residual. The dot operator is used to define crossed effects. It corresponds to the asterisk (*) operator in SAS.
Thus, our syntax is close to the statements needed for this model in linear model packages such as the MIXED procedure of the SAS system:



    Model YLD = GEN FER GEN*FER;
    Random BLK;

Note that the residual BLK·GEN·FER will be fitted automatically.
From theoretical considerations, single-stage analysis is regarded as preferable and is considered the gold standard. Nevertheless, because of both its simplicity and computational efficiency (Piepho and Michel, 2000) and because single-stage and stagewise analysis often yield very similar results (Möhring and Piepho, 2009; Piepho and Eckl, 2014), we also consider models for two-stage and three-stage analysis.
In Piepho and Eckl (2014), we compared several stagewise approaches to account for the repeated-measures structure, including modeling serial correlation at the plot level, which appears to be the most obvious method. However, plot level modeling (Method b in Piepho and Eckl, 2014) turned out to be computationally much more demanding than modeling the serial correlation at the level of cultivar means per harvest year. As both methods were very highly correlated with each other and were both highly correlated with single-stage analysis results, we prefer to model serial correlation at the cultivar-mean level for computational efficiency (Method a). For this reason, we here only present results based on that method.

Two-Stage Analysis
A two-stage analysis is the standard procedure in the analysis of official cultivar trials in Germany since it has the practical advantage that trials can be analyzed individually in the first stage. This makes it easier to consider any specific features of both the experimental design and the preferred model for within-trial analysis. The two-stage method to analyze several locations and several trial years where serial correlation is modeled in stage two as presented in Piepho and Eckl (2014, Electronic Appendix S1, Fig. 14) can easily be expanded to account for the division of a target region into zones.
Model for First Stage Analysis. In the first stage, we compute cultivar means for each harvest year in separate analyses for each trial. A model for first-stage analysis laid out in a randomized complete block design with fixed effects for cultivar and block can be found in Supplemental Appendix S1 on page 3 (also see Piepho and Eckl, 2014). This is fitted separately for every year. We then use a mixed model, to be described below, to analyze and estimate the serial correlation between cultivar means in different harvest years in the same trial in stage two.
Model for Second Stage Analysis. To develop a model for the second stage, we start with the model that would be used to analyze a single level of the repeated factor (Piepho et al., 2003). The mixed model can be formulated as follows, using the framework described in Piepho et al. (2003, 2004):

GEN + ZON + ZON·GEN:
ZON·LOC + ZON·LOC·GEN, [1]

where fixed effects (GEN + ZON + ZON·GEN) precede random effects (ZON·LOC + ZON·LOC·GEN) and are separated from these by a colon. The dot (·) is an operator for concatenating factors or effects (corresponding to the asterisk * in SAS).
To complete the model, correlation structures for random effects associated with repeated factors need to be accommodated. In repeated-measures models, subjects correspond to statistically independent units on which repeated measures are taken. In our case, the random effects ZON·LOC and ZON·LOC·GEN of the block model for a single level of the repeated factor now become subjects. These secondary subjects correspond to means across several plots, the plots being the primary subjects of the design. The important point here is that taking averages per cultivar across plots generates a new type of subject at a higher level of aggregation, which inherits any serial correlation on the primary subjects. Different observations on the same subject taken in different harvest years represent the same treatment whereas levels of treatments can change only between subjects. Errors from different secondary subjects (means across plots) (ZON·LOC·GEN) are independent and need to be distinguished from the part of the effects identifying the different observations from the same experimental unit. These effects pertaining to the repeated measurements (HAR) may be serially correlated (Piepho and Eckl, 2014).
To clarify the structure of random effects with a repeated-measures correlation structure, we expand this model by concatenating the interaction ZON·LOC·GEN as well as the interaction ZON·LOC with the repeated factor HAR using the dot operator (·) and fully crossing the remaining (fixed) effects with HAR using the crossing operator (×). Each random effect of the block model and the interaction is modeled as a serially correlated effect. Subjects are boldfaced, the repeated factor HAR in these effects is boldfaced and italicized (Piepho and Eckl, 2014).
In our model, there is a single observation (mean) per cultivar–location combination for every harvest year. Consequently, the random effect ZON·LOC·GEN represents the lowest-level “subject” on which repeated measures on HAR are available and the effect will thus be written as ZON·LOC·GEN·HAR. The same reasoning applies to the effect ZON·LOC·HAR, which absorbs the serially correlated block effects (Piepho and Eckl, 2014). We fit the following model in the second stage:

GEN × HAR × ZON:
ZON·LOC·HAR + ZON·LOC·GEN·HAR [2]

Resolving the factorial structure, this expands as

GEN + HAR + ZON + GEN·HAR + GEN·ZON + ZON·HAR + GEN·ZON·HAR:
ZON·LOC·HAR + ZON·LOC·GEN·HAR [3]

Three-Stage Analysis. The two-stage approach suggested above can be extended to three stages in a straightforward way (Piepho et al., 2012). It should be stressed, however, that serial correlation will need to be taken into account throughout all three stages and thus models used in stage two and three must be as close as possible to the Model [19] used for single-stage analysis described in the Supplemental Appendix S1 on page 3; otherwise, important variances and covariances will not be taken into account.
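The step from Model [2] to Model [3], resolving the crossing operator (×) into main effects and interactions, can be checked mechanically. The following is a small sketch in Python, not part of the authors' SAS code; the helper names `cross` and `as_sets` are assumptions made for this illustration. Terms are compared as sets of factors because the order of factors within a term (e.g., ZON·HAR versus HAR·ZON) does not matter.

```python
from itertools import combinations


def cross(*factors):
    """Expand A × B × ... into all main effects and interactions."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append("·".join(combo))
    return terms


def as_sets(terms):
    """Represent each term by its set of factors so that factor order is ignored."""
    return {frozenset(term.split("·")) for term in terms}


# Fixed part of Model [2]: GEN × HAR × ZON
fixed = cross("GEN", "HAR", "ZON")

# Fixed part of Model [3] as printed in the text
model3_fixed = ["GEN", "HAR", "ZON", "GEN·HAR", "GEN·ZON", "ZON·HAR", "GEN·ZON·HAR"]

same = as_sets(fixed) == as_sets(model3_fixed)
```

Here `same` evaluates to `True`: crossing three factors yields three main effects, three two-way interactions, and one three-way interaction, which are the seven fixed terms of Model [3]. The random terms of Model [2] are formed with the dot operator and are therefore carried over unchanged.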



To set up a three-stage analysis, the hierarchy of the design needs to be split into three stages. The first stage is the same as before, such that we first compute the cultivar means per trial and harvest year. Since we are mainly interested in estimating the cultivar means, we will be averaging across environment effects in the second stage. Care should be taken that the environmental covariance and the genetic covariance do not get intermingled and remain separable in the third stage of the analysis. Therefore, in the second stage, we estimate GEN·HAR-means within zones, which fulfills the conditions stated above. For each zone (ZON), we fit the following model in the second stage:

GEN·HAR:
LOC·HAR + LOC·GEN·HAR [4]

The computed GEN·HAR-means can then be evaluated across zones in stage three with the following model.

GEN + HAR + ZON + GEN·HAR + GEN·ZON + ZON·HAR:
ZON·GEN·HAR [5]

Several Locations, Several Trials
Yield trials for perennial crops such as ryegrass conducted by the Bavarian Research Institute have staggered starts. Specifically, new trials are laid out each year to accommodate new cultivars entering the system and thereby being able to give timely advice to producers. To compare cultivars that entered trials in different years, a joint analysis is needed. Therefore, we now consider the case of several trials at the same location that do not necessarily have the same first harvest year. For every one of the four zones of the previous simulated dataset, we simulated two additional trials for starting years 2009 and 2010. For every new trial, a core set of three cultivars was kept in each trial, whereas the remaining seven cultivars were replaced by seven new cultivars. The core set comprises Cultivars 1, 2, and 3. The seven cultivars tested in 2008 are numbered 804 to 810. Cultivars for 2009 and 2010 are numbered accordingly (Table 3). This approach makes it possible to combine the information on the core set of cultivars across the three trials and also to compare cultivars not tested in the same trial. This staggered arrangement of trials has the advantage that the performance of the tested cultivars can be assessed within a short period of time. However, besides the modeling of the serial correlation among consecutive measurements, effects for calendar year and harvest year now need to be separated, which requires a more refined approach.

Table 3. Classification of factors trial (TRL), calendar year (YR), and harvest year (HAR) when trials with several start years are considered.

            Entries                Calendar year (YR)
Trial no.   Common  Unique      2008     2009     2010     2011     2012
1           1–3     804–810     HAR = 1  HAR = 2  HAR = 3  -        -
2           1–3     904–910     -        HAR = 1  HAR = 2  HAR = 3  -
3           1–3     1004–1010   -        -        HAR = 1  HAR = 2  HAR = 3

In the following, we present models for two-stage and three-stage analysis. A suitable model for single-stage analysis is presented in the Supplemental Appendix S1 on page 6.
Two-Stage Analysis. Again, to develop the model for the second-stage analysis, we initially focus on a single level of the repeated factor. Several trials can now simultaneously be run at a location. Moreover, these trials usually have different start years and so the same harvest year can occur in several calendar years. To distinguish block effects from different trials, we identify the individual trials using the factor TRL. This leads to having both a main effect for YR and a main effect for TRL, which are fundamentally different (Piepho and Eckl, 2014). First, we look at the model across zones within a single harvest year. We fit the following model in the second stage:

GEN + ZON + GEN·ZON:
YR + GEN·YR + ZON·YR + ZON·GEN·YR + ZON·LOC + ZON·LOC·YR + ZON·LOC·GEN + ZON·LOC·TRL + ZON·LOC·GEN·YR + ZON·LOC·TRL·GEN [6]

In the previous section, we expanded each random effect of the block model and the interaction as a serially correlated effect for the repeated factor HAR. The factor YR, although part of the block model, does not correspond to a physical unit on which repeated measures are taken. Therefore, any effects involving year will not be expanded as serially correlated.
Expanding the model by concatenating the random block effects as well as the interactions with HAR and fully crossing the remaining effects with HAR leads to:

(GEN + ZON + GEN·ZON:
YR + GEN·YR + ZON·YR + ZON·GEN·YR + ZON·LOC·YR + ZON·LOC·GEN·YR) × HAR + ZON·LOC·HAR + ZON·LOC·GEN·HAR + ZON·LOC·TRL·HAR + ZON·LOC·TRL·GEN·HAR [7]

When several harvest years are analyzed, the effects ZON·LOC·YR·HAR and ZON·LOC·TRL·HAR and also the effects ZON·LOC·GEN·YR·HAR and ZON·LOC·GEN·TRL·HAR become fully confounded (Piepho and Eckl, 2014). This means that the effects ZON·LOC·YR·HAR and ZON·LOC·TRL·HAR produce identical columns in the design matrix. Thus the two effects are indistinguishable and cannot be separated. To resolve the problem, we drop one of each of these pairs of confounded terms from the model. It will not make any difference which one is dropped.
Resolving the factorial structure and dropping the confounded effects, the model expands as:



GEN + ZON + HAR + GEN·HAR + GEN·ZON + ZON·HAR + GEN·ZON·HAR:
YR + GEN·YR + YR·HAR + GEN·YR·HAR + ZON·YR + ZON·GEN·YR + ZON·YR·HAR + ZON·GEN·YR·HAR + ZON·LOC·YR + ZON·LOC·GEN·YR + ZON·LOC·HAR + ZON·LOC·GEN·HAR + ZON·LOC·TRL·HAR + ZON·LOC·TRL·GEN·HAR [8]

Three-Stage Analysis. Similar to the three-stage approach for only one trial, the model for several trials can be derived from the two-stage method shown above. As before, we first compute the cultivar means per trial and harvest year in the first stage of the analysis and average across environment effects in the second stage. Now greater caution needs to be taken not to mix up the environmental covariance and the genetic covariance between zones in the third stage of the analysis.
As mentioned before, we cannot simply estimate cultivar means within zones because doing this would make the cultivar–year effects inseparable from genetic covariance between zones in the next step. We therefore estimate GEN·YR·HAR-means within zones in the second stage, which keeps calendar year (YR) as well as harvest year (HAR) separate from each other and thus avoids any commingling in the third stage. For each zone (ZON), we fit the following model in the second stage:

GEN + HAR + GEN·HAR:
YR + YR·HAR + GEN·YR + GEN·YR·HAR + LOC·YR + LOC·GEN·YR + LOC·YR·HAR + LOC·GEN·YR·HAR + LOC·HAR + LOC·GEN·HAR + LOC·TRL·HAR + LOC·TRL·GEN·HAR [9]

Now that we estimate GEN·YR·HAR-means, the effect GEN·YR·HAR needs to be taken fixed and the random effects YR, YR·GEN, and YR·HAR can be dropped because they are contained in the fixed three-way effect. Furthermore, dropping the confounded effects LOC·YR·HAR and LOC·GEN·YR·HAR leads to the model:

GEN·YR·HAR:
LOC·YR + LOC·GEN·YR + LOC·HAR

Resolving the factorial structure and taking the calendar year (YR) as a random factor, this expands as:

GEN + HAR + GEN·HAR + ZON + GEN·ZON + HAR·ZON + GEN·HAR·ZON:
YR + YR·HAR + YR·GEN + YR·GEN·HAR + YR·ZON + YR·ZON·HAR + YR·GEN·ZON + YR·GEN·ZON·HAR [12]

Model Derivation for Prediction of Cultivar Means per Zone
Our main goal of the analysis is to be able to compute the best estimate for the cultivar yield within every zone. Up to now, we have taken the cultivar factor (GEN) as fixed. Applying this model with fixed cultivar effects will lead to a BLUE of cultivar means in a zone using only data from that specific zone. These estimates, however, do not use the information of the neighboring zones. If there is a high correlation in the ranking of the cultivars between the zones, several methods exist to increase the precision of the estimated zone-specific means by borrowing strength across zones. To optimize the estimation for the cultivar yield within one zone, the information of the target zone and the neighboring zones need to be combined. An easy way to do this is to compute an unweighted mean for every cultivar across the zones. It is an unweighted mean in the sense that each zone contributes the same amount of information to the cultivar mean across zones. Such an unweighted mean can be calculated with a model that handles GEN as a fixed factor. Another approach is to combine the observations of the target zone and the neighboring zones by a weighted estimator. The problem to solve is to find the optimal weights for the model. It is easily noticed that some zones, especially neighboring zones, are often agroecologically more similar to the targeted zone than other zones. Moreover, the means computed per zone may differ in precision between zones (e.g., depending on the number of trials per zone). This suggests that the derived two-stage and three-stage models should be changed in a way such that the genetic correlation between zones, as well as the precision of means per zone, is used to calculate weights for a weighted mean across zones. It is shown in Piepho and Möhring (2005) that this can be accomplished by using cultivar BLUPs (Searle et al., 1992) based on a suitable mixed model. The modification will lead to an optimal weighted mean which minimizes the error of prediction for that zone and requires that cultivar is modeled as a random factor in stage three. In the preceding stages, however, all cultivar effects not involving other random factors must still
+ LOC·GEN·HAR + LOC·TRL·HAR + be modeled as fixed (Piepho et al., 2012).
LOC·TRL·GEN·HAR [10] The assumption of random cultivars may appear difficult to
justify for cultivars that are the subject of selection. While in a
The computed GEN·YR·HAR-means can then be evaluated Bayesian framework, essentially all parameters are random vari-
ables, in a frequentist setting, the notion of randomness involves
across zones in stage three with the following model.
the assumption of random sampling from some parent popula-
tion. In the context of cultivar testing, we may consider the
GEN × HAR ×YR × ZON [11]
cultivars under test as a random sample from the potential set of
cultivars that could have been obtained by repeatedly running
the same kind of breeding programs that generated the entries
under consideration (Piepho and Möhring, 2005). We would
also like to point out that the assumption of random cultivars is

crop science, vol. 55, march– april 2015  www.crops.org 603


standard in the routine analysis of cultivar trials in some countries, including Australia (Smith et al., 2005). The assumption of random genotypic effects is also becoming very common in the analysis of plant breeding trials (Bernardo, 2010), where the continuing selection raises the same questions with respect to the definition of random sampling. For a further discussion and motivation of the use of mixed models for cultivar evaluation under selection, see Piepho and Möhring (2006).

Several Locations, One Trial
As before, we first consider trials with the same first harvest year (Table 2). We will start with the derivation of the model for the two-stage analysis. The analysis uses the genotypic correlations of the investigated zones, which are a measure of the similarity of the zones, to calculate the optimal weights.

Two-Stage Analysis. We will refer back to the two-stage Model [3] (see Model derivation for unbiased estimation, Several locations, one trial, Two-stage analysis) for several locations and one trial where cultivar BLUEs are calculated. Now, to estimate cultivar BLUPs, all effects in Model [3] involving the factor GEN are regarded as random. We fit the following model in the second stage:

HAR + ZON + ZON·LOC + ZON·HAR +
ZON·LOC·HAR:
GEN + GEN·HAR + GEN·ZON +
GEN·ZON·HAR + ZON·LOC·GEN +
ZON·LOC·GEN·HAR [13]

With the effects GEN, GEN·ZON, GEN·HAR, and GEN·ZON·HAR taken as random, this model implies a compound symmetry (CS) structure for the genetic correlation between zones. That way, a constant genetic correlation between zones is assumed for composite effects (GEN + GEN·ZON) and (GEN·HAR + GEN·ZON·HAR). Using the same syntax as with repeated measures, we could express this correlation structure taking ZON as the repeated factor and GEN and GEN·HAR as subjects (i.e., we would represent these effects by GEN·ZON and GEN·HAR·ZON). One could use more complex structures like UN or UN(1) to model the genetic correlation between zones, allowing for heterogeneity between zones. In our experience, however, more complex covariance structures often do not converge or give similar results as data get sparse (Kleinknecht et al., 2013), so we here stick with the simpler CS model and retain the simple random effects specification in [13] for simplicity and computational efficiency.

Furthermore, all fixed effects except the effect with the highest-order interaction can be dropped since they will be absorbed by the latter effect. This leads to the following model.

ZON·LOC·HAR:
GEN + GEN·HAR + GEN·ZON +
GEN·ZON·HAR + ZON·LOC·GEN +
ZON·LOC·GEN·HAR [14]

Three-Stage Analysis. Since estimates in the second stage are produced by zone, the adaptation for estimating weighted means need only be performed for the third stage of the analysis. Effects HAR and ZON can be dropped and only the fixed effect with the highest-order interaction, ZON·HAR, remains. We fit the following model in the third stage:

ZON·HAR:
GEN + GEN·ZON + GEN·HAR
+ GEN·ZON·HAR [15]

Several Locations, Several Trials
Again we consider trials which do not necessarily have the same first harvest year (Table 3). The derivation of the model will be similar to the case with only one trial, such that all effects involving the factor GEN are regarded as random and all fixed effects except the effect with the highest-order interaction can be dropped because the higher-order fixed effect absorbs all contained lower-order effects.

Two-Stage Analysis. To derive the model for estimating cultivar BLUPs, we will change the stage-two Model [8] for several locations and several trials where cultivar BLUEs are calculated and take all effects in the model involving the factor GEN as random. We fit the following model in the second stage:

ZON·LOC·TRL·HAR:
GEN + GEN·HAR + GEN·ZON +
GEN·ZON·HAR + GEN·YR + GEN·YR·HAR
+ GEN·YR·ZON + GEN·YR·ZON·HAR +
GEN·YR·ZON·LOC + ZON·LOC·GEN +
ZON·LOC·GEN·HAR + ZON·LOC·GEN·TRL +
ZON·LOC·GEN·TRL·HAR [16]

Again, we keep the simpler CS variance–covariance structure and model all cultivar-zone effects with simple random effects and drop all fixed effects except the effect with the highest-order interaction ZON·LOC·TRL·HAR.

Three-Stage Analysis. The adaptation for estimating weighted means need only be performed for the third stage of the analysis. For the second stage, Model [10] (see Model derivation for unbiased estimation, Several locations, several trials, Three-stage analysis) can be used. While taking GEN and its interactions as random, all effects not involving GEN are regarded as fixed. Dropping all fixed effects except the effect with the highest-order interaction, ZON·HAR·YR, will result in the model shown below.

ZON·HAR·YR:
GEN + GEN·HAR + GEN·YR + GEN·HAR·YR +
GEN·HAR·ZON + GEN·YR·ZON + GEN·ZON +
ZON·HAR·GEN·YR [17]

Again, for the reasons stated above, complex structures for correlation between zones are not used for this model. The CS structure is modeled with simple random effects.
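
The statement that simple random effects for GEN and GEN·ZON imply a compound-symmetry covariance between zones can be checked numerically. The following is a minimal Python sketch, not the SAS implementation of the paper; the function names, the variance components (30 and 10), and the number of zones (4) are hypothetical values chosen only for illustration.

```python
def cs_from_random_effects(var_gen, var_gen_zon, n_zones):
    """Covariance between zones of u_j = g + gz_j, where g is the GEN effect
    (shared by all zones) and gz_j is the GEN·ZON effect (independent per zone)."""
    return [[var_gen + (var_gen_zon if i == j else 0.0) for j in range(n_zones)]
            for i in range(n_zones)]


def cs_matrix(total_var, rho, n_zones):
    """Compound-symmetry matrix: one common variance, one common correlation."""
    return [[total_var * (1.0 if i == j else rho) for j in range(n_zones)]
            for i in range(n_zones)]


# Hypothetical variance components, chosen only for illustration.
var_gen, var_gen_zon, n_zones = 30.0, 10.0, 4

v_random = cs_from_random_effects(var_gen, var_gen_zon, n_zones)

# Implied CS parameters: total variance and constant genetic correlation.
total_var = var_gen + var_gen_zon
rho = var_gen / total_var
v_cs = cs_matrix(total_var, rho, n_zones)

same = all(abs(v_random[i][j] - v_cs[i][j]) < 1e-12
           for i in range(n_zones) for j in range(n_zones))
print("genetic correlation between zones:", rho)
print("random-effects form equals CS form:", same)
```

Under these assumptions the genetic correlation between any two zones is var_gen / (var_gen + var_gen_zon), which is why a single pair of variance components suffices and no heterogeneous structure such as UN has to be estimated.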



RESULTS
In accord with the structure in the Materials and Methods section, we present results for the developed mixed models for unbiased estimation of cultivar means per zone and for prediction of cultivar means per zone. We make comparisons among the stagewise approaches as well as between the two models for unbiased estimation and prediction of cultivar means according to the resulting adjusted means and their pairwise correlation between analyses.

Unbiased Estimation of Cultivar Means per Zone
Several Locations, One Trial
Two-Stage Analysis. There is a wide range of covariance structures that can be chosen to model serial correlations. Thus, it is important to explore different variance–covariance structures when using mixed models. Penalized fit statistics, which include a term for penalizing overfitting such as the Akaike Information Criterion (AIC), can indicate how well each model fits the data compared with other models. The model with the smallest AIC value is considered to have the best tradeoff between the goodness of fit and the complexity of the model and is therefore preferred. For our data, we chose to compare three different structures. The TOEP(1) structure (banded Toeplitz matrix structure with off-diagonal bands set to zero) is used for specifying the same variance component for each harvest year and no serial correlation between harvest years. The AR(1) model (first-order autoregressive structure) is popular in longitudinal data analysis with equally spaced time intervals in cases where serial correlation decreases with distance in time (Potthoff and Roy, 1964). With only three harvest years, the pattern of covariances or correlations for different time lags may be nearly constant (Piepho and Eckl, 2014). The CS structure assumes that the serial correlations are constant for all time lags. In SAS, the structure of the covariance matrix can be specified using the “TYPE = covariance-structure” option. The CS model, however, can also be implemented by fitting simple random effects (see Supplemental Appendix S1). Model fits for several locations and a single trial (first harvest year 2010) are shown in Table 4.

Table 4. Model fit statistics for simulated data with several locations and a single trial with homogeneous variances between locations.
Model†     Akaike Information Criterion
CS         2884.7
AR(1)      2891.3
TOEP(1)    2959.7
† AR(1), first-order autoregressive structure; CS, compound symmetry structure; TOEP(1), banded Toeplitz matrix structure with off-diagonal bands set to zero.

In contrast to the TOEP(1) structure, the AR(1) and CS structures model a serial correlation between harvest years. Thus, the high AIC values in Table 4 for the TOEP(1) structure indicate serial correlation between harvest years. Since the AIC values suggest that the homogeneous CS model fits best, we prefer this model for computing cultivar means. This model has the additional advantage that it can be fit by simple random effects. As an example, we rewrite Eq. [3] with simple random effects, which will result in the following expression.

GEN + HAR + ZON + GEN·HAR +
GEN·ZON + ZON·HAR + GEN·ZON·HAR:
ZON·LOC + ZON·LOC·HAR +
ZON·LOC·GEN + ZON·LOC·GEN·HAR [18]

For the analysis we used Model [18], which has a homogeneous compound symmetry structure between harvest years, implemented by fitting simple random effects. Variance parameter estimates and their standard errors are shown in Table 5.

Table 5. Second-stage variance parameter estimates and their standard errors for the two-stage analysis of simulated data for a single trial and several locations.
Effect         Estimate   SE
ZON·LOC        166.37     93.18
ZON·LOC·HAR    157.18     46.79
ZON·LOC·GEN    46.13      8.64
Residual       48.93      4.71

This two-stage analysis, however, requires long computing times and high memory usage as datasets get large. We therefore introduce a three-stage analysis, which entails a second level of approximation (Piepho et al., 2012) but makes it possible to analyze rather bulky datasets in a straightforward way and still deal with unbalanced data.

Three-Stage Analysis. For analysis, the simulated data were assumed to have a homogeneous compound symmetry structure, implemented by fitting simple random effects. The variance parameter estimates and their standard errors for the analysis with Models [4] and [5] are shown in Table 6.

Several Locations, Several Trials
Two-Stage Analysis. When we tried this model in a two-stage analysis using all the zones, locations, and trials of the dataset gained from the German cultivar trials, our computer, a desktop PC with an Intel Core2 Duo E8400 processor and 4GB of RAM, operating on Windows XP, ran out of memory. To avoid such problems, we use the three-stage Model [17].

For the simulated data, the variance parameter estimates and their standard errors for the two-stage analysis are shown in Table 7. The model assumes a homogeneous compound-symmetry structure, implemented by fitting simple random effects. For comparison, the variance



components used for the simulated dataset are included in the table.

Table 6. Variance parameter estimates and their standard errors for the three-stage analysis of simulated data for a single trial and several locations.
Second-stage results
Effect      Zone 1    SE        Zone 2    SE
LOC         56.37     134.85    283.60    256.25
LOC·HAR     263.12    154.71    75.96     46.10
LOC·GEN     68.90     23.34     30.61     12.10
Residual    48.29     9.29      38.74     7.46
Effect      Zone 3    SE        Zone 4    SE
LOC         156.63    146.78    168.87    213.42
LOC·HAR     49.27     31.75     240.38    141.78
LOC·GEN     43.45     17.37     41.55     16.36
Residual    56.81     10.93     51.89     9.99
Third-stage results
Effect      Estimate  SE
Residual    44.70     8.60

Table 7. Variance parameter estimates and their standard errors for the two-stage analysis of simulated data with three trials and several locations.
                   Simulated   Single-stage analysis   Two-stage analysis
Effect             data        Estimate    SE          Estimate    SE
YR                 219         305.86      318.84      304.39      319.01
GEN·YR             18          52.46       38.77       52.19       38.77
HAR·YR             3           17.68       33.13       17.59       33.11
GEN·HAR·YR         21          27.06       20.85       26.94       20.85
YR·ZON             429         303.05      152.07      301.58      152.07
GEN·YR·ZON         13†         0           .           0           .
HAR·YR·ZON         17          7.21        12.81       7.17        12.81
GEN·HAR·YR·ZON     12          17.55       6.53        17.51       6.53
YR·ZON·LOC         168         193.96      44.73       192.99      44.73
GEN·YR·ZON·LOC     16          18.01       3.94        18.21       3.94
ZON·LOC            227         104.72      68.89       104.22      68.89
HAR·ZON·LOC        6           6.87        7.02        6.80        7.02
GEN·ZON·LOC        18          12.08       6.46        11.79       6.46
GEN·HAR·ZON·LOC    12          16.77       3.78        16.87       3.78
TRL·ZON·LOC        65          48.30       18.62       50.69       18.62
TRL·HAR·ZON·LOC    41          18.38       8.75        18.40       8.75
GEN·TRL·ZON·LOC    30          32.39       5.90        32.50       5.90
Residual           24          28.31       4.99        13.29       3.42
† Variance component estimated to be zero. Hence, no standard error could be computed.

Three-Stage Analysis. For analysis of the simulated data, we assumed a homogeneous compound symmetry structure between harvest years, implemented by fitting simple random effects. Table 8 shows variance parameter estimates and their standard errors for the three-stage analysis of the simulated data as well as variance components used for the simulated dataset.

Table 8. Variance parameter estimates and their standard errors for the three-stage analysis of simulated data with three trials and several locations.
Second-stage results
               Simulated
Effect         data        Zone 1    SE        Zone 2    SE
LOC·YR         168         404.09    174.00    146.10    78.85
GEN·YR·LOC     16          30.77     8.47      14.01     7.27
LOC            227         2.56      91.13     243.32    240.78
LOC·HAR        6           10.51     11.37     8.02      20.59
LOC·GEN        18          29.79     13.10     8.02      8.98
LOC·GEN·HAR    12          1.48      5.24      11.01     6.30
TRL·LOC        65          35.23     28.03     37.71     40.40
TRL·LOC·HAR    41          9.75      9.70      32.88     31.59
GEN·TRL·LOC    30          19.61     9.50      34.89     10.19
Residual       24          13.30     8.19      15.55     6.56
Effect                     Zone 3    SE        Zone 4    SE
LOC·YR         168         69.54     39.94     159.50    74.05
GEN·YR·LOC     16          15.34     6.44      29.94     13.06
LOC            227         140.22    151.39    15.24     65.76
LOC·HAR        6           5.89      14.19     2.25      7.88
LOC·GEN        18          0         .         20.51     13.94
LOC·GEN·HAR    12          22.45     6.27      18.51     12.62
TRL·LOC        65          67.36     51.46     74.63     49.01
TRL·LOC·HAR    41          24.82     23.30     7.63      6.93
GEN·TRL·LOC    30          50.25     10.45     17.99     9.99
Residual       24          14.24     6.75      7.17      4.14
Third-stage results
Effect                     Estimate  SE
YR             219         314.08    320.96
GEN·YR         18          53.04     38.30
HAR·YR         3           15.42     29.54
GEN·HAR·YR     21          24.39     20.11
YR·ZON         429         352.65    150.51
GEN·YR·ZON     13†         0         .
HAR·YR·ZON     17          9.16      11.04
Residual                   29.08     6.86
† Variance component estimated to be zero. Hence, no standard error could be computed.

The AIC values of model fits for different covariance structures assuming homogeneity of variance among locations are shown in Table 9. For this dataset, we found the model fit to be similar for the AR(1) model and the CS model, indicating small heterogeneity in correlations for different time lags. It is valuable to model serial correlation between harvest years, as can be seen from the high AIC values for the TOEP(1) structure. This is expected, as we simulated the data to be correlated according to a CS structure.

Comparison of Single-Stage, Two-Stage, and Three-Stage Analysis
It was mentioned before that two-stage analysis requires long computing times and high memory usage when datasets get large. Besides the important advantage that trials can be analyzed individually, two-stage analysis compared



with a single-stage analysis can save some memory space and computing time as well. For the simulated data with three trials and several locations, the running time with PROC MIXED was 5 h and 40 min for the single-stage analysis and 4 h and 36 min for the two-stage analysis using a desktop PC with an Intel Core2 Duo E8400 processor and 4GB of RAM, operating on Windows XP. Since plot data are modeled in the first stage, a bigger dataset could have been used for the two-stage analysis, whereas the single-stage analysis was at its limits concerning memory space. Our main goal was to develop methods of analysis for which computation time, memory space, and convergence will not be problematic for complex data. The computing time for the three-stage analysis with PROC MIXED was 15 s. The division of the analysis into three parts makes the individual models smaller and less complex. Single-stage and two-stage mean estimates (BLUE) are very highly correlated. The correlation of adjusted GEN·HAR means was also high between single-stage and three-stage analysis and also between two-stage and three-stage analysis (Table 10).

Table 9. Second-stage Model [10] fit statistics for each zone (ZON) for three-stage analysis using simulated data with several trials and several locations applying different covariance structures. Bold indicates the lowest Akaike Information Criterion (AIC) value of the applied covariance structures in each zone.
           AIC
Model†     Zone 1    Zone 2    Zone 3    Zone 4
CS         2145.2    2126.4    2168.1    2154.5
AR(1)      2139.9    2128.8    2171.8    2156.1
TOEP(1)    2203.2    2190.2    2227.9    2193.4
† AR(1), first-order autoregressive structure; CS, compound symmetry structure; TOEP(1), banded Toeplitz matrix structure with off-diagonal bands set to zero.

Table 10. Correlations for GEN·HAR mean estimates for simulated data with three trials using single-, two- and three-stage analysis assuming homogeneity of variance between locations.
                Method
Method          Two-stage   Three-stage
Single-stage    1.0000      0.9997
Two-stage                   0.9997

Prediction of Cultivar Means per Zone
All analyses presented in this section are based on a three-stage approach.

Several Locations, One Trial
We wanted to compare the performance of BLUE and BLUP as estimators of the cultivar effect per zone and see which methodology for our derived three-stage models will compute the better estimate for the cultivar–zone yields (Table 11). To do so, we added random cultivar effects for the simulation and thus obtained the true cultivar values for every zone. The advantage of knowing the true cultivar values in the simulation is that not only were we able to compare the values of BLUE and BLUP but we could also correlate these estimators with the true values. For the simulated data with one trial and several locations, we used Models [4]/[5] to obtain GEN·ZON BLUEs. The BLUPs of GEN·ZON were estimated using Models [4]/[15] by a three-stage approach.

Table 11. Correlation of true values, best linear unbiased predictions (BLUPs), and best linear unbiased estimations (BLUEs) of GEN·ZON means for simulated data with one trial and several locations. Results were obtained by a three-stage approach.
        BLUP      Simulated
BLUE    0.6209    0.5453 b†
BLUP              0.7462 a
† Mean correlations (based on 100 simulation runs) in this column followed by a common letter are not significantly different according to a paired t test at the 5% level using a linear model with fixed effects for simulation run and estimation method.

The results show that BLUPs are more highly correlated with the true (simulated) values than BLUEs.

Several Locations, Several Trials
The comparison of the performance of BLUE and BLUP estimators was performed analyzing the simulated data with the added random cultivar effects using a three-stage approach. For the data with several locations and several trials, we used Models [10]/[12] to obtain GEN·ZON BLUEs. Estimating GEN·ZON BLUPs with Models [10]/[17] was not that straightforward this time. Due to the small number of cultivars in the dataset and the replacement of 7 out of 10 old cultivars by newer ones in every new trial, the necessary estimates for cultivar variances were lacking in precision. Imprecise variance estimates result in poor estimates for cultivar BLUPs. An analysis using BLUPs to estimate cultivar performance is therefore recommended only when having a substantial number of cultivars in the dataset. To compensate for this drawback of our data and to explore the potential gain in precision by BLUP when variance components are estimated with good precision, we ran an additional analysis in which the genotypic variances in Model [17] were fixed to the value of the true genotypic variances we simulated the data with. To confirm the difference of estimates computed with BLUE and BLUP, we created 100 simulated datasets and calculated GEN·ZON BLUPs and BLUEs for each dataset accordingly. The correlations between the true values and the three estimators, BLUE, BLUP with fixed genetic variances, and BLUP, were then compared. To do so, we set up a linear model (PROC GLM) with effects for replicate and method and used Tukey’s test for multiple comparisons between the three estimators.
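
The logic of this comparison (simulate true cultivar-by-zone values, compute BLUE and BLUP, then correlate each with the truth) can be sketched in a few lines of Python. This is a strongly simplified illustration and not the three-stage SAS analysis of the paper: it assumes a single set of zone means per cultivar, known variance components (loosely analogous to the BLUP with fixed genetic variances), fixed effects equal to zero, a compound-symmetry genetic covariance between zones, and independent errors with equal variance. All function names and numerical values are hypothetical.

```python
import math
import random


def blup_cs(y, var_gen, var_gen_zon, var_err):
    """BLUP of the zone-specific effects of one cultivar, u = G V^{-1} y.

    y holds the cultivar's observed (centered) mean in each zone.
    Assumes G = var_gen*J + var_gen_zon*I and R = var_err*I, all known.
    """
    n = len(y)
    a = var_gen_zon + var_err       # diagonal part of V = G + R
    b = var_gen                     # common covariance between zones
    shared = b * sum(y) / (a + n * b)   # part attributed to the GEN main effect
    return [var_gen_zon * (yj - shared) / a + shared for yj in y]


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_cultivars=500, n_zones=4, var_gen=30.0, var_gen_zon=10.0,
             var_err=40.0, seed=1):
    """Return (corr of truth with BLUE, corr of truth with BLUP)."""
    rng = random.Random(seed)
    true, blue, blup = [], [], []
    for _ in range(n_cultivars):
        g = rng.gauss(0.0, math.sqrt(var_gen))
        u = [g + rng.gauss(0.0, math.sqrt(var_gen_zon)) for _ in range(n_zones)]
        y = [uj + rng.gauss(0.0, math.sqrt(var_err)) for uj in u]
        true += u
        blue += y            # BLUE per zone uses only that zone's own mean
        blup += blup_cs(y, var_gen, var_gen_zon, var_err)
    return pearson(true, blue), pearson(true, blup)


corr_blue, corr_blup = simulate()
print(f"correlation with true values  BLUE: {corr_blue:.3f}  BLUP: {corr_blup:.3f}")
```

In this sketch the BLUP borrows strength across zones through the shared GEN component, so its gain over BLUE grows as the genetic correlation between zones and the error variance of the zone means increase. When variance components have to be estimated from few cultivars, as discussed above, this gain can shrink.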



Table 12. Correlation of true values, best linear unbiased predictions (BLUPs), best linear unbiased estimations (BLUEs), and BLUPs with fixed genetic variances for simulated data with several trials and several locations. Results were obtained by a three-stage approach.
              BLUP      BLUP fixed   Simulated
BLUE          0.6835    0.6889       0.5274 c†
BLUP                    0.9871       0.7115 b
BLUP fixed                           0.7378 a
† Mean correlations (based on 100 simulation runs) in this column followed by a common letter are not significantly different according to a Tukey test at the 5% level using a linear model with fixed effects for simulation run and estimation method.

We found significant differences in the correlation averages among the three methods (Table 12). The mean correlation using BLUE was considerably smaller compared with an analysis with BLUP. Although estimates for BLUP with estimated variance components and BLUP with fixed variances are highly correlated, fixing variances leads to estimates closer to the true genotypic values. This shows that BLUP can lead to more precise mean estimates than using BLUE, especially when cultivar variances can be estimated precisely.

Results also indicate that BLUP outperforms BLUE. Also, BLUP with fixed variance components is slightly more closely correlated with the true genotypic values than is BLUP when variance components are estimated from the same data.

Discussion
This research was initiated as a result of efforts to adapt methods for combining information across regional yield trials from several zones in Germany (Michel et al., 2007) to the repeated-measures structure as arising in trials with perennial crops such as grasses. A previous article (Piepho and Eckl, 2014) focused on the analysis of the repeated-measures structure. The present paper now accounts for the division of a target region into zones, extending earlier proposals for annual crops (Michel et al., 2007). Such an extension is the subject of current research (Hartmann, 2010).

This paper presented different methods to analyze trials with perennial species adapted for the division of a target region into zones. Methods for the analysis of a repeated-measures structure arising in trials with perennial crops were already discussed in Piepho and Eckl (2014), where the main challenge was to handle the complexity of the datasets. For the situation when several trials with staggered starts were performed side by side, problems with computing time and memory space arose with single-stage and two-stage models. A three-stage analysis, in which cultivar means per trial and harvest year computed in the first stage are analyzed across environments in two further stages, was derived to cope with computational problems.

To be able to compare the variance component estimates and adjusted means for cultivar–harvest year combinations of the three-stage, two-stage, and single-stage analysis without running into convergence problems, we created simulated data using variance component estimates from a dataset on series of regional yield trials with ryegrass. We demonstrated that a three-stage analysis leads to results (BLUE) similar to single-stage or two-stage analysis. Computing time and memory (RAM) required were substantially smaller for the three-stage analysis. In addition, convergence issues were not as problematic for complex data. We therefore recommend the three-stage method to analyze perennial crop trials on a zonal basis.

There is a wide range of covariance structures that can be chosen to account for serial correlation that arises in perennial crop trials. We compared different variance–covariance structures, such as the TOEP(1) structure (banded Toeplitz matrix structure with off-diagonal bands set to zero), the AR(1) structure (first-order autoregressive structure), and the CS structure, using AIC. The CS model, which we preferred for our dataset, can be fitted by simple random effects and could be used in place of more complex modeling when explicitly fitting repeated measures covariance structures. In our analyses, we found the CS model to perform well. When applying our framework in other areas, fit of alternative covariance structures should always be explored. In particular, it is appropriate to consider models that allow for heterogeneity of variance between years, including CSH and ARH(1) (both are available in SAS).

In this paper, at first we have taken the cultivar factor as fixed, which will lead to an unbiased estimation of cultivar means (BLUE). Our main analysis goal was the ability to compute the best estimate of cultivar yield within every zone. If there is a high correlation in ranking of cultivars between zones, the precision of the estimated means may increase when the information of the target zone and the neighboring zones is combined. If the number of cultivars is large, we recommend using cultivar BLUPs, which use the genetic correlation between zones, as well as the precision of means per zone to calculate weights for a weighted mean across zones. Therefore, the models with fixed effects were modified to calculate BLUPs by regarding cultivar effects as random. It is shown that BLUPs lead to a better estimate for the cultivar yield if estimates of variance components for cultivars are precise. In the case of the simulated dataset, the number of cultivars was small and estimates of variance components for cultivars turned out to be poor, adversely affecting the performance of BLUP. We fixed the genotypic variances in the applied model to the value of the true genotypic variances the data were simulated with. The results affirmed our suggestion that the method of choice for trials with a large enough number of cultivars is to use BLUP. In trials with ryegrass cultivars, the number of cultivars is notoriously small. For making use of BLUP, one might also consider using long-term variance estimates based on a larger number of cultivars, which may be more



stable and precise than estimates based on a small number of cultivars tested in a given series of trials.

Supplemental Information Available
Supplemental information is included with this article.
Supplemental Appendix S1. We here describe the implementation of our proposed models and methods using the SAS procedure MIXED.
Supplemental Appendix S2. This comprises a SAS file and Excel file.

Acknowledgments
The authors wish to thank the Arbeitskreis Mitte-Süd for conducting the trials and the working group IPZ 4b of the Bavarian Research Institute (Freising, Germany) for coordinating the trials and collating the data. This work was funded by the Bavarian State Ministry of Food, Agriculture and Forestry (A/11/03). We would like to thank three anonymous reviewers for very helpful comments.

References
Atlin, G.N., R.J. Baker, K.B. McRae, and X. Lu. 2000. Selection in subdivided target regions. Crop Sci. 40:7–13. doi:10.2135/cropsci2000.4017
Bernardo, R. 2010. Breeding for quantitative traits. 2nd ed. Stemma Press, Woodbury, MN.
Casler, M.D. 1999. Repeated measures vs. repeated plantings in perennial forage grass trials: An empirical analysis of precision and accuracy. Euphytica 105:33–42. doi:10.1023/A:1003476313826
Conaghan, P., M.D. Casler, D.A. McGilloway, P. O’Kiely, and L.J. Dowley. 2008. Genotype × environment interactions for herbage yield of perennial ryegrass sward plots in Ireland. Grass Forage Sci. 63:107–120. doi:10.1111/j.1365-2494.2007.00618.x
Hartmann, S. 2010. A system to optimize forage crop variety trials for regionalized Recommended Lists in Germany. Grassland in a changing world. Grassland Sci. Eur. 15:317–319.
Kleinknecht, K., J. Möhring, K.P. Singh, P.H. Zaidi, G.N. Atlin, and H.P. Piepho. 2013. Comparison of the performance of BLUE and BLUP for zoned Indian maize data. Crop Sci. 53:1384–1391. doi:10.2135/cropsci2013.02.0073
Littell, R.C., P.R. Henry, and C.B. Ammerman. 1998. Statistical analysis of repeated measures data using SAS procedures. J. Anim. Sci. 76:1216–1231.
Michel, V., A. Zenk, R. Graf, J. Möhring, A. Büchse, and H.P. Piepho. 2007. The Hohenheim-Gülzow method for analysis of series of trials as basic procedure for PIAF and PIAFStat and SAS in a regionalized trial system. In: H.P. Piepho and H. Bleiholder, editors, Agricultural field trials—Today and tomorrow. Proceedings of the International Symposium, Stuttgart-Hohenheim, Germany. 8–10 Oct. Verlag Grauer, Beuren, Germany. p. 136–141.
Milliken, G.A., and D.E. Johnson. 1992. Analysis of messy data. Volume 1: Designed experiments. Chapman and Hall, London.
Milliken, G., J. Willers, K. McCarter, and J. Jenkins. 2010. Designing experiments to evaluate the effectiveness of precision agricultural practices on research fields: Part 1 concepts for their formulation. Oper. Res. 10:329–348. doi:10.1007/s12351-009-0072-4
Möhring, J., and H.P. Piepho. 2009. Comparison of weighting in two-stage analyses of series of experiments. Crop Sci. 49:1977–1988. doi:10.2135/cropsci2009.02.0083
Piepho, H.P., A. Büchse, and K. Emrich. 2003. A hitchhiker’s guide to the mixed model analysis of randomized experiments. J. Agron. Crop Sci. 189:310–322. doi:10.1046/j.1439-037X.2003.00049.x
Piepho, H.P., A. Büchse, and C. Richter. 2004. A mixed modelling approach to randomized experiments with repeated measures. J. Agron. Crop Sci. 190:230–247. doi:10.1111/j.1439-037X.2004.00097.x
Piepho, H.P., and T. Eckl. 2014. Analysis of series of variety trials with perennial crops. Grass Forage Sci. 69:431–440. doi:10.1111/gfs.12054
Piepho, H.P., and V. Michel. 2000. Überlegungen zur regionalen Auswertung von Landessortenversuchen. Informatik, Biometrie und Epidemiologie in Medizin und Biologie 31:123–136.
Piepho, H.P., and J. Möhring. 2005. Best linear unbiased prediction for subdivided target regions. Crop Sci. 45:1151–1159. doi:10.2135/cropsci2004.0398
Piepho, H.P., and J. Möhring. 2006. Selection in cultivar trials—Is it ignorable? Crop Sci. 46:192–201. doi:10.2135/cropsci2005.04-0038
Piepho, H.P., J. Möhring, T. Schulz-Streeck, and J.O. Ogutu. 2012. A stage-wise approach for analysis of multi-environment trials. Biom. J. 54:844–860. doi:10.1002/bimj.201100219
Potthoff, R.F., and S.N. Roy. 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51:313–326. doi:10.1093/biomet/51.3-4.313
Searle, S.R., G. Casella, and C.E. McCulloch. 1992. Variance components. New York: John Wiley & Sons.
Smith, A.B., B.R. Cullis, and R. Thompson. 2005. The analysis of crop cultivar breeding and evaluation trials: An overview of current mixed model approaches. J. Agric. Sci. 143:449–462. doi:10.1017/S0021859605005587

