
Introduction GWAS:

The traditional genetic linkage method used to identify QTLs/genes is very time-consuming
because it requires a large bi-parental mapping population and genotyping. More recently,
genome-wide association study (GWAS), as a powerful approach, has been widely used to
dissect a much broader genetic variability for complex traits in plants (Huang et al, 2010; Zhao et
al, 2011; Morris et al, 2013; Kang et al, 2016; Liu et al, 2017). Compared to the traditional
mapping method, GWAS generally employs more diverse natural populations and high-density
single nucleotide polymorphism (SNP) markers, which helps in identifying marker loci
closer to the candidate genes as well as in exploring favorable alleles of agronomic traits among
natural varieties (Huang et al, 2010; Brachi et al, 2011; Zhao et al, 2011). In rice, many
QTLs/genes related to grain quality, agronomic performance, biotic and abiotic stress have been
characterized with GWAS (Huang et al, 2010; Famoso et al, 2011; Zhao et al, 2011; Kang et al,
2016; Zhu et al, 2016).

Rice was the first crop plant to be fully sequenced (International Rice Genome Sequencing
Project, 2005; Yu et al., 2002). Availability of the whole-genome sequence allows genome-wide
association studies (GWASs) to be carried out in order to understand the genetic basis of
complex traits, such as grain quality (Huang et al., 2010, 2012; Zhao et al., 2011). Huang et al.
(2010) identified approximately 3.6 million single nucleotide polymorphisms (SNPs) by sequencing 517
rice landraces and performed GWASs for 14 agronomic and grain-related traits in the population
of indica subspecies. The candidate genes qSW5 for grain width, GS3 for grain length, ALK
(starch synthase IIa, SSIIa) for gelatinization temperature, Waxy (Wx) encoding granule-bound
starch synthase (GBSS1) for amylose content, and Rc for pericarp color have been identified.
When a larger and more diverse sample of 950 worldwide rice accessions, including the Oryza
sativa indica and Oryza sativa japonica subspecies, was used to perform GWAS, a total of 18
new loci associated with 10 grain-related traits were identified, and a total of 30 candidate genes
in the peak SNP sites (or adjacent to these sites) were predicted for these 18 associated loci
(Huang et al., 2012). Xu et al. (2016a) investigated the genetics of 10 rice eating and cooking
quality parameters by GWAS in the whole panel, including 227 nonglutinous rice accessions and
four derived panels. In addition to known genes for eating quality, such as Wx, some new loci
located close to starch synthesis-related genes were identified. Two quantitative trait loci
(QTLs) (chr.9: 15417525–15474876; 17538294–18443016) for several starch paste viscosity
properties detected in four panels were close to the isoamylase 3 gene, and one QTL (chr.1:
30627943–31774927) for consistency viscosity detected in three panels was close to the starch
synthase IV-1 gene. Other GWASs for eating and cooking quality also identified Wx and
SSIIa as major candidate genes as well as other novel QTLs (Wang et al., 2017).

Low-coverage sequencing approach

As rice is an important crop, there has long been great interest in simultaneously mapping
multiple agronomic traits in diverse varieties using the GWAS approach.
Rice is a selfing species with large collections of germplasm, which makes it a good candidate for
GWAS. In rice, a candidate-gene association analysis approach had previously been used to
investigate the functional effect of 18 starch synthesis-related genes on controlling rice eating
and cooking quality [Tian et al., 2009]. To develop an effective platform to accomplish GWAS
in rice, the scientific community needs to first prepare a wealth of resources, including a number
of diverse rice accessions, a dense genomic variation map and a set of high-quality genotype and
phenotype data. Next-generation sequencing (NGS) permits the creation of a genome variation map and the generation of
genotype calls in the GWAS panel concurrently.

In the experimental design, the sample number and the sequencing coverage are key
issues. Large samples generally increase GWAS power and also enable most allelic variants
to be identified. High coverage means that most genomic regions are covered by multiple reads.
However, with a limited budget, one must strike a balance between sequencing coverage and
sample number. In recent work on rice GWAS, a low-coverage sequencing strategy was
adopted. About 1000 diverse rice accessions from a worldwide collection of rice germplasm were
sequenced by NGS technology, each to approximately one-fold genome coverage [Huang, X et al., 2010;
Huang, X et al., 2011; Huang, X et al., 2012]. Millions of SNPs were identified after the
exclusion of singleton SNPs, and these capture most of the common sequence variation in cultivated rice.
Because of the complexity and repetitiveness of the rice genome, strict filtering procedures were
applied in sequence alignment and SNP calling.
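
The trade-off between coverage and sample number described above can be made concrete with a minimal sketch. It assumes that reads fall uniformly at random on the genome, so that the number of reads over a site follows a Poisson distribution with mean equal to the coverage; real rice data deviate from this because of repeats and mapping bias. The function names and the budget expressed in "genome equivalents" are hypothetical, introduced only for illustration.

```python
import math


def fraction_covered(depth: float, min_reads: int = 1) -> float:
    """Expected fraction of sites covered by at least `min_reads` reads.

    Assumption: read depth at a site is Poisson-distributed with mean `depth`.
    """
    # Probability of seeing fewer than `min_reads` reads at a site.
    p_fewer = sum(
        math.exp(-depth) * depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def samples_for_budget(total_genome_equivalents: float, depth: float) -> int:
    """Number of accessions affordable at a given per-accession depth.

    `total_genome_equivalents` is a hypothetical budget: the total amount of
    sequence that can be paid for, measured in multiples of the genome size.
    """
    return int(total_genome_equivalents // depth)


if __name__ == "__main__":
    budget = 1000.0  # hypothetical budget, in genome equivalents
    for depth in (1.0, 5.0, 10.0):
        n = samples_for_budget(budget, depth)
        once = fraction_covered(depth, min_reads=1)
        twice = fraction_covered(depth, min_reads=2)
        print(f"depth {depth:>4}x: {n:>4} accessions, "
              f"{once:.1%} of sites with >=1 read, {twice:.1%} with >=2 reads")
```

Under these assumptions, one-fold coverage leaves roughly 37% of sites without any read in a given accession, which illustrates why a low-coverage design trades per-accession completeness for a larger panel.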

Cultivated rice (Oryza sativa L.), which is grown worldwide and is one of the most important
cereals for human nutrition, is considered to have been domesticated from wild rice (Oryza
rufipogon) thousands of years ago [1–4]. The differences between O. sativa and O. rufipogon are
reflected in a wide range of morphological and physiological traits [5–9]. Despite the fact that rice
is a major cereal and a model system for plant biology, the evolutionary origins and
domestication processes of cultivated rice have long been debated. The puzzles about rice
domestication include: (1) where the geographic origin of cultivated rice was,
(2) which types of O. rufipogon served as its direct wild progenitor, and
(3) whether the two subspecies of cultivated rice, indica and japonica, are
derived from a single or multiple domestications.

Materials and Methods:

We performed the GWAS analysis with the GAPIT statistical package in R. The
recent advances in the free R statistical environment (https://www.r-project.org/) provide
many useful packages for performing GWAS. The genome association and prediction integrated
tool (GAPIT) is a useful R package that performs GWAS and genomic selection (GS). The main
advantages of GAPIT are that it can handle a large amount of data (SNPs and genotypes) and that it
reduces computational time without compromising statistical power [A.E. Lipka et al., 2012]. The
package includes many statistical methods, such as the mixed linear model (MLM), population parameters previously
determined (P3D), and efficient mixed-model association (EMMA); in our experiment we
used MLM. The results of GWAS can be illustrated by Manhattan plots, quantile-quantile
plots and a table that includes the P-value, minor allele frequency, sample size, phenotypic
variance explained by markers (R²) and adjusted P-value following a false discovery rate procedure [Storey
and Tibshirani, 2003]. Similarly, the results of GS are presented in a heat map and a table.
Moreover, heritability estimates and the likelihood function can be plotted at different
compression levels. Owing to these features, GAPIT has become one of the most powerful and
useful tools for association analysis in barley [Bellucci, A., 2017; A.M. Alqudah et al., 2020] and
other cereals such as wheat [Alomari et al., 2018].
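
To illustrate what an adjusted P-value following a false discovery rate means, here is a minimal sketch of the Benjamini-Hochberg procedure, one common way to obtain FDR-adjusted P-values. It is an assumption that this is the kind of adjustment meant here; the q-value method of Storey and Tibshirani differs in that it also estimates the proportion of true null hypotheses. The function name and the example P-values are hypothetical and are not GAPIT output.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg FDR-adjusted P-values, in the input order."""
    m = len(p_values)
    # Marker indices sorted from smallest to largest raw P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down to the smallest so that the
    # adjusted values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical marker P-values
    for p, q in zip(raw, bh_adjust(raw)):
        print(f"raw P = {p:<6} adjusted P = {q:.3f}")
```

A marker is then declared significant when its adjusted P-value falls below the chosen FDR level, for example 0.05.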

The output results of GWAS

Each software program gives slightly different parameters as output
results for GWAS. The main output can be presented in a Manhattan plot, which illustrates, on a
genomic scale, the P-values of all markers used in GWAS. The x-axis represents the genomic
order by chromosome and position on the chromosome, while the y-axis represents the −log10
of the P-value of each marker (for a P-value that is an exact power of ten, this equals the number
of zeros after the decimal point plus one; for example, P = 0.001 gives 3). SNPs with significant
associations (the lowest P-values), which represent QTLs, tend to
show up as strong signals on the Manhattan plot. The threshold of −log10(P-value) can be fixed
in advance; −log10(P) ≥ 3 is the most commonly used value. For further
analysis, the threshold can be recalculated using a multiple-comparison correction, which makes the
P-value of each SNP more robust and trustworthy (Fig. 3A). Another important graph in GWAS is the
quantile-quantile (QQ) plot, which illustrates the relationship between the observed and expected
P-values. It depicts the deviation of the observed P-value of each SNP from the value expected under the null hypothesis.
The QQ plot can be used to compare observed vs. expected values among GWAS statistical
models, for instance MLM compared to the general linear model (GLM) or compressed MLM (CMLM),
to show how well each model accounts for population structure and
familial relatedness and therefore which model should be applied.
The diagonal (reference) line shows whether the points match the expected distribution or
deviate from it. The gray area shows the 95% confidence region. It is
expected that most of the data points in the QQ plot will lie on the diagonal line, since most markers are
not associated with the trait. Systematic deviations from this line suggest that the model does
not sufficiently control for population structure, which leads to spurious
associations. There are three main possible QQ plots, each with its own meaning:

(1) The observed values correspond to the expected values: all points (observed vs. expected
P-values) are on or very near the diagonal line and within the confidence interval (the gray
highlighted region).

(2) The significant SNPs (whose observed P-values are highly and significantly different from the
P-values expected under the null hypothesis) move towards the y-axis.

(3) There is an early separation of the points or an unclear trend, which means that the results could
be due to unaddressed population structure and/or poor quality of the phenotypic data. In
this case, most of the highly deviated SNPs are likely false associations, and further
steps (e.g. correction for population structure, correction of the phenotypic data) are required.
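
The quantities behind the two plots described above (the −log10 transform, a threshold recalculated for multiple comparisons, and the observed vs. expected pairs of a QQ plot) can be sketched as follows. This is an illustration under stated assumptions, not the GAPIT implementation: the Bonferroni correction stands in as one simple multiple-comparison method, the expected quantiles use the (i − 0.5)/m convention, which is one of several in use, and all function names are hypothetical.

```python
import math


def neg_log10(p: float) -> float:
    """Transform a P-value to the scale used on both plots."""
    return -math.log10(p)


def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Significance line on the -log10 scale after Bonferroni correction.

    Assumption: Bonferroni is used here only as a simple example of a
    multiple-comparison correction.
    """
    return neg_log10(alpha / n_markers)


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a QQ plot.

    Under the null hypothesis P-values are uniform on (0, 1), so the i-th
    smallest of m P-values is expected near (i - 0.5) / m.
    """
    m = len(p_values)
    observed = sorted((neg_log10(p) for p in p_values), reverse=True)
    expected = [neg_log10((i - 0.5) / m) for i in range(1, m + 1)]
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Hypothetical P-values: mostly null markers plus two strong signals.
    p_vals = [0.91, 0.74, 0.55, 0.42, 0.33, 0.21, 0.12, 0.06, 1e-5, 1e-7]
    line = bonferroni_threshold(0.05, len(p_vals))
    print(f"Bonferroni line at -log10(P) = {line:.2f}")
    for exp, obs in qq_points(p_vals):
        flag = "above line" if obs >= line else ""
        print(f"expected {exp:5.2f}  observed {obs:5.2f}  {flag}")
```

Plotting the observed values against genomic position gives the Manhattan plot, and plotting the (expected, observed) pairs against the diagonal gives the QQ plot. Points that leave the diagonal only at the upper end correspond to case (2), while a departure that starts near the origin corresponds to case (3).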

It is implausible that GWAS will completely explain the heritable proportion of complex traits,
but it can explain a large proportion. The difficulty of detecting small effects of rare variants or
very small effects of common alleles makes a complete explanation impossible (A.M. Alqudah et al., 2020).
