Abstract
Advances in sequencing technologies have aided the discovery of millions of genome-wide DNA
polymorphisms such as single-nucleotide polymorphisms (SNPs) and insertion–deletions (InDels),
which are an invaluable resource for marker-assisted breeding. Presently available bioinformatics tools
assist the discovery of polymorphisms between target genotypes and the reference genome for a range
of species. The discovery of polymorphisms between two genotypes within a breeding program is
complicated by several factors such as bias in the number of reads from each genotype and residual
heterozygosity within each genotype. In this chapter, we describe a novel approach in which
polymorphisms between a pair of genotypes are discovered from whole-genome re-sequencing data.
1 Introduction
Robert J. Henry and Agnelo Furtado (eds.), Cereal Genomics: Methods and Protocols, Methods in Molecular Biology,
vol. 1099, DOI 10.1007/978-1-62703-715-0_24, © Springer Science+Business Media New York 2014
288 S. Gopala Krishnan et al.
Fig. 1 A schematic overview of the steps involved in SNP discovery from whole-
genome re-sequencing data in comparison with a high-quality reference genome
sequence
2 Materials
2.2 Software
1. Software such as CLC Genomics Workbench for the assembly of reads and detection of SNPs
(see Note 1).
2. A spreadsheet program such as Microsoft Excel with the option for filtering and eliminating
duplicate values.
3 Methods
3.1 Assembly of Reads and Detection of SNPs
1. Use CLC Genomics Workbench to assemble the trimmed, high-quality reads from each of the
genotypes (e.g., Genotype 1 and Genotype 2) individually to the reference genome
(Fig. 1, steps 1–3).
2. Then use the SNP detection tool to discover SNPs in the assembled contigs in comparison to
the high-quality reference genome (Fig. 1, steps 4–5).
3. Use CLC Genomics Workbench to assemble the trimmed reads from both genotypes in combination
to the reference genome (Fig. 1, steps 1–3).
4. Then use the SNP detection tool to discover SNPs in this combined assembly in comparison to
the high-quality reference genome (Fig. 1, steps 4–5) (see Note 2).
5. The three assemblies (one combined assembly of Genotypes 1 and 2 and the two individual
assemblies of Genotype 1 and Genotype 2) give rise to three possible situations (Fig. 2).
Situation 1: In this case (Fig. 2a), even though an SNP (C compared to T at reference position
no. 7,936,303) is discovered in the combined assembly and in the individual assemblies, the
allele (C) is the same in both Genotype 1 and Genotype 2, and so there is no polymorphism
between the genotypes.
Situation 2: In this case (Fig. 2b), two alleles (G/A, G in the reference sequence at position
no. 7,941,982) are discovered in the combined assembly. In the individual assemblies, both
Genotype 1 (4G/4A) and Genotype 2 (8G/4A) are heterozygous at this position. Therefore, even
though there appear to be two alleles in the combined assembly, the polymorphism is due to
residual heterozygosity at this position, which is shared by both Genotype 1 and Genotype 2.
Situation 3: In this case (Fig. 2c), two alleles (C/T, C in the reference sequence at position
no. 7,936,320) are discovered in the combined assembly, where one allele (T) is from Genotype 1
and the other allele (C) is from Genotype 2.
Fig. 2 Three assemblies (one combined assembly of Genotypes 1 and 2 and the two individual
assemblies of Genotype 1 and Genotype 2) can give rise to three possible situations: (a) a C/T
SNP where the C allele is the same in both Genotype 1 and Genotype 2; (b) a G/A SNP where both
Genotype 1 and Genotype 2 are heterozygous at this position; (c) a C/T SNP where one allele (T)
is from Genotype 1 and the other allele (C) is from Genotype 2
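The three situations can be expressed as a small decision rule on the per-genotype allele counts at one reference position. The following Python sketch is illustrative only: the function names, the dictionary representation of allele counts, and the one-read support threshold are assumptions made for this example, not part of the protocol or of CLC Genomics Workbench. Only the 4G/4A and 8G/4A counts come from the text; the other read counts are hypothetical.

```python
def alleles_present(counts, min_reads=1):
    """Return the set of alleles supported by at least min_reads reads.

    counts maps an allele (e.g. "C") to its read count in one individual
    assembly. min_reads is an assumed threshold, not a value from the protocol.
    """
    return {allele for allele, n in counts.items() if n >= min_reads}


def classify_site(geno1_counts, geno2_counts):
    """Classify one reference position into the situations of Fig. 2."""
    a1 = alleles_present(geno1_counts)
    a2 = alleles_present(geno2_counts)
    if len(a1) == 1 and len(a2) == 1:
        # Each genotype carries a single allele at this position.
        if a1 == a2:
            return "situation 1"  # same allele in both: no polymorphism
        return "situation 3"      # different alleles: pairwise polymorphism
    if len(a1) > 1 and a1 == a2:
        return "situation 2"      # shared residual heterozygosity
    # Anything else (e.g. only one genotype heterozygous, or no reads)
    # is not covered by the three situations described in the text.
    return "unclassified"


# Situation 1 (position 7,936,303): both genotypes carry C (hypothetical counts).
s1 = classify_site({"C": 6}, {"C": 7})
# Situation 2 (position 7,941,982): 4G/4A and 8G/4A, as given in the text.
s2 = classify_site({"G": 4, "A": 4}, {"G": 8, "A": 4})
# Situation 3 (position 7,936,320): T in Genotype 1, C in Genotype 2 (hypothetical counts).
s3 = classify_site({"T": 6}, {"C": 7})
print(s1, s2, s3)
```

Only sites in situation 3 represent polymorphisms between the two genotypes.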
3.2 Discovery of SNPs Between Two Genotypes
1. In the first step, eliminate heterozygous loci from each of the genotypes. To do this, use a
spreadsheet to filter the SNPs from non-repetitive regions discovered in the separate individual
assemblies of Genotype 1 and Genotype 2 with the following parameters: coverage > 4, count of
variant 1 > 0, and count of variant 2 > 0. The SNPs obtained after this step are those unique
to each individual assembly (see Note 3).
2. In the next step, identify SNPs in the combined assembly. To do this, filter the SNPs from
non-repetitive regions discovered in the combined assembly of Genotype 1 and Genotype 2 with the
following parameters: coverage > 9, count of variant 1 > 4, and count of variant 2 > 4
(see Note 4).
3. In the final step, eliminate spurious SNPs from the pool of putative SNPs identified between
Genotype 1 and Genotype 2 in the combined assembly. This involves two substeps. First, compare
the SNPs from the individual assembly of Genotype 1 with the SNPs identified in the combined
assembly, one chromosome at a time, and remove the SNPs with the same reference position in both
assemblies by using the remove duplicates option (see Note 5). Second, compare the SNPs from the
individual assembly of Genotype 2 with the SNPs identified in the combined assembly, one
chromosome at a time, and again remove the SNPs with the same reference position in both
assemblies by using the remove duplicates option.
4. Retain the set of SNPs remaining in the combined assembly after the elimination of duplicates
in the above steps; these constitute the true pairwise SNPs between Genotype 1 and Genotype 2
(see Note 6).
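The spreadsheet filtering in steps 1 to 4 can also be sketched as a short script. The Python example below is a minimal sketch under stated assumptions: the `SnpCall` record and its field names are hypothetical stand-ins for columns of an exported SNP table, SNPs are assumed to have been restricted to non-repetitive regions already, and "remove duplicates" is interpreted as discarding every combined-assembly SNP whose reference position also appears in a filtered individual-assembly list. The chromosome label and most read counts are invented for illustration; only the three reference positions and the 4G/4A and 8G/4A counts come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """One row of an exported SNP table (hypothetical column names)."""
    chromosome: str
    position: int
    coverage: int
    count_variant1: int
    count_variant2: int


def passes(call, min_coverage, min_count):
    """Apply the strict '>' thresholds quoted in the steps above."""
    return (call.coverage > min_coverage
            and call.count_variant1 > min_count
            and call.count_variant2 > min_count)


def heterozygous_sites(individual_calls):
    """Step 1: coverage > 4 and both variant counts > 0."""
    return {(c.chromosome, c.position)
            for c in individual_calls if passes(c, 4, 0)}


def candidate_snps(combined_calls):
    """Step 2: coverage > 9 and both variant counts > 4."""
    return [c for c in combined_calls if passes(c, 9, 4)]


def pairwise_snps(geno1_calls, geno2_calls, combined_calls):
    """Steps 3 and 4: drop combined SNPs whose position is also in either
    filtered individual-assembly list (assumed meaning of 'remove duplicates')."""
    shared = heterozygous_sites(geno1_calls) | heterozygous_sites(geno2_calls)
    return [c for c in candidate_snps(combined_calls)
            if (c.chromosome, c.position) not in shared]


# Hypothetical tables built around the three positions discussed for Fig. 2.
geno1 = [
    SnpCall("chr1", 7936303, 6, 6, 0),   # homozygous C
    SnpCall("chr1", 7941982, 8, 4, 4),   # 4G/4A, heterozygous
    SnpCall("chr1", 7936320, 6, 6, 0),   # homozygous T
]
geno2 = [
    SnpCall("chr1", 7936303, 7, 7, 0),   # homozygous C
    SnpCall("chr1", 7941982, 12, 8, 4),  # 8G/4A, heterozygous
]
combined = [
    SnpCall("chr1", 7936303, 13, 13, 0),
    SnpCall("chr1", 7941982, 20, 12, 8),
    SnpCall("chr1", 7936320, 13, 6, 7),
]

result = pairwise_snps(geno1, geno2, combined)
print([c.position for c in result])  # -> [7936320]
```

Keying the comparison on chromosome and position together gives the same result as the chromosome-by-chromosome comparison in step 3 without splitting the tables.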
4 Notes
Acknowledgements