Recessive and Autosomal Dominant Genetic Traits through SNP Microarray Analysis
Chromosoft LLC, P.O. Box 281, Colts Neck, NJ 07722, USA
*email@example.com, Phone: 1-732-858-1622, Fax: 1-732-758-0214
Keywords: Genetics, Linkage, Chromosome Mapping
Word Count:
INTRODUCTION

Single nucleotide polymorphism (SNP) genotype data presents an opportunity to identify shared chromosomal segments between individuals with recent common ancestry. The identification of common regions has applications in determining degree of relationship, mapping meiotic recombinations, mapping uncharacterized high-penetrance, single-gene disorders, and diagnosing pre-conception, pre-implantation, or pre-natal disease risks.

Two distinct approaches to determining shared genomic regions from high-density SNP data are identity by descent (IBD) and identity by state (IBS) analyses. IBD analysis requires knowledge of pedigree structure to determine haplotypes inherited within nuclear families. A situation where one parent's genotype is heterozygous and the other parent's genotype is homozygous can provide phase-informative data for the child's haplotype received from the heterozygous parent. For example, given maternal and paternal genotypes AC and CC, it is possible to determine which of the two maternal chromosomes was transmitted to the child at that locus. If the child is genotype AC, then the maternal chromosome with A at this position was transmitted. If the child is genotype CC, then the maternal chromosome with a C at this position was transmitted. In this way, haplotype information for multiple siblings can then be used to assess allele sharing.

IBS can be used to determine allele sharing as well. As previously described [Roberson and Pevsner], at each locus the sharing status of two individuals can be assigned one of three "IBS values," 0, 1, or 2. The value of zero applies when the genotypes are completely different (e.g. AA and BB). A value of one is given when only one of the alleles is shared between the two (e.g. AA and AB), and a value of two is given when both alleles are shared (e.g. AA and AA, AB and AB, or BB and BB). It can be expected that a certain proportion of IBS-2's, 1's, and 0's will exist between any two individuals. However, a consistent lack of IBS-0's over an extended range indicates that these two individuals must share a haplotype for this given region. Continuing this logic, a consistent lack of IBS-0's as well as IBS-1's indicates that the individuals share two haplotypes in this region. Thus, given high enough SNP density, IBS analysis is capable of determining allele sharing without any prior knowledge of a relationship between individuals.

IBD and IBS analysis methods provide different advantages when analyzing consanguineous or nuclear family samples. While a prerequisite of IBD analysis is data from the nuclear family and knowledge of the pedigree structure, IBS provides a clear advantage when no pedigree information is available. IBD and IBS methods can be used separately or in conjunction in assessing family data or group data to determine important alleles shared. Here, two specific analysis methods (one IBS and one IBD) and corresponding programs (HaplOlap and NucleOlap) are presented, and an analysis is undertaken to demonstrate which techniques are most appropriate and accurate for varying sets of genetic data.
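The IBS values and the "consistent lack of IBS-0's" reasoning described in the introduction can be illustrated with a minimal Python sketch. It is not the HaplOlap implementation: the function names and the `max_miscall_rate` tolerance are hypothetical, introduced here only to show the logic.

```python
from collections import Counter


def ibs_value(g1: str, g2: str) -> int:
    """Number of alleles (0, 1 or 2) shared by two unphased genotypes, e.g. 'AA' and 'AB'."""
    shared = Counter(g1) & Counter(g2)  # multiset intersection of the two allele pairs
    return sum(shared.values())


def haplotypes_shared(ibs_calls, max_miscall_rate=0.0):
    """Classify a run of consecutive per-locus IBS values as 0, 1 or 2 shared haplotypes.

    max_miscall_rate is a hypothetical tolerance for genotyping errors: the
    fraction of IBS-0 (or IBS-1) calls that is still treated as "absent".
    """
    n = len(ibs_calls)
    if ibs_calls.count(0) / n > max_miscall_rate:
        return 0  # IBS-0 calls present: no haplotype shared across the run
    if ibs_calls.count(1) / n > max_miscall_rate:
        return 1  # no IBS-0 but some IBS-1: one haplotype shared
    return 2      # neither IBS-0 nor IBS-1: both haplotypes shared


print(ibs_value("AA", "BB"), ibs_value("AA", "AB"), ibs_value("AB", "AB"))
print(haplotypes_shared([2, 1, 2, 2, 1]))
```

A real analysis would apply the classification over long runs of SNPs, since short runs without IBS-0 calls occur by chance between unrelated individuals.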
METHODS

SNP Microarray genotype data was obtained through analysis of either blood (Bonei Olam) or buccal (23andMe) samples. Genotyping took place at offsite labs contracted by the respective institutions. The data was formatted and curated using Microsoft Excel. The NucleOlap and HaplOlap programs were written in Java and compiled on the most recent release of the Java Development Platform. Processing of the data occurred with a Windows Vista 64-bit machine equipped with 6GB RAM running an Intel® Core i7 920 CPU (2.67 GHz).

IBD Analysis Method

NucleOlap was written to implement IBD methods for identifying candidate regions responsible for autosomal recessive or autosomal dominant genetic disorders. It requires data from parents, and at least two offspring, one of which must be affected. Up to ten offspring can be analyzed at once with the graphical user interface. The algorithm employed by NucleOlap for autosomal recessive disorders proceeds as follows: 1) identify informative SNPs from parental data, 2) use informative SNPs to phase progeny data, 3) compare progeny haplotypes to determine whether alleles are shared or not and to eliminate miscalls, 4) eliminate or maintain candidate regions depending on affected/unaffected status and shared/unshared status between siblings.

1. Identification of Informative SNPs from Parental Data
Phase informative SNPs were identified based on situations where one parent is heterozygous and the other parent is homozygous. Inference of offspring alleles from parental data has been previously discussed at length (Qian and Beckman).

Fig 1. Identification of informative SNPs: SNPs where one parent is homozygous and the other parent is heterozygous provide phase informative information for the offspring. In the example above, informative SNPs are circled.

2. Re-construction of progeny haplotypes from parental informative SNPs
Based on the informative SNPs identified from the parents, each offspring's genotypes can be reduced to parentally inherited haplotypes.

Fig 2. Based on the genotype of the child at each informative locus, transmitted haplotypes can be constructed. Note: these are post-meiotic recombination haplotypes.

3. Comparison of progeny haplotypes
Progeny haplotypes are compared to determine if the same or different haplotypes were inherited for each region of their maternal and paternal chromosomes.

4. Elimination of regions depending on status of affected/unaffected progeny haplotypes
Overlap status from haplotype comparisons is then assessed to determine which regions are candidate regions for the phenotype of interest. Autosomal recessive analysis proceeds as follows: all affected children must share both haplotypes in the same region, so any region where the affected children do not have the same haplotypes is discarded. Unaffected children cannot share both haplotypes in the same region as affected children, so regions where any unaffected child shares both haplotypes with the affected children are removed from the list of candidate regions.
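Steps 1 and 2 of the IBD method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not NucleOlap's code: genotypes are two-character strings such as "AC", and the function names are hypothetical.

```python
def is_informative(parent_a: str, parent_b: str) -> bool:
    """True when exactly one parent is heterozygous and the other is homozygous."""
    a_is_het = parent_a[0] != parent_a[1]
    b_is_het = parent_b[0] != parent_b[1]
    return a_is_het != b_is_het


def transmitted_allele(het_parent: str, hom_parent: str, child: str):
    """Allele passed on by the heterozygous parent at an informative SNP.

    Returns None when the child's genotype is inconsistent with the parents
    (a likely miscall), so that the locus can be skipped.
    """
    remaining = list(child)
    if hom_parent[0] not in remaining:
        return None  # child lacks the allele the homozygous parent must transmit
    remaining.remove(hom_parent[0])  # what is left came from the heterozygous parent
    allele = remaining[0]
    return allele if allele in het_parent else None


# Example from the text: maternal genotype AC, paternal genotype CC.
print(is_informative("AC", "CC"))            # informative locus
print(transmitted_allele("AC", "CC", "AC"))  # maternal A chromosome was transmitted
print(transmitted_allele("AC", "CC", "CC"))  # maternal C chromosome was transmitted
```

Two siblings with the same transmitted allele at consecutive informative SNPs are consistent with having inherited the same parental haplotype over that stretch; a change from agreement to disagreement marks a candidate recombination point.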
Fig 3. Affected children are compared to determine the regions where they all share both maternal and paternal haplotypes. Regions where they do not all share the same diplotype are discarded as candidate regions for causing the phenotype of interest. Discarded regions are shown in gray.

Fig 4. Unaffected children are compared to the candidate regions from analysis of affected progeny (Fig 3), and any candidate regions where unaffected children share the same diplotype as the affected children are discarded. Examples of such regions are circled above.

For autosomal dominant disorders, the algorithm is simpler. Transmitted haplotypes are only assessed for the affected parent. The children's haplotypes are then compared to one another to determine where they inherit the same haplotype. Regions are only considered candidate regions if all affected children have inherited the same haplotype and no unaffected children share this haplotype.

IBS Analysis Method

The IBS analysis proceeds as previously described [Roberson and Pevsner]. However, more than two individuals can be analyzed at once by taking the total identity by state (TIBS) for each locus between all the individuals. For this to occur a pair-wise comparison of every combination of individuals must take place, so the computational requirements of HaplOlap grow quadratically with the number of individuals (n individuals require n(n-1)/2 comparisons).

Calculation of TIBS

To determine TIBS, the lowest IBS call between any two members of the group is taken. For example, in a group of 9 individuals, if all individuals share IBS-1 with the exception of two of them (who share IBS-0), then for that locus the TIBS will be IBS-0. This is illustrated in the table below.

Table 1. Sample Determination of TIBS

                Locus 1   Locus 2   Locus 3   Locus 4
Individual 1    AA        AA        AA        AB
Individual 2    AA        AA        AB        AB
Individual 3    BB        AA        AA        AB
TIBS            0         2         1         2

Once the TIBS calls are determined, IBS analysis occurs normally as previously described [Roberson and Pevsner]. Regions containing TIBS-0, 1, and 2 calls have no haplotype shared by all the individuals. Regions lacking TIBS-0 calls have one haplotype shared by all individuals, and regions lacking both TIBS-0 and TIBS-1 calls have two haplotypes shared by all the individuals.

A major difference between IBS and IBD methods is that IBD can use unaffected individuals to further narrow down the candidate genomic regions (within a nuclear family). IBS cannot do this. For this reason, IBD is able to narrow down the possible loci responsible for a genetic trait more than IBS for nuclear families.

HaplOlap and NucleOlap: GUI Implementation

HaplOlap and NucleOlap are both written in the Java programming language and are compiled with Graphical User Interfaces (GUI) created using Netbeans. They are platform independent, and require only the latest version of the Java Runtime Environment to run. Both are available through Chromosoft (http://www.chromosoft.org). The programs require a separate text file for each individual with four columns summarizing their genotypes: rsid, chromosome, position, genotype, although customization of input files is easily achieved. In its current implementation, NucleOlap can handle up to 12 input files: 2 parents and 10 children. HaplOlap can handle up to 10 input files: any 10 individuals being studied.
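The TIBS rule (take the lowest pairwise IBS call at each locus) can be sketched in a few lines of Python. This is an illustration of the rule using the genotypes of Table 1, not HaplOlap's Java code; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ibs_value(g1: str, g2: str) -> int:
    """Number of alleles (0, 1 or 2) shared by two unphased genotypes."""
    return sum((Counter(g1) & Counter(g2)).values())


def tibs(genotypes_at_locus) -> int:
    """Lowest pairwise IBS value among all individuals at one locus."""
    return min(ibs_value(a, b) for a, b in combinations(genotypes_at_locus, 2))


# Genotypes from Table 1: one row per individual, one column per locus.
individuals = [
    ["AA", "AA", "AA", "AB"],  # Individual 1
    ["AA", "AA", "AB", "AB"],  # Individual 2
    ["BB", "AA", "AA", "AB"],  # Individual 3
]

# zip(*rows) regroups the data by locus before applying the rule.
tibs_calls = [tibs(locus) for locus in zip(*individuals)]
print(tibs_calls)  # [0, 2, 1, 2]
```

Because every pair of individuals is compared, the work per locus grows with the number of pairs: 3 pairs for three individuals, 45 pairs for ten.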
RESULTS

To compare NucleOlap (IBD) and HaplOlap (IBS) analysis methods, four approaches were taken. The first approach was to verify that the expected amount of diplotype sharing between siblings was identified by each program. The second analysis compared each program's ability to identify candidate regions responsible for a recessive disease inherited in a nuclear family. The third analysis compared each program's ability to identify candidate regions responsible for a dominant disease inherited in a nuclear family. The final analysis compared each program's ability to identify candidate regions for a dominant mutation inherited by three first cousins.

Shared Diplotypes between Siblings Agree with Expected Proportion of Sharing

In classical models of genetic inheritance it is expected that, on average, two siblings share both parental haplotypes in 25% of their genome, one parental haplotype in 50% of their genome, and no parental haplotypes in 25% of their genome. While individual deviations from this model are to be expected, over the course of multiple samples we expect to approach this ratio of genome sharing. Using both NucleOlap and HaplOlap, we are able to quantify the percentage of the genome for which pairs of siblings share both parental haplotypes.

Analysis with NucleOlap (IBD)

Data obtained from 23andMe's genetic testing service for a family of four siblings was used to test adherence to the 25:50:25 haplotype sharing ratio. The four children were compared to find the length (in base pairs) of their autosomes for which they shared both parental haplotypes (diplotypes). Table 2 summarizes these results. The total length of all autosomes is 2,864,255,922 base pairs (build 36.1).

Table 2. NucleOlap: Diplotype Sharing Between Siblings

Children Compared      Length of Shared Diplotype (BP)   % of Autosomes
Child 1 v. Child 2     745,451,161                       26.03
Child 1 v. Child 3     822,081,169                       28.70
Child 1 v. Child 4     850,754,064                       29.70
Child 2 v. Child 3     693,380,355                       24.21
Child 2 v. Child 4     992,476,348                       34.65
Child 3 v. Child 4     710,340,415                       24.80
Mean                   802,413,919                       28.01
SD                     111,765,738                       3.90

On average, the siblings shared 802,413,919 base pairs, or 28.01%, with a standard deviation of 111,765,738 base pairs, or 3.90%. Given the low sample number, this average shared percentage of diplotypes between siblings is not significantly higher than the expected 25%. Thus, NucleOlap's identification of shared diplotypes between two siblings seems to agree with expected values.

Analysis with HaplOlap (IBS)

The same data were analyzed using the IBS program, HaplOlap. 500 SNPs at a time, the degree of allele sharing was assessed between every combination of the siblings.

Table 3. HaplOlap: Diplotype Sharing Between Siblings

Children Compared      Length of Shared Diplotype (BP)   % of Autosomes
Child 1 v. Child 2     701,222,413                       24.48
Child 1 v. Child 3     777,776,972                       27.15
Child 1 v. Child 4     803,119,210                       28.04
Child 2 v. Child 3     654,135,366                       22.84
Child 2 v. Child 4     953,306,549                       33.28
Child 3 v. Child 4     658,278,776                       22.98
Mean                   757,973,214                       26.46
SD                     113,602,284                       3.97

The four children compared showed approximately the same pattern of diplotype sharing when using the IBS method. The children shared diplotypes in 26.46% of their genome with a standard deviation of 3.97%. Although the data does not match NucleOlap precisely, the consistency in the pattern and the proximity to 25% means that there is general concordance between the two methods.
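The summary statistics for the NucleOlap sibling comparison can be checked with a short Python sketch. The mean and sample standard deviation follow directly from the six per-pair percentages of Table 2. The one-sample t statistic against the expected 25% is an assumption added here for illustration (the text does not state which test was used), and it is only a rough guide because the six pairs drawn from four siblings are not independent observations.

```python
import math
import statistics

# Percentage of autosomes with a shared diplotype, per sibling pair (Table 2, NucleOlap).
shared_pct = [26.03, 28.70, 29.70, 24.21, 34.65, 24.80]
EXPECTED_PCT = 25.0

mean = statistics.mean(shared_pct)
sd = statistics.stdev(shared_pct)  # sample SD (n - 1 in the denominator)

# Assumed test: one-sample t statistic against the expected 25% sharing.
t_stat = (mean - EXPECTED_PCT) / (sd / math.sqrt(len(shared_pct)))
T_CRIT_DF5 = 2.571  # two-sided 5% critical value of Student's t, 5 degrees of freedom

print(f"mean = {mean:.3f}%, sd = {sd:.2f}%, t = {t_stat:.2f}")
print("significantly different from 25%:", abs(t_stat) > T_CRIT_DF5)
```

Under these assumptions the t statistic stays below the critical value, in line with the statement that the observed average is not significantly higher than 25%.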
Fig 5. Ideograms showing the results from the three experiments. A: analysis of an autosomal recessive mutation in a nuclear family. B: analysis of an autosomal dominant mutation in a nuclear family. C: analysis of an autosomal dominant mutation in an extended family. The left column of ideograms shows the graphical output of NucleOlap (IBD). Green regions indicate candidate regions and black regions indicate regions that have been ruled out. The right column of ideograms shows the graphical output of HaplOlap (IBS). Green regions are regions where affected children all share the same diplotype (these are candidate regions for A). Red regions are regions where affected children all share one haplotype (these are the candidate regions for B and C). For all ideograms, a blue dot indicates the position of the gene/region where the deleterious mutation has been previously mapped via other methods.

Confirmation of a Previously Identified Recessive Mutation within a Nuclear Family

Anonymous data from a family with three children was used to demonstrate each program's ability to narrow down the candidate regions for a recessive mutation. Two of the three progeny were affected with a genetic disorder that showed linkage to chromosome 20. Each program was tested for its ability to: 1) narrow down the candidate regions within the genome that could be responsible for this trait, 2) retain the linked chromosome 20 region as one of the candidate regions.

Analysis with NucleOlap (IBD)

In the IBD recessive model, for every affected child after the first, the candidate genomic regions should be narrowed down by approximately 75%, and for every unaffected child, the candidate genomic regions should be narrowed down by approximately 25%. Therefore, the expected proportion of the genome that should remain as a candidate for causing our recessive trait after analysis is:

1*0.25*0.75 = 0.1875 or 18.75%

After analyzing these genomes with NucleOlap, only 14.18% of the autosomal genome, or 406,037,312 base pairs, remains with 28 candidate regions (green regions, figure 5-a-i). Moreover, the linked region was contained within one of the two candidate regions identified on chromosome 20.

Analysis with HaplOlap (IBS)

In the IBS recessive model, for every affected child after the first, the candidate genomic regions should be narrowed down by approximately 75%. There is no further region elimination for unaffected children. Therefore, the average expected proportion of the genome that should remain as a candidate for causing the recessive trait in a family with two affected children is:

1*0.25 = 0.25 or 25%

After analyzing the two affected children with HaplOlap, only 18.19% of the autosomal genome, or 520,878,287 base pairs, remains with 34 candidate regions (green regions, figure 5-a-ii). Again, the linked region on chromosome 20 was contained within one of these candidate regions.

Summary of the Recessive Model in a Nuclear Family
NucleOlap and HaplOlap both succeeded in narrowing down the potential causative candidate loci for this family with two affected children and one unaffected child. Table 4 summarizes their performance.

Table 4. NucleOlap v. HaplOlap: Analysis of a Nuclear Family with an Autosomal Recessive Mutation

Criteria                                   NucleOlap      HaplOlap
Linked Region Retained?                    Yes            Yes
Autosomal Length Remaining (base pairs)    406,037,312    520,878,287
Percentage of Autosome Remaining           14.18%         18.19%
Number of Regions Remaining (discrete)     28             34

Confirmation of a Previously Identified Dominant Mutation

Deidentified data was used to demonstrate NucleOlap's ability to properly identify candidate loci for an autosomal dominant disease. A family of six siblings was used in this test. Three of the children suffered from an autosomal dominant genetic disorder that has previously been linked to a region of chromosome 3. NucleOlap and HaplOlap were tested for their ability to: 1) narrow down the candidate regions within the genome that could be responsible for this trait, 2) identify the linked region within one of the candidate regions.

Analysis with NucleOlap (IBD)

In our dominant model, one parent is known to be affected. With the IBD analysis, siblings are only assessed for this parent's haplotype. For every affected or unaffected child after the first, the candidate genomic regions should be narrowed down by 50%. The expected proportion of the autosomes that should remain as a candidate for causing the dominant trait after analysis is:

1*0.5^5 = 0.03125 or 3.13%

After analyzing the genomes, it was found that 1.39% of the autosomes, or 39,948,360 base pairs, remained with a total of 4 candidate regions (green regions, figure 5-b-i). The linked region (blue circle) was contained within one of the small regions identified on the third chromosome, showing that NucleOlap succeeded in narrowing down the regions responsible for causing this trait while maintaining the proper locus within the candidate regions.
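The expected proportions quoted for the recessive and dominant nuclear-family models are products of one retention factor per additional child. A minimal Python sketch of that arithmetic, assuming the factors stated in the text (0.25 per additional affected child and 0.75 per unaffected child in the recessive IBD model, 0.5 per additional child in the dominant IBD model), is given below; the variable names are illustrative only.

```python
import math

AUTOSOME_BP = 2_864_255_922  # total autosomal length used in the text (build 36.1)

# Recessive, IBD: one additional affected child (x0.25) and one unaffected child (x0.75).
recessive_ibd = math.prod([0.25, 0.75])

# Recessive, IBS: only the additional affected child contributes (x0.25).
recessive_ibs = math.prod([0.25])

# Dominant, IBD: six siblings, so five children after the first, each retaining half.
dominant_ibd = math.prod([0.5] * 5)

# Observed NucleOlap result for the dominant family, as a fraction of the autosomes.
observed_dominant_ibd = 39_948_360 / AUTOSOME_BP

print(f"recessive IBD expected: {recessive_ibd:.4f}")
print(f"recessive IBS expected: {recessive_ibs:.4f}")
print(f"dominant IBD expected:  {dominant_ibd:.5f}")
print(f"dominant IBD observed:  {observed_dominant_ibd:.4f}")
```

These expectations are averages over many families; a single family can retain noticeably more or less of the genome than the product suggests.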
Analysis with HaplOlap (IBS)

While a dominant model assumes that one parent is affected, IBS does not rely on pedigree structure and can analyze siblings without the genotypes for the parents. For each child, the candidate region is narrowed down by 25%. Along with analyzing SNPs from affected children, if the affected parent is also analyzed, it narrows down the candidate regions by another 50%. The major drawback, once again, is that the candidate regions are not narrowed down by analyzing unaffected siblings. For our data, only the three affected children are used in this analysis. Therefore, the expected portion of the genome remaining is:

1*0.75*0.75*0.75*0.5 = 0.2109

After analysis, it is found that 30.58% of the genome, or 876,174,348 base pairs, remains as a candidate for causing this dominant mutation. In total, there are 79 candidate regions (red regions, figure 5-b-ii). The linked region is properly identified within one of the regions on the third chromosome (blue circle).

Summary of the Dominant Model in a Nuclear Family

NucleOlap and HaplOlap both reduced the percentage of the genome that could cause the dominant mutation affecting this nuclear family. Both programs did so while maintaining the actual mutation within the candidate loci. The results are summarized in table 5.

Table 5. NucleOlap v. HaplOlap: Analysis of a Nuclear Family with an Autosomal Dominant Mutation

Criteria                                   NucleOlap      HaplOlap
Actual Mutation Retained?                  Yes            Yes
Autosomal Length Remaining (base pairs)    39,948,360     876,174,348
Percentage of Autosome Remaining           1.39%          30.58%
Number of Regions Remaining (discrete)     4              79

Confirmation of a Previously Identified Dominant Mutation Found Between First Cousins

A deidentified dataset of three first cousins who share a dominant genetic disorder linked to chromosome 11 was used for the final experiment. On average, a pair of first cousins is expected to share a haplotype in 25% of their genome. Therefore, the average total genomic overlap (regions of the genome where all share the same genes) expected to exist between 3 first cousins is:
1*0.25*0.25 = 0.0625 or 6.25%

Analysis with NucleOlap (IBD)

Analysis with NucleOlap is not feasible given two factors:
1. There is no parental data for these cousins.
2. We do not know the precise pedigree structure for these cousins.
Therefore, there is no reduction in the amount of the genome that may cause this dominant trait on the basis of IBD analysis.

Analysis with HaplOlap (IBS)

Following analysis, it is found that 1.44% of the autosomes, or 41,150,414 base pairs coming from 4 regions (red regions, figure 5-c-ii), remain as a causative candidate for this autosomal dominant genetic trait. Moreover, the linked region is properly located within the region identified on chromosome 11 (blue circle).

Summary of the Dominant Model in an Extended Family

In the case of an extended family without pedigree structure, only HaplOlap (IBS) is able to narrow down the candidate genomic regions for causing the trait of interest. The results are summarized in table 6.

Table 6. NucleOlap v. HaplOlap: Analysis of an Extended Family with an Autosomal Dominant Mutation

Criteria                                   NucleOlap        HaplOlap
Actual Mutation Retained?                  Yes              Yes
Autosomal Length Remaining (base pairs)    2,864,255,922    41,150,414
Percentage of Autosome Remaining           100.00%          1.44%
Number of Regions Remaining (discrete)     22               4

DISCUSSION

Analysis of high-density SNP data presents many challenges which include data management, data formatting, memory allocation, computational intensity, data visualization, and analysis interpretation. Yet SNP microarray data, when employed, is a quick, efficient method of analysis that can aid in many different types of genetic studies. Here, HaplOlap and NucleOlap are presented for the simple and rapid analysis of familial and consanguineous high-density SNP microarray datasets.
Shared Diplotypes between Siblings Agree with Expected Proportion of Sharing

NucleOlap and HaplOlap both found diplotype sharing between siblings (28.01% ± 3.90 and 26.46% ± 3.97, respectively) to be acceptably close to the expected 25%. A difference in the actual proportion of shared diplotypes between the two programs has to do with different analysis methods. NucleOlap uses Identity by Descent to identify meiotic recombination events through the analysis of "informative SNPs" (figure 1), whereas HaplOlap uses the Identity by State method to identify regions based on proportion of genotypes shared within a selected region. Since the HaplOlap heuristic employed in this analysis analyzes 500 SNPs at a time and then makes an IBS call based on the proportion of IBS-0s, 1s, and 2s in the region, it is subject to error if the 500 SNPs span a region which contains an actual meiotic recombination event. Therefore, the NucleOlap method is currently more accurate in estimating the precise location of recombinations. The number of SNPs used in the HaplOlap heuristic is editable from a user input panel in the GUI.

Confirmation of Previously Identified Mutations

To assess their value and validity in identifying candidate loci for novel genetic mutations, three different tests were employed, and each program achieved varying levels of success. The Identity by Descent method (NucleOlap) proves to be superior whenever analyzing nuclear family data, likely due to its ability to incorporate unaffected individuals and the fact that its algorithm does not rely on error-prone heuristics. However, in the event that pedigree structure is unknown or parental data is unavailable, the Identity by State method is the better option since IBD cannot be performed. Table 7 summarizes which algorithms are more appropriate for various types of datasets.

[Table 7 to be constructed]

Other Programs for Familial and Consanguineous Studies

Merlin
SNPDuo++
Homozygosity Haplotype

ACKNOWLEDGEMENTS

REFERENCES