
Estimating an Age of Splice-site Mutation Causing

Hypotrichosis with Juvenile Macular Dystrophy
Disease in Pakistani Population
Adeel Ahmed1, Khalid Saleem2
1,2 Department of Computer Science
Quaid-i-Azam University
Islamabad, Pakistan

Abstract—The splice-site mutation at the locus on human
chromosome 16q22.1 causes hypotrichosis with juvenile macular
dystrophy (HJMD) disease in the Pakistani population [1]. Different
statistical methods have been used to estimate the age of
mutations. We have analyzed a CDH3 gene dataset, which consists
of genotypes of a Pakistani family, in order to compare two
methods for estimating the age of the IVS10-1 G → T mutation that
occurs in some affected members of the family. The first method is
DMLE+ [2], a genealogy-based method that estimates the
mutation age from the population growth rate and genotype data. The
second method [3] is based on allele frequency. We
have found that the mutation age varies with population size,
growth rate and mutant allele frequency. Estimates of the age of the
IVS10-1 G → T mutation based on DMLE+ [2] are 232, 238 and 226
generations for three simulation runs, each reported with a 95%
credible interval. Method [3], whose time scale is in units of 2N
generations, gives an estimated age of 138 generations.
Keywords-Gene genealogy; Mutation; Population genetics;
Statistical computing; Linkage disequilibrium



The age of a mutation is the time elapsed since the mutation
originated in its most recent common ancestor (MRCA). Estimating
the age of a mutation that occurs in autosomes is slightly
different from estimating an age using the X chromosome or the
Y chromosome, as it involves some stochastic processes.
The age of a mutation can be estimated from the genealogy or
from the allele frequency. Population genetics provides a
way to study the distribution and change of allele frequency
under evolutionary processes such as natural selection,
genetic drift, mutation and gene flow.
Hypotrichosis with juvenile macular dystrophy (HJMD;
OMIM 601553) is a rare autosomal recessive disorder of the
CDH3 gene (OMIM 114021) that, in a consanguineous Pakistani
family, causes hair loss and blindness [1]. This disorder
occurs due to a splice-site mutation. The affected individuals
revealed a homozygous recessive transversion mutation
(IVS10-1 G → T). In the CDH3 gene, the mutation occurs due to
genetic drift and resides in intron 10 near the splice acceptor
site of exon 11 on chromosome 16q22.1 [1].

978-1-4673-4450-0/12/$31.00 ©2012 IEEE

Sulman Basit3

Department of Biochemistry
Quaid-i-Azam University
Islamabad, Pakistan

Our Contributions
We have estimated the age of the splice-site (IVS10-1 G → T)
mutation that occurs in the CDH3 gene of a Pakistani family. We have
used two methods: the first is based on allele frequency
and was proposed by Kimura et al. [3]; the second is
DMLE+, proposed by Rannala et al. in 2001 [2] for age
prediction. We compared the two approaches for predicting the
age of the splice-site mutation with the same parameters on the CDH3
gene dataset. We have found that the mutation age predicted by one
method lies close to the credible intervals predicted by the
second method. We performed three independent simulation
runs of Markov chain Monte Carlo (MCMC) [2] and found
consistent results for the mutation age.
The organization of this paper is as follows: Section 2
describes the related work, Section 3 describes the
statistical methods used for predicting a mutation age,
Section 4 describes the real dataset, Section 5 presents
experimental results, and Section 6 presents conclusions and
future work.


A mutation age can be predicted from the genealogy or
from the allele frequency. Here we discuss both kinds of methods,
as they are within the scope of our work.
A. Estimate of Mutation Age based on Genealogy
Early work on intra-allelic variability was proposed
by Serre et al. in 1990 [4] for the ΔF508 mutation that
occurs in the CFTR gene. In this approach, di-allelic loci are
considered in a sample taken from 240 French families. The
estimate of allele age is obtained from a moment
estimator. The confidence interval of an age estimate obtained
with a moment estimator is uncertain [5]; the sources of
uncertainty are the recombination rate and the mutation rate.
In 1999, Griffiths et al. [6] proposed an infinitely-many-sites mutation model that is represented by a unique gene tree. Here the ages of mutations and the age of the most recent common ancestor are estimated, and the conditional distributions of ages are found for a Melanesian population using a β-globin locus by considering a diploid dataset [7]. A coalescent model, which models the evolution of DNA sequences, gives the ancestral relationship among a number of DNA sequences. In the coalescent process, mutations are imposed on the tree as a Poisson process of rate θ/2. The probability distribution of trees can be calculated by a fundamental recursion, using a Monte Carlo simulation approach.

In 2002, Yu et al. [8] proposed the parsimony principle to deduce the number of mutations. Genotype data from a population of Africa are considered. The population parameter used here is θ, a function of the population size 'N' and the mutation rate 'M' per sequence per generation, defined as 4NM. Neutrality tests are performed to test the hypothesis that the population evolved according to the Wright-Fisher model with constant effective population size. Here, a gene tree program is used for the mutation age calculation.

In 1996, Goldgar et al. [9] implemented a multiple-marker likelihood method for estimating a mutation age; it points out the haplotype on which the mutation arose and estimates the time of origin of the mutation. To obtain confident results for an allele age, Monte Carlo experiments are performed. In this model, different mutation rates can be set for different types of markers, but a growth rate is not taken into account. This method allows a mutation to occur more than once in a genealogy tree, but the mutation-carrying haplotypes must be known. A modified Goldgar method [10] is presented in which the author has modified the likelihood to allow for haplotype uncertainty, that is, a mutation may occur more than once in a genealogy tree when the haplotypes carrying the mutation are uncertain.

Linkage disequilibrium can be helpful in finding the age and the location of a mutation. In this work, an extension of Bayesian linkage disequilibrium mapping is described. MCMC methods are used to estimate the likelihood function, the position of the disease locus and the ancestral haplotypes. This method is implemented in the program DMLE+ [2].

B. Estimate a Mutation Age based on Allele Frequency
In 1975, Maruyama et al. proposed a diffusion model [11] to find the approximate age of a neutral mutant allele when the allele frequency is fixed; it also gives an age before the fixation of the allele. Here, the expected value for the mutant allele is calculated along a sample path that starts from the initial frequency 'p' of the allele at time 0 and reaches 'x' at some later time. The author has applied the result when the mutation rate is low and the population size is constant.

In 1975, Li [12] introduced a new method to find the age of a deleterious mutation that causes a severe disease. Since a deleterious mutation occurs due to natural selection or random genetic drift, the selection coefficient against heterozygotes is considered constant. He estimated the time at which the allele reached its present frequency by using diffusion methods. Therefore, he concluded that the mean age is larger than the variance of the age.

In 2003, Griffiths [13] estimated the expected age of an allele 'A' having frequency 'x' as a sample path average in a diffusion process, that is

$$E(\mathrm{Age}) = \int_0^1 G(x, y)\,\frac{u_0(y)}{u_0(x)}\,dy \qquad (1)$$

where $G(x, y)$ is the Green function of the diffusion and $u_0(x)$ is the probability of absorption at initial frequency 'x'. A stochastic evolution process is required because the age is defined as a random variable.

A coalescent model is proposed by Griffiths et al. in 1998 [14], in which $\xi_{n,b}$ denotes the age of a mutant gene of which there exist 'b' copies in a sample of 'n' chromosomes. When the mutation occurred, there were 'k' ancestors of the sample. The conditional distribution of $\xi_{n,k}$ is that of $U T_k + S_{k+1}$.

In 1976, Thompson et al. [15] estimated the age of a variant based on the replicates observed in a population. The likelihood is studied on the basis of a discrete branching process model. Here it is assumed that each variant has a unique origin and each mutant gene reproduces independently. The likelihood analysis provides a confidence interval for the time at which a variant originated.

In 2000, Colombo [16] estimated the age of the N370S mutation using (2).

$$g = \frac{\log \sigma}{\log (1 - \theta)} \qquad (2)$$

where σ is an LD measure and θ is the recombination fraction. An age estimated by (2) is an underestimate for a growing population, in which the genetic clock ticks more slowly than expected. Therefore, the age estimate for a growing population is adjusted according to the Luria-Delbruck correction as follows:

$$g_c = g + g_0 \qquad (3)$$

where 'g' is the number of generations estimated from (2) and $g_0 = -(1/d)\,\ln(\theta f_d)$, where $f_d = e^d / (e^d - 1)$ in a growing population with growth rate 'd'.

III. PROPOSED STATISTICAL METHODS USED FOR MUTATION AGE ESTIMATION
We have estimated the age of the splice-site (IVS10-1 G → T) mutation using DMLE+ [2] and a method [3]. Bayesian linkage disequilibrium gene mapping is an intra-allelic coalescent model that uses multiple genetic markers to find linkage among marker alleles. A Markov chain Monte Carlo method is used to estimate the mutation age and location jointly, called joint based estimation. The MCMC method generates the joint posterior density of the parameters based on the Metropolis-Hastings algorithm.
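As a rough illustration of the Metropolis-Hastings sampling mentioned for DMLE+ [2], the following minimal Python sketch draws samples of a single "age" parameter and reads off a 95% credible interval. It is not DMLE+'s likelihood or implementation: the target density (a Gamma distribution with shape 4), the function names and all tuning values are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def log_target(age: float) -> float:
    """Unnormalised log-density of a toy Gamma(shape=4, scale=1) posterior.

    Assumption: this stands in for a real posterior of the mutation age;
    it is NOT the likelihood used by DMLE+.
    """
    if age <= 0.0:
        return -math.inf
    return 3.0 * math.log(age) - age


def metropolis_hastings(log_density, start, step_sd, n_iter, burn_in, seed=0):
    """Random-walk Metropolis-Hastings sampler for a one-dimensional parameter."""
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    samples = []
    for i in range(n_iter):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to
        # the ratio of target densities.
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_density(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval taken from the sorted samples."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


if __name__ == "__main__":
    draws = metropolis_hastings(log_target, start=4.0, step_sd=2.0,
                                n_iter=20000, burn_in=2000, seed=1)
    print("posterior mean:", sum(draws) / len(draws))
    print("95% credible interval:", credible_interval(draws))
```

Running several chains with different seeds and comparing their intervals mirrors, in spirit, the three independent simulation runs reported in this paper.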

Figure 1. Pedigree of the family with hypotrichosis and juvenile macular dystrophy. Filled symbols, affected subjects; open symbols, unaffected subjects. The disease-associated haplotypes are shown beneath each symbol [1].

A. Mutation Analysis and Input Parameters for DMLE+
Mutation analysis is performed by using the genotypes of both affected and control individuals of the Pakistani family, with microsatellite markers closely linked to the CDH3 gene. We have given different parameters, collected over the CDH3 gene dataset, as inputs to the DMLE+ (release 2.3) program. These parameters are
• Genotype data as shown in Fig. 1.
• Genetic distances that we have used between markers are shown in Fig. 1.
• Proportion of disease chromosomes in a sample is 0.001.
• We have set the variable 'Mendelian Inheritance' as recessive.
• Since the splice-site mutation was discovered by Jelani et al. in 2008 [1], we considered the growth rate of the population of Pakistan for 2008. According to [17], the growth rate of the Pakistani population was 0.02.
The remaining parameters are set to the default parameters of DMLE+.

The second method [3] that we have used for estimating the age of the splice-site mutation is based on allele frequency. Change in the frequency of an allele in a population occurs due to evolutionary processes like natural selection, genetic drift, mutation and gene flow. In 1973, Kimura et al. [3] proposed an expected age of a mutant allele that has frequency 'x' in a population of constant size, that is

$$E(t_1) = -\frac{2x}{1 - x}\,\log x \qquad (4)$$
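The following short Python sketch evaluates equation (4) and converts the scaled age to generations and years using the scaling described in this paper (multiplication by the population size N, then by the generation length). The function names are hypothetical. The frequency x = 0.001 is an assumption, chosen because it reproduces the value E(t1) = 0.0138 reported in the experimental results; N = 10000 and a generation length of 20 years are the values assumed in the text.

```python
import math


def expected_age_scaled(x: float) -> float:
    """Expected age of a mutant allele at frequency x, equation (4), in scaled time units."""
    if not 0.0 < x < 1.0:
        raise ValueError("allele frequency x must lie strictly between 0 and 1")
    return -2.0 * x / (1.0 - x) * math.log(x)


def age_in_generations(x: float, population_size: int) -> float:
    """Scaled age multiplied by the population size N, as done in this paper."""
    return expected_age_scaled(x) * population_size


def age_in_years(x: float, population_size: int, generation_length: float) -> float:
    """Age in generations multiplied by the assumed generation length in years."""
    return age_in_generations(x, population_size) * generation_length


if __name__ == "__main__":
    x = 0.001   # assumed mutant allele frequency (illustrative)
    n = 10000   # constant population size assumed in the text
    gen = 20    # generation length in years assumed in the text
    print("scaled age:", expected_age_scaled(x))
    print("generations:", age_in_generations(x, n))
    print("years:", age_in_years(x, n, gen))
```

Keeping the three conversions in separate functions makes it easy to see how the estimate changes with the assumed population size or generation length.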

In (4), 'x' is the frequency of the mutant allele. This formula gives an age in scaled time units, say 'z'. According to molecular population theory [8], if we assume a constant population size of N=10000, then z * 10000 = w generations. To find an age in years, we have assumed a length of human generation of 20 years [8]. Therefore, the age of the mutation in years is w * 20 = k years.

Method: Mutation Age Calculation
1) Input: N, IR, L
   a) N: Total number of individuals in a population
   b) IR: Incidence Rate
   c) P: Mutant Allele Frequency
   d) L: Generation Length
2) Output: Mutation age in years
3) Procedure MutationAge (N, IR, L, AGEINYEARS)
   i) P ← IR
   ii) E(t1) ← (−2 * P / (1 − P)) * ln(P)
   iii) AGEINYEARS ← L * N * E(t1)
   iv) return AGEINYEARS

In 2000, Slatkin et al. [5] used this formula for age estimation and formulated a maximum likelihood estimate (MLE) for the expected time of mutation. The MLE is

– ln (1 – p) – 2 / n   (5)

IV. DATASET
The statistical methods discussed in Section 3 for age estimation are applied over the CDH3 gene dataset of a consanguineous Pakistani family. This dataset is provided by the Department of Biochemistry, Quaid-i-Azam University, Islamabad, Pakistan. The mutation analysis was carried out over the genotype data by examining the 13 members of a family, including 6 affected and 7 unaffected members, using an automated DNA sequencer. Among them, the 6 affected members carry the splice-site (IVS10-1 G → T) mutation on chromosome 16q22.1 and are affected with HJMD disease as shown in Fig. 1. The total number of chromosomes is 13, including cases and controls, and they are used as inputs for the DMLE+ program.

V. EXPERIMENTAL RESULTS
According to [17], in 2008, the growth rate 'r' of the Pakistani population was 0.02. So, by using r = 0.02, we have set a 95% credible interval in units of generations. We have performed three independent simulation runs using the MCMC [2] method with the same parameters in order to obtain consistent results for the age of the IVS10-1 G → T mutation. Table I shows the estimated mutation ages using DMLE+, which are found very close to each other and within the range of the credible intervals.

TABLE I. ESTIMATES OF MUTATION AGE FOR IVS10-1 G → T

Simulation Run (MCMC)   Estimated mutation age (generations)   95% credible interval (DMLE)
1                       232                                    [144, 385]
2                       238                                    [156, 364]
3                       226                                    [152, 374]

According to (4), the expected time of mutation is 0.0138 in scaled time units. According to molecular population theory [8], if we assume a population size N=10000, which is a minimum estimate of the population size of modern humans during the period before recent growth, and assume that this population size is constant, then the mutation age is 0.0138 x 10000 = 138 generations. Equation (5) gives us a maximum likelihood estimate for the mutation age of 0.098. We have found that the expected time E(t1) = 0.0138 of the mutation is different from the MLE.

VI. CONCLUSIONS
In this paper, we have considered the problem of mutation age estimation. We have analyzed a CDH3 gene dataset for a family of the Pakistani population, some members of which have the splice-site (IVS10-1 G → T) mutation that causes HJMD disease. We have applied two approaches, method [2] and method [3], over the CDH3 gene dataset and successfully estimated a mutation age with 95% credibility. We have observed that the mutation age varies with the population growth rate and the allele frequency. The estimated time of mutation predicted by the DMLE+ method is more reliable than that of method [3], as it is the best representation of gene genealogies and their variability. A mutation age estimated by method [3] is found near the credible intervals estimated using DMLE+, but method [3] is a good choice if a mutation occurs in a certain gene due to some evolutionary process and when parametric information is not readily available. In future, we will be applying these methods on more datasets.

REFERENCES
[1] M. Jelani, M. S. Chishti and W. Ahmad, “A novel splice-site mutation in the CDH3 gene in hypotrichosis with juvenile macular dystrophy,” Journal of Clinical and Experimental Dermatology, vol. 34, pp. 68-73, 2008.
[2] B. Rannala and J. P. Reeve, “DMLE+: Bayesian linkage disequilibrium gene mapping,” Bioinformatics, vol. 18, pp. 894-895, 2002.
[3] M. Kimura and T. Ohta, “The age of a neutral mutant persisting in a finite population,” Genetics, vol. 75, pp. 199-212, 1973.
[4] J. L. Serre, B. Simon-Bouy, E. Mornet, B. Jaume-Roig, A. Balassopoulou, et al., “Studies of RFLP closely linked to the cystic fibrosis locus throughout Europe lead to new considerations in population genetics,” Hum. Genet., vol. 84, pp. 449-454, 1990.
[5] M. Slatkin and B. Rannala, “Estimating Allele Age,” Annual Review of Genomics and Hum. Genet., vol. 01, pp. 225-249, 2000.
[6] R. C. Griffiths and S. Tavare, “The Ages of Mutations in Gene Trees,” The Annals of Applied Probability, vol. 9, pp. 567-590, 1999.
[7] S. M. Fullerton, R. M. Harding, A. J. Boyce and J. B. Clegg, “Molecular and population genetic analysis of allelic sequence diversity at the human β-globin locus,” Proc. Nat. Acad. Sc. U.S.A., vol. 91, pp. 1805-1809, 1994.

[8] N. Yu, Y. X. Fu and W. H. Li, “DNA Polymorphism in a World Wide Sample of Human X Chromosomes,” Mol. Biol. Evol., vol. 19, pp. 2131-2141, 2002.
[9] S. L. Neuhausen, S. Mazoyer, L. Friedman, M. Stratton, K. Offit, A. Caligo, G. Tomlinson, L. Cannon-Albright, T. Bishop, D. Kelsell, E. Solomon, B. Weber, F. Couch, J. Struewing, P. Tonin, F. Durocher, S. Narod, M. H. Skolnick, G. Lenoir, O. Serova, B. Ponder, D. Stoppa-Lyonnet, D. Easton, M. C. King and D. E. Goldgar, “Haplotype and Phenotype Analysis of Six Recurrent BRCA1 mutation in 61 families: Results of an International study,” Am. J. Hum. Genet., vol. 58, pp. 271-280, 1996.
[10] C. M. T. Greenwood, S. Sun, J. Veenstra, N. Hamel, B. Niell, S. Gruber and W. D. Foulkes, “How old is this mutation? - a study of three Ashkenazi Jewish founder mutations,” BMC Genetics, 11:39, 2010.
[11] T. Maruyama and M. Kimura, “Moments for sum of an Arbitrary Function of Gene Frequency along a Stochastic Path of Gene Frequency Change,” Proc. Nat. Acad. Sci. U.S.A., vol. 72, pp. 1602-1604, 1975.
[12] W. H. Li, “The first arrival time and mean age of a deleterious mutant gene in a finite population,” Am. J. Hum. Genet., vol. 27, pp. 276-287, 1975.
[13] R. C. Griffiths, “The frequency spectrum of a mutation and its age, in a general diffusion model,” Theoretical Population Biology, vol. 64, pp. 241-251, 2003.
[14] R. C. Griffiths and S. Tavare, “The age of mutation in a general coalescent tree,” Commun. Statist. - Stochastic Models, vol. 14, pp. 273-295, 1998.
[15] E. A. Thompson, “Estimation of age and rate of increase of rare variants,” Am. J. Hum. Genet., 28:442-52, 1976.
[16] R. Colombo, “Age Estimate of the N370S Mutation Causing Gaucher Disease in Ashkenazi Jews and European Populations: A Reappraisal of Haplotype Data,” Am. J. Hum. Genet., vol. 66, pp. 692-697, 2000.
[17] THE WORLD FACTBOOK. Source: https://www.cia. ... publications/the-world-factbook/fields/2002.html.