33 views

Uploaded by (unknown)

Software Familias 3, User Manual

- c1s1
- dieboldliyue_joe08
- 01211432
- MG Lecture 4-1
- Hayes Et Al 2007
- Dr. Gerald Crabtree - Our Fragile Intellect, Part I
- 11 3 National Reuters/IPSOS
- Viscum Genetic
- 10th Guess 2018
- Quasi-Bayesian Model Selection
- 2
- statistic-descriptive-ch-5-uph-july-2007 (2).ppt
- Zoology Solved MCQS
- Quiz 1
- Cape Biology 2016 u1 p2
- DOC-51521
- 6_Probit&Logit
- Monograph 29
- Animal Breeding
- Long Term Planning

You are on page 1of 7

Contents lists available at ScienceDirect

**Forensic Science International: Genetics
**

journal homepage: www.elsevier.com/locate/fsig

**Familias 3 – Extensions and new functionality
**

Daniel Kling a,b,*, Andreas O. Tillmar c,d, Thore Egeland b,e

a

**Department of Family Genetics, Norwegian Institute of Public Health, Oslo, Norway
**

Department for Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Aas, Norway

c

Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linko¨ping, Sweden

d

Department of Clinical and Experimental Medicine, Faculty of Health Sciences, Linko¨ping University, Linko¨ping, Sweden

e

Department of Forensic Genetics, Norwegian Institute of Public Health, Oslo, Norway

b

A R T I C L E I N F O

A B S T R A C T

Article history:

Received 14 March 2014

Received in revised form 23 June 2014

Accepted 1 July 2014

**In relationship testing the aim is to determine the most probable pedigree structure given genetic
**

marker data for a set of persons. Disaster Victim Identiﬁcation (DVI) based on DNA data from presumed

relatives of the missing persons can be considered to be a collection of relationship problems. Forensic

calculations in investigative mode address questions like ‘‘How many markers and reference persons are

needed?’’ Such questions can be answered by simulations. Mutations, deviations from Hardy–Weinberg

Equilibrium (or more generally, accounting for population substructure) and silent alleles cannot be

ignored when evaluating forensic evidence in case work. With the advent of new markers, so called

microvariants have become more common. Previous mutation models are no longer appropriate and a

new model is proposed. This paper describes methods designed to deal with DVI problems and a new

simulation model to study distribution of likelihoods. There are softwares available, addressing similar

problems. However, for some problems including DVI, we are not aware of freely available validated

software. The Familias software has long been widely used by forensic laboratories worldwide to

compute likelihoods in relationship scenarios, though previous versions have lacked desired

functionality, such as the above mentioned. The extensions as well as some other novel features

have been implemented in the new version, freely available at www.familias.no. The implementation

and validation are brieﬂy mentioned leaving complete details to Supplementary sections.

ß 2014 Elsevier Ireland Ltd. All rights reserved.

Keywords:

Familias

Paternity

Likelihood computations

Simulation

Mutation

Disaster victim identiﬁcation

1. Introduction

There are several applications that require determination of

genetic relatedness. The focus of this paper is to describe

methods and implementations for complex relationships problems and disaster victim identiﬁcation (DVI). While we have

forensic applications in mind similar problems occur in a wide

range of areas. The core computational problem is to calculate

the likelihood of the data given competing hypotheses and from

this to form the likelihood ratio (LR). We may further use a

Bayesian approach with prior information to compute the

posterior probabilities. In this paper we restrict attention to

unlinked STR markers and then likelihoods are typically

calculated using extensions of the Elston–Stewart (ES) algorithm

*** Corresponding author at: Norwegian Institute of Public Health Department of
**

Family Genetics, Gaustadalle´en 30, NO-0027 Oslo, Norway. Tel.: +47 210 77663.

E-mail addresses: daniel.kling@fhi.no (D. Kling), andreas.tillmar@rmv.se

(A.O. Tillmar), thore.egeland@nmbu.no (T. Egeland).

http://dx.doi.org/10.1016/j.fsigen.2014.07.004

1872-4973/ß 2014 Elsevier Ireland Ltd. All rights reserved.

**[1] accommodating correction for population substructure
**

(theta-correction), mutations and silent alleles [2]. The algorithm

is in concept a peeling algorithm, where we consider subsets of

a pedigree as conditionally independent given the connecting

node. As a consequence, the algorithm may require long

computation time, when marriage and inbreeding loops are

present [3]. An implementation of the ES algorithm is provided in

the software Familias [4]. The program is used by a large number

of laboratories worldwide [5] when calculating likelihoods in

relationship scenarios. Though previous versions of the software

have included several important features, such as null/silent

alleles, advanced mutation models and subpopulation correction,

Familias has also lacked some desired functionality [6]. With the

advent of new STR markers, micro-variant alleles (i.e., 9.3) have

become more common necessitating an appropriate mutation

model to handle transitions to and from such alleles. Whereas

previous models are generally not designed to handle these

transitions, this paper presents a new model, providing an

extension of the stepwise mutation model [7,8], thereby

accommodating for microvariants.

3. The algorithm can also effectively accommodate many unlinked markers. R. 2.e.3 and 15. FamLink[19] or Merlin [20]. . there is a need for a new mutation model capable of handling transitions to and from microvariants.g. . The next section ﬁrst describes the new mutation model. Since the early report on the successful use of DNA as a tool to identify victims of a mass disaster by Olaisen et al. with elements mij. Simulation Simulations provide means to calculate prediction intervals and investigate speciﬁc likelihood ratio thresholds for a given case. 3. The rows must sum to unity and therefore the normalizing constants ki are determined by the constraints PN j¼1 mi j ¼ 1. 2. Further note that for transitions from allele 9 for example. . (i) Familias is validated Drabek [6].name/OpenFamilias). Mutation model As mentioned. 10. it is the value with which the probability decreases for each further step away from the original allele mutates.. where each cutset is conditionally independent given the rest of the pedigree. denoted m. numerous papers have been published demonstrating its application and utility [10–15]. simulations and DVI feature) into one user friendly environment.16. For the scope of this paper we consider smaller to medium sized DVI situation where the number of missing persons is typically limited to 1000. For two different hypotheses H1 and H2. The probability of falsely including/excluding a true hypothesis with a given LR threshold can be estimated. We specify the model by letting M be the mutation matrix. LR ratios (sometimes converted to posterior probabilities) are reported and the aim is to compare large amounts of reference data. Biologically R is often explained by slippage error during DNA replication [8] while a is connected to insertions/ deletions and point mutations. Next the model is speciﬁed precisely by the transition probabilities mij. prior to deciding to accept a case. Finally. Example 1. simulations can be used to get an idea of what evidential strength we will achieve for a given case. Thereafter. Disaster victim identiﬁcation (DVI) applications can be considered as a potentially large collection of relationship estimation problems. meaning that the probability of observing a mutation from 9 to 9. There are three different alternatives: 1. the implementation is brieﬂy described deferring more complete descriptions to supplementary sections and the manual. as well as providing intervals. The model used for simulation is the same as the one used for likelihood and LR calculation. The underlying core computational model remains the same as in standard likelihood calculations. one may for instance conclude that it is not worthwhile to proceed with a case unless more reference persons are genotyped or the number of genetic markers is increased. the probability that an allele does not mutate. The interface may be utilized to examine the number of genetic markers we need to obtain a sufﬁciently good LR. Ni = 2 as there are two MVM:s given allele 9 as starting point. the likelihood ratio LR ¼ PðdatajH1 . A hypothesis H corresponds to a pedigree. i. Consider a marker containing the alleles 9. mi j ¼ ki a=Ni for micro variant mutations: Ni is the number of MVM-s from allele i. 9. We provide a new stepwise mutation model accounting for MVM. mutations and subpopulation structure. one corresponding to integer mutations. 10. The transition matrix M is then given by: 2 3 k1 a=2 ð1 aÞk1 r 1 k1 a=2 ð1 aÞk1 r 6 1 ðR þ aÞ 1 6 k2 a=3 1 ðR þ aÞ k2 a=3 ð1 aÞk2 r k2 a=3 7 6 7 1 M¼6 ð1 a Þk r k a =2 1 ðR þ a Þ k a =2 ð1 aÞk1 r5 7 3 3 3 6 7 4 k4 a=3 ð1 aÞk4 r 1 k4 a=3 1 ðR þ aÞ k4 a=3 5 6 5 ð1 aÞk5 r k5 a=2 ð1 aÞk5 r k5 a=2 1 ðR þ aÞ In this case. Methods In relationship testing.2. The computation of the likelihood is in this paper based on the Elston–Stewart algorithm [1] and later extensions described in [18].3 is not the same as observing a mutation from 9. m ¼ R þ a. the implementation beneﬁts from integrating similar problems (LR calculations. The simulation interface accounts for all parameters in the model. the emphasis of this paper is on the new methods. Based on simulations. As previously stated. [9].e.j = 1. Some of the functionality of the new version or similar features can be found in other software [6.g. where i. e. . Should we need to account for dependency between markers. then the simulation approach and a framework to deal with DVI problems.3 to 9. between 9 and 9. and one to the micro-variants a.. Each element mij is the probability of a transition from allele Ai to allele Aj. Some current models treat such microvariant mutations (MVM) in the same way as integer mutations (IM) or neglect them as the mentioned transitions are considered improbable.e. The current model separates the overall mutation rate. mi j ¼ ð1 ðR þ aÞÞ if i ¼ j. the mutation range r. In our context. with unidentiﬁed remains.122 D. into two parts. the simulations reﬂect the chosen mutation model. Simulation also extends the results from a point estimate of the LR to a complete description of its probability distribution. k1 ¼ ðR þ aÞ=ða þ ð1 aÞðr þ r 6 ÞÞ: Similar calculation can be shown for the other ki. other algorithms and implementations must be considered. i. This is a consequence of the deﬁnition of M.0 [4]. In other words. some general principles related to validation are described. Details on implementation and validation of the new software Familias 3 [hereafter called only Familias]. 2. fÞ=PðdatajH2 . where the latter connects two or more individuals in a relationship tree. appear as supplementary material and in the manual. the matrix M is not necessarily symmetric. k1 is found as follows 1 ¼ 1 ðR þ aÞ þ k1 a=2 þ ð1 aÞk1 r þ k1 a=2 þ ð1 aÞk1 r 6 . Brieﬂy the algorithm peels the pedigree by calculating conditional probabilities for cutsets. N and where N is the number of alleles.1. However. silent alleles and incorporate theta correction. and can thus be effectively used on large pedigrees. family members or personal belongings of missing persons. mutually exclusive hypotheses are normally formulated. i.3. / Forensic Science International: Genetics 13 (2014) 121–127 Monte Carlo simulation is a generic approach of relevance to virtually all areas of science. is deﬁned as for previous IM models.g. Kling et al. This is biologically unreasonable and the problem has become more pronounced as MVM are more common in the latest STR kits. The model is called the extended step wise model in the implementation. Typically. 2. which extends on Familias 2. mi j ¼ ki ð1 aÞr ji jj for integer mutations. In addition. The core problem is to calculate the PðdatajH. The last parameter. e. Note that. fÞ is typically calculated and reported. (ii) widely used [5] (iii) freely available and (iv) the basic code is open (see http:// familias.17]. fÞ where the data consists of alleles for different genetic markers and f represents parameters needed to model e..

e.1. G2 jH2 Þ PN PN i¼1 j¼1 PðGtrue. As of now. e. Similar to Merlin [20]. one may instead use a general error variable. We consider a latent genotype Gtrue. In addition to the DVI module. Kling et al. i. we have K number of unidentiﬁed DNA proﬁles and M number of reference DNA proﬁles. the probability PðLR xjHÞ is estimated for a given threshold x and an assumed hypothesis H. DVI Since the introduction of DNA. In Familias we specify two sets of data.G2jH2)] in the ﬁrst and ﬁfth line of Table 1 reduces to1/P(A. will typically produce longer computation times. Further note that if d > >c > e and d is comparatively small. transitions within the pedigree are sampled using a transition matrix. The former data set may be reduced to K0 as identical DNA proﬁles are found through blind matching while the latter is reduced to L0 if some of the M reference proﬁles belong to the same cluster. where we deﬁne missing persons. j ÞPðG2 jGtrue. in this setting meaning the same reference family. This may in some situations be appropriate as the meta data may have been incorrectly speciﬁed. .g.B). The user-friendlyness of handling three parameters (d. Furthermore.. the implementation is exact. P(G1jGtrue. dropins and errors are assumed to occur independently. For k = 1. . Simply put. Here. though its utility in the current setting is limited. say below 0. To calculate the transition probabilities in the direct matching we specify three parameters. there is a blind search interface. Using a Bayesian approach the likelihoods are converted to posterior probabilities including prior probabilities set by the user. Complex pedigrees. modeled by the parameter u (sometimes denoted Fst in the literature). pi*pj. consisting of all possible genotypes for the current marker. given two mutually exclusive hypotheses. Further.i. We can now specify the LR as LR ¼ ¼ PðG1 . Moreover.j) are the transition probabilities from the latent genotype to the observed genotypes. Note that these parameters only apply to direct matching function and are not used in the kinship calculations. j Þ PðG1 ÞPðG2 Þ (2) where N is the number of alleles at the current marker and P(Gtrue. To compute the LR we require some more deﬁnitions. j ÞPðG1 jGtrue.21]. c = allelic dropin probability and e = typing error probability. where several of the remains as mentioned may originate from the same individual and AM (Ante Mortem) data. The module calculates likelihoods for each combination of PM data and AM data. A more general formula.i. Brieﬂy..3.j) and P(G2jGtrue.G2jH1)/P(G1. while theta correction is applied in all scenarios should the value be nonzero. We have K K 0 and L L0 .G2jH1) and P(G1. the simulation algorithm starts by detecting all founders for a given pedigree. .i. the LR [P(G1. For the new direct matching feature. while the remaining lines simpliﬁes to zero. See Table 1 below for a list of P(G1. consider two proﬁles G1 and G2.D. In the AM data we deﬁne reference families for each missing person. L0 reference families. for the latent genotype with alleles i and j. to search for relations in a set of individuals before creating a population frequency database or as in the DVI situation to ﬁnd direct matches or relations between PM samples.e.. To specify. we specify dropout as the probability of one allele not being unobserved for a heterozygous genotype (allelic dropout). Interested users may use raw data from the simulations to study observed mutation rates or the occurrence of silent alleles. [23]. . (The formulas are simpliﬁed to ﬁt. PM (Post Mortem) data – obtained from unidentiﬁed remains.i.j) is the genotype probability. / Forensic Science International: Genetics 13 (2014) 121–127 Speciﬁcally. also accounting for inbreeding can be derived. g0. consider the competing hypotheses: H1: The proﬁles belong to the same person H2: The proﬁles belong to two unrelated persons The hypotheses. the formulas implemented are based on identical by descent (IBD) sharing probabilities not accounting for inbreeding. The reference family can contain arbitrary pedigree structures. G2 jH1 Þ PðG1 . H1 and H2. g1 and g2 are functions of allele probabilities depending only on the genotype data.. where we may have genetic data from relatives of the missing person or direct matching samples such as personal belongings. 2. several latent genotypes are unlikely as the transition probabilities are very small. (2). . . including all the effects possibly causing an erroneous genotype. i. K0 we compare each unidentiﬁed DNA proﬁle k with the l = 1. in addition to providing prediction intervals. The algorithm does not account for mutation unless the parent–child relation is chosen. dropin as the probability of an extra allele being observed for a homozygous genotype (allelic dropin) and typing error as the probability of some other laboratory error leading to an incorrect genotype [24]. .i. the interface provide relevant functionality to study thresholds and false positive/ negative rates. Familias implements a general approach. PðdatajHÞ ¼ PðIBD ¼ 0jHÞg 0 þ PðIBD ¼ 1jHÞg 1 þ PðIBD ¼ 2jHÞg 2 (1) where P(IBD = 0jH) = k0.G2jH2) for some combinations of genotypes G1 and G2.i.. allowing the user to search a set of persons for unknown relations. mutation models (see below) and theta correction. it gives the probability of obtaining a LR at least as great as a given threshold.. is distinct from the more common situation where we have some trace evidence from a crime scene and a reference proﬁle to compare with. including an example where the simplifying assumptions are omitted) We see that if d = c = e = 0. The former being uncertain while the latter is commonly considered to be accurate. The feature may be used on any data set. Dropouts. To specify. 1 respectively 2 alleles identical by descent. Founder genotypes are sampled using deﬁned allele frequencies in combination with possible subpopulation correction. P(IBD = 1jH) = k1 and P(IBD = 2jH) = k2 are the probabilities that two individuals share 0.A) and 1/P(A. and the current setting. meta data is not used to adjust priors or to exclude unidentiﬁed persons based on gender.14. The general 123 formula is. The blind searching is restricted to pairwise searches for a number of predeﬁned relationships and implements a fast algorithm based on the formulas presented in Table 4 in Hepler et al. The choice of priors has been debated elsewhere [22] and can be inﬂuenced in Familias by changing the size of the DVI operation. the latter depending on the selected mutation model. see Supplementary data 1 for a more thorough walkthrough of Eq. removing terms negligible in the calculations assuming d > > c > e. genetic data from relatives or personal belongings of missing persons have become one of the most important and reliable means of identiﬁcation [10–12. Moreover. d = allelic dropout probability. The disaster victim identiﬁcation (DVI) module in Familias is provided to assist in any operation that requires an all-against-all search. c and e) instead of one can be discussed. if subpopulation correction is nonzero the allele probabilities are not independent.

30.982e05 9. primarily connected to the direct matching.A A.D.D [ (1 d2)2(1 e)2(1 c)2]P(A. Consider one marker with the mutation model and alleles as described in Section 2 of this paper.929e05 2. results that can be derived by other means.B) P(A. the identity pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ SDðLRjH2 Þ ¼ EðLRjH1 Þ 1 (4a) relating the standard deviation (SD) under H2 to the expected value under H1. (Some useful validation ﬁles are available at the Familias homepage) where p denotes allele frequency. the expected value of the LR assuming AF and CH to be unrelated is 1 .B A. Slooten and Egeland [25] presents further theoretical properties of LR:s.A)*P(B.940e01 1. G1 or G2.3.929e06 2. Speciﬁc validation examples showing correct numerical results.B)(1 d)2c2P(A)P(C)]a (1 d2)2(1 c)2[P(A. appear in Supplementary data 2.A)*P(A. We also introduce a new blind match searching function implementing some new functionality. (5) we ﬁnd.B) P(A.B) P(A.9 then a = b = c = d = 9.A A.B)*P(C. Kling et al. however. Exact calculations are hard for general mutation models. is not valid when there are mutations or theta correction is made. see previous description.B B. Some useful validation formulae in simulations The expected value of the LR assuming the denominator hypothesis H2 to be true is 1 EðLRjH2 Þ ¼ ð1 pÞ0 þ p 1 ¼1 p (3) where p is the random match probability and 0 and 1/p are the two possible values for the likelihood ratio.B) (1 e)2[P(B. where Li is the number of alleles for marker i. the general formula for the expected value for all pairwise.5. 10. non-inbred relationships presented in Slooten and Egeland [25] can be used EðLRjH1 Þ ¼ aL2 þ bL þ ð1 a bÞ where L ¼ alleles.3.B A.B B.1. This is true also if mutations and population substructure are modeled. The likelihood ratio may be written [26] 2.05. simulations closely reproduce the theoretical values.982e05 5. G1 G2 P(G1.982e05 9.A P(A.973e05 9. 15) are (0. For instance. the simulation interface as well as the new DVI module. Familias version 3.B B.B A.familias. the allele frequencies for the alleles (9.982e05 5.A A.4. the possibility to model proﬁles with dropouts [Manuscript submitted].1.3.967e03 5. 0.C) P(A. The genotypes of AF and CH are denoted a/b and c/d.3.A)(1 d2)2cP(B) + P(A. typically for one marker. Turning to validation for simulations under the numerator hypothesis. New mutation model and simulation Example 2.B)d(1 d) (1 c)(1 d2)] (1 e)2[(1 c)2d2(1 d)2]P(A.B)e + P(C. Implementation The software functionality described herein is implemented in a Windows friendly software. Validation LR ¼ Validation can mean several things.3 10 10. For instance. Example 2 below relies heavily on the above equation.G2jH2) A.940e01 2.e. This generalizes directly to n independent markers EðLRjH1 Þ ¼ 1 ðmac þ mbc Þ pd þ ðmad þ mbd Þ pc 4 pc pd For the numerical examples below.4. is probable as the latent genotype.4 at the time of writing. and b ¼ 1 2 4 As an example. 10. b = 9.945e03 1.no.A)*P(A.G2jH1) P(G1. The mayor changes since Familias 2.g.05. Results 3. A case which would need a mutation to be consistent with paternity occurs for genotypes 9.940e01 2. several other new features will be presented in the next releases. This last equation.3 15 9 9. Moreover. 0. e. however.10. See Supplementary data 2 for some validation examples. say below 10–10 any reasonable number of simulations should lead to all LR-s being 0 as the probability of a random match is then negligible. 3.A A. including validation of methods and validation of the implementation.20. There is.3 and 10. 0. The latest version of Familias is freely available at www. When p is small. 9.1. For instance. note for a parent–child relation k1 = 1 and k2 = 0 and the expected LR is therefore (L + 3)/4 for one marker. 9.40). one exemption as explained next. i.967e03 5. a Note that neither of the observed genotypes.939e03 1. Here we focus on approaches that may be of general interest and which can be used to validate also other programs.C)d(1 d)cP(B)] (1 e)2[(1 d)2(1 c)2] P(A.3 corresponding to a = 9. 10. several simulations can be run with different seeds. when the alleged father is 9. c = 10 and d = 10.973e05 5.B)*P(A. if both individuals are homozygote 9. r ¼ 0:1 and a ¼ 0:001: The mutation matrix M becomes Allele 9 9.946e03 2. The mutation parameters are speciﬁed as: R ¼ 0:005.D) 2.982e05 5.C C.3 and the child 10.945e08 1.0 is the introduction of the new mutation model. 0. From Eq.973e05 9. To get an indication of the simulation uncertainty. In this example both simulation and the new mutation model is illustrated.A) A.B) (1 e)2[P(A.C A.B)*P(B.3 15 9. Eqs.B)d(1 d)cP(C) + P(A. a ¼ n Y Li þ 3 i¼1 4 (5) 2.982e05 5. typically exact formulae.C) P(A. This follows directly from the deﬁnition of the likelihood ratio and expectation as pointed out by Thompson [18].3 10 10.A)*P(B. (3) and (4) can be used to check simulations under the denominator hypothesis when p is not too small. Consider the hypotheses H1:AF is the father CH and H2:AF and CH are unrelated. Similarly.973e05 5.D)e]a P(A.3 LR ¼ 1 ðð5:94e 03 þ 1:98e 05Þ 0:2 þ ð2:97e 05 þ 5:94e 4 03Þ 0:3Þ=ð0:2 0:3Þ ¼ 0:0124: which is accurately reproduced by Familias 3. We have checked the code using the above formulae for one marker at the time.939e07 1.A) (1 e)2[P(A.940e01 (4b) k22 k2 þ 4k1 k2 þ 2k22 . / Forensic Science International: Genetics 13 (2014) 121–127 124 Table 1 LRs based on the direct matching feature of Familias.945e03 2.940e01 1.9.

Note that this set of frequencies will change if two alleles are IBD. which is most probably due to the low LR threshold. The direct match LR can be calculated according to Eq.43 3. 10j9.169 2. in order to determine how many relatives are necessary in a given Table 3 Distribution of log10 likelihood ratios for 10. but for the current calculation we use unrelated as the alternative hypothesis.16 1.14 The methods are Familias 3 (with and without mutations considered) as well as results presented by Ge et al. 9. Furthermore. j ÞPðG1 jGtrue. See also Ge et al. uÞPð9.88 2.97 4.001. false inclusions). for a pair of individuals P1 and P2.64 2 full sibs 5.09 2. 10Þ 2 pð9Þ pð9j9.91 0. Method Pedigree Familias3_no_mut Ge et al. i.3.07 2.2. but are based on simulations on the standard 13 CODIS STR markers.59 1 halfsib 0. i. 9.584 7. using formulas in Balding et al.25 8. which are the default values in Familias) PA PA i¼1 j¼i PðGtrue.65 2 full sibs 5.e. j Þ PðG1 ÞPðG2 Þ ¼ hWe simplify and remove terms which is negligible in the numeratori ¼ pð9Þ pð9j9. The investigated relationships are described elsewhere.90 1.92 1. (Data available upon request).05. Let G1 = 9. Speciﬁcally.000 simulations gives a value close to the theoretical. p(9j9). uÞPð9. An all-against-all search was performed in the DVI module. uÞ pð9j9. Familias3_mut Familias3_no_mut Ge et al. Familias3_mut Familias3_no_mut Ge et al. by summing over all possible genotypes for the latent genotype and compute the likelihood for each case according to: (We specify d = 0..i. 0. uÞ pð9j9. 9Þ þ 2 pð9Þ pð9j10.12 3. uÞ pð10j9. where k0 and k1 are replaced by the values according to the relationship H. 9.e. 9j9. the allele frequencies for the alleles [9.10.3.05. For simplicity we let the mutation rate be zero.4 7. Familias3_mut Familias3_no_mut Ge et al. Note that the interface allows us to scale versus some other relationship rather than unrelated. For u = 0. G2 jH0 Þ ¼ LR ¼ PðG1 . p(9j9. (4b) is also consistent with simulations. Kling et al. in total 300.98 4.01) Direct match Siblings Half siblings Cousins Parent–child 2nd cousins 29. 3. 100 pairs of siblings. 15] are [0.47 One parent/One child 4. uÞ 2 all possible combinations of unidentiﬁed remain and reference family.88 2.57 1 halfsib 1.40]. 9.26 8. the expected LR assuming AF to be the father. 100 families where the reference data was from grandparents and 100 families where the reference data was from a parent. The reference families were constructed according to the simulated relationship. Some true matches for the missing persons obtained very low LR barely above 1.5 3. .24 Mean 5percentile 1percentile 7. 10j9. producing a list of matches above a given threshold (in this case set as low as LR = 1). 0. LðDatajHÞ ¼ k0 2 pð9Þ2 pð9Þ pð10Þ þ k1 pð9Þ pð9Þ pð10Þ.i.86 2.25 5.16 1.17 7. for IBD = 2 and IBD = 1 we need to update the frequencies as only two respectively three alleles are drawn from the population.48 One parent/One child 4.25 1.07 5.i.85 One parent/One child 4. DVI module and blind search interface To validate the DVI module simulated data was constructed for a number of relationships (Data available upon request). The match list indicated some false matches (i..05. uÞ ¼ ð1 eÞ2 ½ pð9Þ pð9j9.16 0.34 6. (3) and the computer output based on 10. 0. 9. (1).07 Both parents 10. All missing persons. where LRs were calculated for PðG1 .94 4.01 we need to calculate the updated set of frequencies. Familias3_mut Both parents 10. [p(9). j ÞPðG2 jGtrue.0595.01. For each pair one of the individuals was withdrawn and denoted as missing. 9j9. [28].29 3.08 2. the pedigrees are described elsewhere [27].919 4.33 2 children (same parent 2) 6.35 3.5625 18. calculated versus unrelated as alternative hypothesis.194]. this is due to the high values on the parameters d. which was in some of the cases explained by simulated mutations (grandparents and parent) and in other cases by low number of shared alleles (sibling cases).000 simulations using three different methods. We can now easily calculate the likelihood ratio for the predeﬁned relationships in the blind search interface using Eq.D. 10ÞPð9.000 comparisons were done.48 1 halfsib 0. 10. 0. (L + 3)/4 = (5 + 3)/5 = 2. c and e. uÞdð1 dÞð1 dÞ ð1 cÞ 2 pð9Þ pð9j9. 100 families where the reference data was from siblings. while we consider both u = 0 and u = 0. Also note that the Direct match obtain a high LR even though the proﬁles are not identical. op. We further use constructed data to validate the blind searching function. [27]. (1) as.92 0.70 2 children (same parent 2) 6.34 1.001 and c = 0. i.cit. 9. Relationship LR (u = 0) LR (u = 0. no false match obtained a LR higher than the true match.09 1. G2 jH1 Þ 125 Table 2 LRs for some relationship hypotheses. 9.3.25 10 1.1.31 2 children (same parent 2) 6. 0.63 1. Simulations To further corroborate output from the simulation interface we compared results on some standard forensic cases with simulations reported in Table 6 of Ge et al. 9ÞPð9. 3. u Þ pð10j9.30. In total 300 300 = 90. 10. for a discussion on choice of reference family relatives in DVI operations [27]. 2 The theoretical values in coincide with the values calculated in Familias (Table 2). 0.20.. 0.43 2 full sibs 5.0. However.0688. p(10j9. Familias3_mut Familias3_no_mut Ge et al.e.69 1.9)] = [0. 100 pairs of grandparents/ grandchildren and 100 pairs of parent/childs were generated using the simulation interface. / Forensic Science International: Genetics 13 (2014) 121–127 according to Eq. Consider a system with alleles similar to the ﬁrst example.396 The likelihood for the different hypotheses of relatedness can now be calculated from Eq.9. were collected into a data set of unidentiﬁed remains. The point with this validation is not to investigate the match threshold but rather to demonstrate the accuracy in the calculations.338 1. Consider two persons P1 and P2 with genotypes G1 and G2. u Þð1 d Þc pð10Þ þ 2 pð9Þ pð9j10..9 and G2 = 9. see Table 3.9).1 Both parents 10. from Eq.e. (2). e = 0.

in the online version. Int. Olaisen. Kling et al. Acknowledgements The authors would like to thank the contributions from users testing and evaluating the new features of Familias. Nat. Forensic Sci. B. Poulsen. Selecting the most probable pedigree. e. We have not yet been able to derive such a stationary model. Hered. [12] C. Int. Users may now investigate a case prior to accepting it by computing prediction intervals and decide whether decisive evidence is likely to be obtained. We do not propagate for lowering the threshold only because for a given case the evidence will never reach the required value. the DVI module adopts the full functionality of Familias. Int. B. [3] C. A.30]. Triggs. 49 (2004) 939–953. Mostad. The latter is probably hard to estimate but can in some situations not be neglected. the distribution of allele frequencies will change slightly with each generation in the pedigree. Genet.R. Leclair. CRC Press Inc. The model builds on the stepwise model [7]. Some mathematical problems in the DNA identiﬁcation of victims in the 2004 tsunami and similar mass fatalities. we have developed a new mutation model. Hum. Res. In summary. A general model for the genetic analysis of pedigree data. A report of the 2009–2011 paternity and relationship testing workshops of the English Speaking Working Group of the International Society For Forensic Genetics. 110 (2000) 47–59. . Forensic Sci. Heterogeneous mutation processes in human microsatellite DNA sequences. et al.J. Thompson. As a comparison we also included simulations using the extended stepwise mutation model and the results are still close to the simulations without mutations. (1). at doi:10. k1 and k2 parameters. allowing for subpopulation frequency correction. We believe the extensions provided in this paper will be important for many users where previous versions have lacked desired functionality. Bailey-Wilson.. 1 (2007) 3–12. Hallenberg. Stewart.g. the overall mutation probability. which incorporates dropout. we have developed a DVI module. Prinz. C. et al. There are several papers and online discussions following previous larger scale mass disaster incidents.. Buckleton. Forensic Sci. C. Friis.1016/j. Epidemiology. This paper includes some points on the implementation and interested users should follow the references given above for further mathematical discussions. Similar to the simulation interface. Furthermore. M. unknown in the sense that we have no prior knowledge how the individuals in the data set are related. the search also includes a newly developed direct matching function (also part of the DVI module). The blind search is restricted to pair wise comparisons on a number of predeﬁned relationships implementing the formulas presented in Hepler et al.G. Science 310 (2005) 1122–1123.F. to cope with the increasing polymorphism in the new STR markers. Appl. a blind search tool is included. K. [13] M. USA. i. and therefore equally important as the two ﬁrst mentioned probabilities. Mayr. Forensic DNA Evidence Interpretation. Appendix A. et al. 24 (2000) 400–402. Skolnick. dropin and typing error probabilities. Discussion Familias is a well-known software in the forensic community and used by a number of laboratories [5]. [4] T. Stenersen. [11] L. W. J.. 10 (1978) 26–61. P. Such an extension should preserve the main features like the diagonal elements. Egeland. Ballantyne. Genet. the implementation in Familias opens up for future extensions where any non-inbred relationship between two individuals could be speciﬁed using the k0. providing considerable extensions to previous versions [4].126 D. the algorithm can simulate arbitrary pedigree structures where the only limitation is set by the computation time. The latest version can be freely downloaded at http://www. laboratory animals]. S. J. Int.e. Genet. To assist in mass disaster identiﬁcations.g. Forensic Sci. Probability functions on complex pedigrees [domesticated mammals. Forensic Sci. Genet. The authors are aware of the discussion in the forensic community on the use of case speciﬁc thresholds rather than using an general LR/Posterior probability threshold for all cases. 21 (1971) 523–542. The module further allows the deﬁnition of multiple alternative family hypotheses. DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identiﬁcation (DVI). A comprehensive simulation interface provides versatile functionality for studying distribution of likelihood ratios for a given case. This transition model is not stationary. the WTC terror attack [11.2014. [5] L.. Walsh. [7] T.. Further. et al.M. Brenner. Enhanced kinship analysis and STR-based DNA typing for human identiﬁcation in mass fatality incidents: the Swissair ﬂight 111 disaster. Bosa Roca. 4. 9. S. Forensic Sci. Beyond traditional paternity and identiﬁcation cases. J. In other words. To further aid in the identiﬁcation of unidentiﬁed remains. thus permitting each reference family to have several missing persons and the user can weigh the evidence given a match based on the possibility that the unidentiﬁed person may ﬁt in several locations in a family tree. the Tsunami disaster [12]. This paper describes methods implemented in the new version (Familias 3). Kimura.. Elston. 22 (1973) 201–204. e. Genet.fsigen. et al. The software facilitates the interpretation of the evidence by computing likelihood ratios and posterior probabilities for a given set of relationship hypotheses and genetic marker data.004. Ota. As the formulas are general in the sense that any non-inbreed pair wise relationship can be deﬁned. [10] B. allowing users to handle small to medium scale identiﬁcations. Biesecker. [8] H. M.. Adv. within each family. the software Familias has previously been proven to be a resourceful tool in calculations concerning genetic relatedness [4–6]. [6] J. The Familias simulation interface produces almost identical output as presented by Ge and colleagues. Mevag. 3 (2009) 112–118. Identiﬁcation by DNA analysis of the victims of the August 1996 Spitsbergen civil aircraft disaster.07.E.H. see Eq. As presented in this paper. Carracedo.L. Microvariants are more and more common. Int. Probab. 9 (2013) e1– e2. not accounting for inbreeding and mutations [23]. C.C. Genet. Nat. In addition to assist in DVI operations the search can also be performed to verify that data sets for the creation of population frequency databases do not contain related individuals. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a ﬁnite population.3. / Forensic Science International: Genetics 13 (2014) 121–127 case to obtain sufﬁcient LRs. Bowen. silent alleles and mutations.familias. but provides extensions for microvariants. [2] J. The users should instead study the false positive/negative rates to ﬁnd an appropriate limit. The work of the last author leading to these results was ﬁnancially supported from the European Union Seventh Framework Programme (FP7/ 2007–2013) under grant agreement no 285487 (EUROFORGENNoE).L. for instance the STR marker SE33 (ACTPB2) includes several alleles with a non-integer repeat unit and even though mutation rates for transitions between non-integer alleles and integer alleles may sometimes be negligible we require an appropriate model to handle them. M. References [1] R. DNA identiﬁcations after the 9/11 World Trade Center attack. Ellegren. 15 (1997) 402– 405. Meva˚g. J.no. Supplementary data Supplementary material related to this article can be found. 157 (2006) 172– 180. Fregeau. [9] B. Validation of software for calculating the likelihood ratio for parentage and kinship. E. Cannings. Drabek. 2004. As presented in this paper the tool can be used to rapidly scan data sets for unknown relations.J. A stationary version of the above model would be a welcomed extension.14] and the hurricane Katrina [29.

Balding.B.H. T. Forensic Sci. Forensic Sci.. Donkervoort.. J.D. B. Exclusion probabilities and likelihood ratios with applications to kinship problems. Genet. Genetics 145 (1997) 535–542. Forensic Sci. R. et al. Int. Int.M. 2000. [18] E. W. [24] A. Merlin – rapid analysis of dense genetic maps using sparse gene ﬂow trees.. [28] D. Nat. Tillmar. et al. . Genet. Genet. [25] K. R. Cherny. M. 5 (2011) 308–315. Bioinformatics and human identiﬁcation in mass fatality incidents: the world trade center disaster.A. [29] S. 52 (2007) 806–819.J. Chakraborty. Slooten.S.S. [17] K. et al. Anderson.R.S. [16] C. Slooten. Choosing relatives for DNA identiﬁcation of missing persons. [22] B. numbers. A. Brenner. Nichols. FamLink – a user friendly software for linkage calculations in family genetics. Genet.-J. Legal Med. Kling. 30 (2002) 97–101. A. [30] S. Int. Mutation Models for DVI analysis. Genetic relatedness analysis: modern data and new challenges. (2013) 1–11. Use of prior odds for missing persons identiﬁcations. et al. [15] L. Error rates in forensic DNA analysis: deﬁnition. Shaler.0. Heal. Rev. Genet. 6 (2012) 616– 620.. Bradford. Int. Nat. Genet. 11 (2009) 414–417. S. Beckwith.H. Slooten. Weir. Cookson. Issues and strategies in the DNA identiﬁcation of World Trade Center victims. Biol. / Forensic Science International: Genetics 13 (2014) 121–127 [14] B. K. Int. 127 [23] B. Forensic Sci. [21] C. et al. 7 (2006) 771–780. Statistical Inference from Genetic Data on Pedigrees.M. Ricciardi.O.O. [26] F. 12 (2014) 77–85. Validation of DNA-based identiﬁcation software by computation of pedigree likelihood ratios. J. [19] D. [27] J. 11 (2014) 85–95. 64 (1994) 125–140. A. et al. Forensic Sci.. Brenner. Ge. Int. DNA proﬁle match probability calculation: how to allow for population stratiﬁcation. S. Theor. G. Carmody. Budowle. database selection and single bands. J. Investig. 1) (2011) S23–S28. Saraiya. The emerging role of genetics professionals in forensic kinship DNA identiﬁcation after a mass fatality: lessons learned from Hurricane Katrina volunteers.D. Int. B. JSTOR.. M.S. 2 (2011) 15. Anderson. impact and communication. Dolan. Enhancing accurate data collection in mass fatality kinship identiﬁcations: lessons learned from Hurricane Katrina. relatedness. Forensic Sci. Int. Egeland. A. [20] G. Leclair. Abecasis. Dolan. R. 63 (2003) 173–178. Genet. Donkervoort. Thompson. Genet. Forensic Sci. Kloosterman. Egeland. R. Sjerps. 56 (Suppl. Genet. Chakraborty. Symbolic kinship program. 5 (2011) 291–296. J. T. Forensic Sci. Ge. Med. 2 (2008) 354–362. D. S. Popul.A.R. Genet. Forensic Sci. Kling et al. Budowle. J. Disaster victim investigation recommendations from two simulated mass disaster scenarios utilized for user acceptance testing CODIS 6. Weir. Quak. Hepler. J.

- c1s1Uploaded bycupidkhh
- dieboldliyue_joe08Uploaded byDillan Fortuin
- 01211432Uploaded byLongLong Yu
- MG Lecture 4-1Uploaded byAbdullah As'ad
- Hayes Et Al 2007Uploaded byJaeRose60
- Dr. Gerald Crabtree - Our Fragile Intellect, Part IUploaded byLeakSourceInfo
- 11 3 National Reuters/IPSOSUploaded byNoFilter Politics
- Viscum GeneticUploaded byFunda Oskay
- 10th Guess 2018Uploaded byMuhammad Usman
- Quasi-Bayesian Model SelectionUploaded bymaenfadel
- 2Uploaded byChristian Beren
- statistic-descriptive-ch-5-uph-july-2007 (2).pptUploaded byHaniya Khan
- Zoology Solved MCQSUploaded byKaleem Chaudhary
- Quiz 1Uploaded byMarie Okeen
- Cape Biology 2016 u1 p2Uploaded bySachin Bahadoorsingh
- DOC-51521Uploaded bySanjeev Malik
- 6_Probit&LogitUploaded byRetnaningrum Rn
- Monograph 29Uploaded byabdel_79
- Animal BreedingUploaded byMamta
- Long Term PlanningUploaded byRj
- Why-SIMUL8.pdfUploaded bymscarrera
- LONG TERM PLANNING.docxUploaded byflpfrn
- Remote Monitoring SystemUploaded byesmani84
- CapacityManagementatLittlefield (5)Uploaded byArjun Nayak
- Spina BifidaUploaded byJane de Castro
- biology sol pre-test part 2Uploaded byapi-110789702
- QRC_KO01_Create_Internal_OrdersUploaded byrameshd9
- Convergent Adaptation of Human Lactase Persistence in Africa AndUploaded byCraig Mawson
- chapter 2Uploaded bysabrina309
- FishUploaded byPutri Hamidah

- RES-1506 EXENTA_06-JUN-2009Uploaded by(unknown)
- ADN DropIn-DropOutUploaded by(unknown)
- María La Dura No Quiero Ser Ninja - ResumenUploaded by(unknown)
- DNA Lab ReportsUploaded by(unknown)
- Amigos Del Alma. Elvira LindoUploaded byYane Aroca Torres
- Amigos Del Alma. Elvira LindoUploaded byYane Aroca Torres

- Handout Psychometric TestsUploaded byTANTRIC555
- Directory of Expert Empanelment-EOI-Final-201301031.docUploaded bylaloo01
- wp_46853Uploaded bylodhia_pallav
- Adobe Clerical Guide V1 9.19.03 2Uploaded byEysOc11
- Hegel on Reference & KnowledgeUploaded byWillem DeVries
- analysis of bronfenbrennerUploaded byapi-385079933
- Linear ProgrammingUploaded byNorhapidah Mohd Saad
- Muhammad ShoaibUploaded byShoaib Muhammad
- Concrete Mix Design GuideUploaded byluke_0126
- Experts Guide to Successful MRP 090712Uploaded byNiraj Kumar
- large group time planning form activity 4Uploaded byapi-253522479
- Commissioning GuideSWEGONUploaded bypiros
- 2 Fundamentals of CorrosionUploaded byKenaouia Bahaa
- ‘It is necessary to have a religious experience in order to understand fully what a religious experience is'Uploaded byMaliha Ahmed
- Catalogue SmallUploaded byAyad Taha
- Thesis ConclusionUploaded byLidia García
- SUPPRESSION OF VFT IN 11OOkV GIS BY ADOPTING resistor fitted disconnectors.pdfUploaded byMuthuraj Ramaswamy
- COMPENSATION__MANAGEMENT[1]Uploaded bySneha Macher
- Dialogue English OzzyUploaded byRuab Plos
- Technics sa gx 290Uploaded byMarkov Vojislav
- mcqs 1Uploaded byRalitsa DeeDee Wilcox
- Python CryptoUploaded byinfinitus_
- Hrm IntroductionUploaded bymohanish_05
- Manual 7600Uploaded bylaymeperalta
- Carrier based Single-state PWM Technique.pdfUploaded byscribdnvnho
- vlsi 2017Uploaded byAnonymous 1aqlkZ
- unco tasl 501 rubricUploaded byapi-300516382
- Human Issues in Call Centers and BPO IndustryUploaded byVishal Kalal
- TECSYSUploaded byJenny Quach
- PPoint libro Microsoft.pdfUploaded byantoniop940