Professional Documents
Culture Documents
Research paper
A R T I C L E I N F O A B S T R A C T
Keywords: Distinguishing relationships with the same degree of kinship (e.g., uncle–nephew and grandfather–grandson) is
Kinship analysis generally difficult in forensic genetics by using the commonly employed short tandem repeat loci. In this study,
Single nucleotide polymorphisms we developed a new method for discerning such relationships between two individuals by examining the number
Same degree of kinship of chromosomal shared segments estimated from high-density single nucleotide polymorphisms (SNPs).
Chromosomal sharing
We computationally generated second-degree kinships (i.e., uncle–nephew and grandfather–grandson) and
DNA microarray
third-degree kinships (i.e., first cousins and great-grandfather–great-grandson) for 174,254 autosomal SNPs
considering the effect of linkage disequilibrium and recombination for each SNP. We investigated shared
chromosomal segments between two individuals that were estimated based on identity by state regions. We then
counted the number of segments in each pair.
Based on our results, the number of shared chromosomal segments in collateral relationships was larger than
that in lineal relationships with both the second-degree and third-degree kinships. This was probably caused by
differences involving chromosomal transitions and recombination between relationships. As we probabilistically
evaluated the relationships between simulated pairs based on the number of shared segments using logistic
regression, we could determine accurate relationships in > 90% of second-degree relatives and > 70% of third-
degree relatives, using a probability criterion for the relationship ≥0.9. Furthermore, we could judge the true
relationships of actual sample pairs from volunteers, as well as simulated data. Therefore, this method can be
useful for discerning relationships between two individuals with the same degree of kinship.
1. Introduction matches. Because the ICS values reflect the total genetic length of
shared chromosomal regions between two individuals, the values relate
In forensic genetics, a pairwise kinship analysis is typically per- to the expected proportion of the shared genome (i.e., degree of kin-
formed to investigate a relationship between two individuals by cal- ship). Thus, the ICS values in sibling (first-degree relative), un-
culating the likelihood ratio [1]. The likelihood ratio is generally the cle–nephew (second-degree relative), first cousin (third-degree re-
ratio between the probability that certain DNA typing results would be lative), first cousin once removed (fourth-degree relative), and second
obtained if the pair is related versus that obtained if the pair is un- cousin (fifth-degree relative) were distributed as log-normal distribu-
related. Meanwhile, when no predicted relationship exists between the tions, centering around the median values of 2805, 1919, 1018, 544,
pair (e.g., identification of bone remains, without personal belongings), and 311, respectively. Using this approach, we could determine the
all possible relationships should be compared simultaneously. We have degrees of kinship up to the third-degree, regardless of sex, even when
previously proposed a method for pairwise kinship analysis using the the kinship of the pair was totally unpredictable.
“index of chromosome sharing” (ICS) calculated from high-density However, we probably cannot distinguish relatives with the same
autosomal single nucleotide polymorphisms (SNPs) [2]. In the ICS degree of kinship (e.g., uncle–nephew and grandfather–grandson) using
calculation, we investigate the genetic length (centi-Morgan (cM)) of all this approach, because these relatives have the same expected total
identity by state (IBS) segments between the two individuals, and then genetic length of shared chromosomal regions. If possible, adding key
sum the genetic length of the IBS segments longer than a given individuals is effective for increasing information [3]. Otherwise, in-
threshold (i.e., 4 cM), which is applied for removing coincidental vestigating markers on the sex chromosomes [4–6] might be useful to
⁎
Corresponding author.
E-mail address: ktamaki@fp.med.kyoto-u.ac.jp (K. Tamaki).
https://doi.org/10.1016/j.fsigen.2017.11.010
Received 11 September 2017; Received in revised form 7 November 2017; Accepted 14 November 2017
1872-4973/ © 2017 Elsevier B.V. All rights reserved.
C. Morimoto et al. Forensic Science International: Genetics 33 (2018) 10–16
discriminate such relationships. For example, Y chromosomal markers types of second-degree relative pairs: uncle–nephew (Nos. 5–8 in Fig. 1)
are useful for distinguishing paternal relatives from maternal relatives. as a collateral relationship, and grandfather–grandson (Nos. 1–8 in
However, it is impossible to discriminate within paternal relationships Fig. 1) as a lineal relationship. It also included two types of third-degree
such as uncle–nephew and grandfather–grandson. Then, we can theo- relative pairs: first cousins (Nos. 8–9 in Fig. 1) as a collateral re-
retically distinguish between second-degree relatives by using identity lationship, and great-grandfather–great-grandson (Nos. 1–11 in Fig. 1)
by descent (IBD) probabilities of two linked loci [7–9]. However, these as a lineal relationship.
studies were not verified by practical data, and the accuracy is un- In the SNP typing system based on DNA microarrays, uncalled
known. events (i.e., genotypes that cannot be obtained) or genotyping errors
Chromosomal sharing patterns, including the number and the ge- (e.g., a true genotype, AB, is miscalled as AA) occur infrequently even
netic length of the actual shared segments (i.e., IBD regions) on the in pristine DNA samples. In this study, we also included uncalled events
chromosomes, are expected to differ between relatives, even for the and genotyping errors into the simulated genotypes for each prob-
same degree of kinship, due to differences in the frequency of meioses ability. The values of the two probabilities were determined from the
and chromosomal transitions from their common ancestors [10–14]. If actual genotyping results of 67 pristine DNA samples typed on the
two individuals have a collateral relationship, the shared chromosomal HumanCore BeadChip (Illumina) in our previous study [2]. Detailed
segments are expected to be larger in number and shorter in genetic methods are presented in the Supplementary information and Supple-
length than those in lineal relationship with the same degree of kinship. mentary Table 1. Uncalled events were randomly generated at all SNPs,
This is because the number of recombination events from the common with a probability of 5.97 × 10−4. We assumed that genotyping errors
ancestors is larger in a collateral relationship than that in a lineal re- occurred only in heterozygous genotypes through allele imbalance (i.e.,
lationship with the same degree of kinship [11,13]. one allele in the heterozygous genotype is undetected). Therefore,
In this study, we further developed our previous approach to dis- genotyping errors were introduced by converting heterozygous geno-
tinguish between relationships with the same degree of kinship by in- types into randomly chosen homozygous genotypes, with a probability
vestigating the differences between chromosomal sharing patterns. We of 2.28 × 10−4. All programs used for simulations were run using the
targeted the second and third-degree of kinship, including collateral statistical software R, version 3.3.2 [18].
and lineal relationships: uncle–nephew vs. grandfather–grandson as
second-degree kinships, and first cousins vs. great-grandfather–great- 2.2. Analyzing chromosomal sharing patterns between individual pairs
grandson as third-degree kinships. We focused on the number of shared
chromosomal segments between two individuals. The shared segments We investigated the differences in chromosomal sharing patterns
were estimated on the basis of IBS regions, the effect of coincidental between collateral relatives and lineal relatives with the same degree of
matches being minimized. For a probabilistic prediction of the re- kinship. We ordered all 174,254 SNPs according to their chromosomal
lationships between two individuals, we proposed a logistic regression positions and then counted the IBS state at each locus between two
model, using computationally generated genotypes. We also confirmed individuals. The IBS state has three possible outcomes: 0, 1, or 2. SNPs
the validity of the proposed method employing actual samples that with uncalled events in either individual are ignored in the subsequent
were genotyped using DNA microarray. analysis. We then chose IBS regions (i.e., regions where 1 or 2 are
continuously aligned), which could reflect chromosomal sharing.
However, some IBS regions include coincidental matches, because even
2. Methods unrelated pairs can share SNPs by chance. Therefore, we have to ex-
clude comparatively shorter IBS regions to minimize the effect of co-
2.1. Generating computer-based familial genotypes of high-density SNPs incidental matches. We set a threshold for the genetic length of IBS
regions between two individuals using receiver operating characteristic
We utilized 249 computationally synthesized familial genotypes curve analysis. We identified IBS regions longer than the threshold (i.e.,
(Fig. 1) of 174,254 autosomal SNPs on the HumanCore BeadChip (Il- 4 cM) as apparent “shared segments.”
lumina, San Diego, CA, USA) used in our previous study [2]. To analyze the total amounts of chromosomal sharing, ICS values in
The SNP haplotypes of the founders (i.e., Nos. 1, 2, 3, 6, 7, and 10 in each pair were calculated by summing the genetic length of the shared
Fig. 1) were estimated using the Shape-IT algorithm [15] and 1498 segments in the same way as our previous study [2]. In this study, we
Japanese individuals in the Nagahama Prospective Genome Cohort also counted the number of the shared segments in each relationship.
[16], to incorporate linkage disequilibrium into the computational We then compared the ICS values and the number of shared segments
genotypes. We assumed that the founders were unrelated (i.e., non- between collateral relatives and lineal relatives with the same degree of
inbred pedigrees). We then generated the haplotypes of the descendants kinship: uncle–nephew vs. grandfather–grandson and first cousins vs.
by taking into account recombination events between each SNP based great-grandfather–great-grandson.
on sex-averaged recombination rates [17]. The family included two
2.3. Logistic regression analysis
11
C. Morimoto et al. Forensic Science International: Genetics 33 (2018) 10–16
genotypes (including 166 collateral pairs and 166 lineal pairs for each 2.4. Validation using actual samples
degree of kinship) as training datasets to build the logistic models using
the glm function in the R statistical language [18]. The remaining To validate the prediction model, we used the SNP genotypes of
genotypes (including 83 collateral and 83 lineal pairs for each degree of actual second-degree relative pairs and third-degree relative pairs
kinship) were considered as test datasets. We calculated the Pr(C|x ) and analyzed in our previous study [2]. These included 17 uncle–nephew
Pr(L|x ) of the test datasets from Eq. (1) and evaluated the accuracy and pairs, 17 grandfather–grandson pairs, 14 first cousin pairs, and two
false classification rate of the prediction model. Correct predictions great-grandfather–great-grandson pairs. These pairs were used regard-
essentially represent instances for which the true relationship is equal less of sex because we targeted only autosomal SNPs. All participants
to the relationship with a higher probability. For example, when Pr(C|x ) provided written informed consent, and this study was approved by the
is 0.6 and Pr(L|x ) is 0.4 in a collateral relative pair, the pair is correctly Ethics Committee of the Graduate School of Medicine, Kyoto Uni-
classified as a collateral relationship. However, these probabilities do versity. All samples were genotyped by using the HumanCore-12
not strongly support the collateral relationship in actual forensic case- BeadChip or HumanCore-24 BeadChip (Illumina), according to the
work. Therefore, we set three criteria of judgement for kinship de- manufacturer’s protocol. We checked the genotyping quality from the
termination by including stricter criteria referring to the classical genotyping success rates by confirming that the rates were greater than
Hummel’s predicates of paternity [19], as in our previous study [2]. The 0.99, as recommended by the manufacturer to ensure good quality data.
criteria were as follows: From the genotypes of 174,254 SNPs, we investigated the number of
shared segments between each pair, as well as for the computationally
(1) a relationship is “supported” if the probability is ≥0.5, generated pairs. We then determined whether the pair has a collateral
(2) a relationship is “likely” if the probability is ≥0.9, and relationship or lineal relationship by using our prediction models, si-
(3) a relationship is “practically proven” if the probability is ≥0.998. milar to the test datasets, and confirmed our prediction accuracy.
When both Pr(C|x ) and Pr(L|x ) were lower than the criterion, we
judged the relationship of the pair as “inconclusive” in the criterion of 3. Results
(2) and (3).
3.1. Comparison of chromosomal sharing patterns between collateral and
lineal relatives
In the simulation data, the distributions of ICS values for the same
12
C. Morimoto et al. Forensic Science International: Genetics 33 (2018) 10–16
degree of kinship largely overlapped, as expected (Fig. 2). Therefore, 3.2. Accuracy of the logistic regression model
we could not distinguish collateral relatives from lineal relatives for the
same degree of kinship by using only ICS values. On the other hand, By using the training datasets, we estimated β0 and β1 for the lo-
there were large differences in the number of shared segments between gistic regression model. In the second-degree relationship, β0 and β1
the uncle–nephew and grandfather–grandson pairs, despite the same were −52.5 and 1.03, respectively. In the third-degree relationship, β0
degree of kinship (Fig. 3a). The minimum, median, and maximum va- and β1 were −24.0 and 0.557, respectively. Fig. 4 shows the logistic
lues for the uncle–nephew pairs were 47, 63, and 79, respectively. curves and the number of shared segments in the training datasets
These three quantiles were larger than the values for the grand- (boxplots) and indicates that the logistic models fitted the predicted
father–grandson pairs (29, 39, and 53, respectively). Similarly, the relationship.
number of shared chromosomal segments in collateral relationships was We evaluated the accuracy of the prediction models by using the
larger than that in lineal relationships with a third-degree of kinship test datasets (Table 2). About 80% of second-degree relatives had cor-
(Fig. 3b). The median values of the number of shared segments for first rectly predicted relationships, even when judged by the strictest cri-
cousins and great-grandfather–great-grandson kinships were 51 and 37, terion (i.e., Pr(C|x ) or Pr(L|x ) ≥0.998). Third-degree relatives were
respectively. From these results, the genetic length of each shared predicted with over 70% accuracy, as the criterion was ≥0.5 or ≥0.9.
13
C. Morimoto et al. Forensic Science International: Genetics 33 (2018) 10–16
Table 2 should select strict criteria, even if inconclusive results occur. In addi-
Prediction accuracy of the test datasets using three criteria. tion, when we tested the accuracy and false classification rates of the
training datasets, few differences were observed (Supplementary
Degree of kinship Criteria
Table 2).
≥0.5 ≥0.9 ≥0.998
3.3. Predicting relationships of actual samples
Second-degree 98.2% 95.2% 86.1%
Third-degree 94.0% 76.5% 22.9%
The genotyping success rates for all actual samples were > 0.99;
therefore, all data could be used for predicting relationships. The pre-
Table 3 diction accuracy of actual samples was similar to that of test datasets
False classification rates of the test datasets using three criteria. (Table 4). When we adopted a probability of ≥0.5, all samples were
judged as true relationships except for one first cousin pair. Because the
Degree of kinship Criteria
number of shared segments in this pair was 43, Pr(C|x ) and Pr(L|x ) were
≥0.5 ≥0.9 ≥0.998 0.479 and 0.521, respectively. Although these probabilities slightly
supported the hypothesis that the pair had a great-grandfather–great-
Second-degree 1.81% 0.602% 0% grandson relationship, we could not judge whether either relationship
Third-degree 6.02% 1.81% 0%
was “likely” using our criteria. Furthermore, when we adopted a
probability of ≥0.9 or ≥0.998, there was no false classification. These
However, it was difficult to distinguish first cousins from great-grand- results suggest that this method has a high ability for discriminating
parents–great-grandsons using the strictest criteria. However, false relationships with the same degree of kinship, especially second-degree
classification could be decreased because the criterion was stricter relationships.
(Table 3). Meanwhile, inconclusive results increased (e.g., 13.9% of
second-degree pairs and 77.1% of third-degree pairs, as the criterion 4. Discussion
was ≥0.998). To avoid incorrect determination in actual casework, we
We had previously proposed an approach for pairwise kinship
14
C. Morimoto et al. Forensic Science International: Genetics 33 (2018) 10–16
15
C. Morimoto et al. Forensic Science International: Genetics 33 (2018) 10–16
[18] R.R. Core Team, A Language and Environment for Statistical Computing, R detection and applications, Annu. Rev. Genet. 46 (2012) 617–633, http://dx.doi.
Foundation for Statistical Computing, Vienna, Australia, 2017 (Available), http:// org/10.1146/annurev-genet-110711-155534.
www.R-project.org/. [23] L. Han, M. Abney, Using identity by descent estimation with dense genotype data to
[19] K. Hummel, Biostatistical Opinion of Parentage, Stuttgart, Gustav Fischer Verlag, detect positive selection, Eur. J. Hum. Genet. 21 (2013) 205–211, http://dx.doi.
1971 (written in German and English). org/10.1038/ejhg.2012.148.
[20] M. Prinz, A. Carracedo, W.R. Mayr, N. Morling, T.J. Parsons, A. Sajantila, [24] C.D. Huff, D.J. Witherspoon, T.S. Simonson, J. Xing, W.S. Watkins, Y. Zhang,
R. Scheithauer, H. Schmitter, P.M. Schneider, DNA Commission of the International T.M. Tuohy, D.W. Neklason, R.W. Burt, S.L. Guthery, S.R. Woodward, L.B. Jorde,
Society for Forensic Genetics (ISFG): recommendations regarding the role of for- Maximum-likelihood estimation of recent shared ancestry (ERSA), Genome Res. 21
ensic genetics for disaster victim identification (DVI), Forensic Sci. Int. Genet. 1 (2011) 768–774, http://dx.doi.org/10.1101/gr.115972.110.
(2007) 3–12, http://dx.doi.org/10.1016/j.fsigen.2006.10.003. [25] H. Li, G. Glusman, H. Hu, J. Shankaracharya, R. Caballero, D. Hubley,
[21] B.L. Browning, S.R. Browning, Improving the accuracy and efficiency of identity-by- S.L. Witherspoon, D.E. Guthery, L.B. Mauldin, L. Jorder, J.C. Hood, C.D. Roach,
descent detection in population data, Genetics 194 (2013) 459–471, http://dx.doi. Huff, Relationship estimation from whole-genome sequence data, PLoS Genet. 10
org/10.1534/genetics.113.150029. (2014) e1004144, http://dx.doi.org/10.1371/journal.pgen.1004144.
[22] S.R. Browning, B.L. Browning, Identity by descent between distant relatives:
16