
iScience

OPEN ACCESS

Perspective
Review: Computational analysis of human skeletal
remains in ancient DNA and forensic genetics
Ainash Childebayeva1,2,* and Elena I. Zavala3,4

SUMMARY
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics.
While aDNA studies typically center around human evolution and past history, and forensic genetics is
often more concerned with identifying a specific individual, scientists in both fields face similar challenges.
The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration
between fields toward mutually beneficial methodological advancements. However, most of these have
centered on wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In
this review, we focus on the computational side of the analytical workflow. We discuss the limitations and
considerations that arise when working with degraded DNA. We hope this review provides researchers new to
computational workflows with a framework for thinking about the analysis of highly degraded DNA and
prompts increased collaboration between the forensic genetics and aDNA fields.

INTRODUCTION
Human genetics is a cornerstone of both forensic genetics and ancient DNA (aDNA) research. In forensic genetics, this typically relates to
linking DNA recovered from a piece of evidence to a specific individual. This can include not only matching DNA profiles, but also information
about an individual’s phenotype,1 genetic ancestry,2 and/or relatives3 that may be paired with non-genetic evidence to narrow the investigative
space. In aDNA, recovered DNA has been used to learn more about past human interactions, kinship structures, and migrations,4–7 to test
evolutionary hypotheses,8 and to study phylogenetic relationships between archaic lineages and their modern representatives.9–11 Degraded
DNA is a hallmark of aDNA, due to the time periods of the remains from which data are generated. Human identification (HID) casework, a
subset of forensic genetics that includes disaster victim identification, active and cold cases, and historical identifications, also deals with
degraded DNA depending on the time periods and environmental conditions of the recovered human remains.12–14 The challenges faced
with generating DNA profiles from degraded DNA are therefore shared between the forensic genetics and aDNA fields. Overlaps and ben-
efits of exchanging laboratory protocols between these fields have been previously outlined.15,16 In this review, we build on this foundation by
focusing on the impacts of degraded DNA on the computational portion of analysis while highlighting overlapping and distinct features be-
tween forensic genetics and aDNA.
The key characteristics of degraded DNA are its relatively short fragment length (30–70 base pairs), limited quantity, and damage pat-
terns,17–20 each of which presents a challenge that both fields have worked to overcome for data generation and analysis. Conventional labo-
ratory methods for isolating non-degraded DNA from different sources (DNA extraction) and preparing it for downstream analysis favor the
exclusion of small DNA fragments, which are typically thought to be artifacts or uninformative. These protocols have thus needed to be altered
for application to degraded DNA. Early exchanges of DNA extraction protocols between the fields21–24 have continued through the decades,
leading to the recovery of DNA fragments shorter than 50 base pairs14,25–29 and to the establishment of pre-treatment protocols for contamination
removal.30–32 The latter are needed because of the low endogenous DNA content of degraded samples, which makes them susceptible to exogenous
contamination from other DNA sources (i.e., microbial and non-degraded human DNA). As a result, all steps preceding DNA amplification are carried
out in specialized clean room laboratories dedicated to aDNA work,20,33,34 with similar guidelines established for forensic analyses.35
Degraded DNA is typically fragmented and present in low quantities, which complicates preparing the extracted DNA for downstream
analyses. DNA cloning was used to identify the first aDNA fragments,36,37 but this method often generated artifacts that led to false positives.
While the advent of PCR helped to overcome this challenge, damage patterns and short fragment sizes resulted in low amplification efficiency
and the co-amplification of often indistinguishable contaminant DNA.19,20 The advent of next-generation sequencing (NGS) technology has
provided an avenue for data generation through parallel sequencing of millions of DNA molecules and downstream bioinformatic process-
ing. As with the initial data generation steps, the analysis of NGS data has required the development of bioinformatic tools and techniques to
address the difficulties arising from the degraded nature of the DNA source, which is the focus of this review.

1Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
2Department of Anthropology, University of Kansas, Lawrence, KS, USA
3Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
4Department of Biology, University of Oregon, Eugene, OR, USA

*Correspondence: ainash_childebayeva@eva.mpg.de
https://doi.org/10.1016/j.isci.2023.108066

iScience 26, 108066, November 17, 2023 © 2023 The Author(s).


This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Figure 1. Distribution of published ancient DNA data and frequency of forensic genetics NGS studies
(A) Map of published aDNA data. Frequency indicates the number of individuals, Years BP = thousands of years before present, Data source = capture data,
shotgun data, or a combination of both. AADR v54.163 was used for metadata; (B) A histogram of the number of articles with titles, abstracts, or keywords
that include forensics and mention NGS or massively parallel sequencing (gray) and degraded (yellow) based on a search on scopus.com; (C) Average
Annual Temperature map (The Nelson Institute Center for Sustainability and the Global Environment, University of Wisconsin-Madison); (D) Average Annual
Relative Humidity map (The Nelson Institute Center for Sustainability and the Global Environment, University of Wisconsin-Madison).

Despite these challenges, the publication over the last decade of more than 10,000 genome-wide and whole-genome datasets from ancient
humans has made it possible to learn more about human evolutionary history and genetics, even in areas that are known to be challenging for
DNA preservation due to high ambient temperatures and humidity (Figures 1A, 1C, and 1D). NGS-based methodologies have expanded the
information that can be gained from degraded DNA samples beyond the traditional forensic DNA profile standard of short tandem repeats
(STRs).38–42 Although STRs are unlikely to be replaced for routine forensic casework due to their prevalence in existing databases and the ease
of generating STR profiles, NGS analysis of SNPs has gained traction. SNPs have been shown to be more effective for generating DNA profiles
from degraded remains,43–47 including enabling the identification of individuals through more distant relatives (investigative genetic gene-
alogy, IGG) instead of via first-generation relatives or a direct match.48,49 The growing interest for NGS in forensic studies is exemplified by the
marked increase in studies related to NGS in forensics in the last decade (Figure 1B). The increase in the number of laboratories performing
research on degraded DNA, publications of step-by-step laboratory protocols,26,50–55 computational pipelines,56–58 and workflow primers55
has helped to ensure transparency and reproducibility of processing between aDNA datasets. Within forensics, organizations such as the Scientific
Working Group on DNA Analysis Methods (SWGDAM)59 (in the US), the European Network of Forensic Science Institutes (ENFSI),60
and the International Society for Forensic Genetics (ISFG)61 have served as platforms for discussion, sharing of protocols, and the development
of guidelines for quality assurance of forensic DNA analysis. However, in both fields, step-by-step descriptions of computational workflows
that also discuss their limitations remain scarce.
Since the advent of NGS, a natural widening has occurred between the forensic and aDNA fields. The legal connotations of forensic case-
work and its consequences on people living today require forensic laboratories to adhere to strict quality assurance standards for laboratory
accreditation and strict IT requirements.62 This includes performing verification and validation studies before the implementation of new wet
laboratory methods and software (including any version updates). These rigorous standards necessarily slow the integration of new technol-
ogy into forensic genetics practice, emphasizing the importance of understanding the limits and factors impacting the accuracy of new
methods and techniques. Leveraging the flexibility of the aDNA field to explore and test new methods has the potential to narrow the search
space for advancing forensic genetics technology, as has already been discussed for laboratory methods.15 In this review, we focus on the
computational workflows performed in forensic genetics and aDNA analysis when working with low-coverage NGS data. This includes a discussion
of limitations and of the context behind the decisions made at different steps. We hope this both provides a solid foundation
for those new to computational analysis in either field and prompts interdisciplinary conversations that lead to mutually beneficial
advancements in forensics and aDNA.

SAMPLING AND LABORATORY WORK


Figure 2. Simplified workflow for forensic and ancient genetic analyses
The first step in the workflow is DNA extraction in a dedicated pre-PCR lab room. Following the DNA extraction, a genetic library is constructed by adding
barcodes/indices and adapters to the DNA for downstream sequencing. The library is then sequenced for downstream bioinformatic analyses. In some
cases, targeted capture is done to enrich for specific targets or types of DNA, also followed by NGS and downstream analyses.

The general laboratory workflow for degraded DNA analysis can be divided into five steps: sample preparation, DNA extraction, library preparation,
in some cases targeted enrichment, and sequencing (Figure 2). The genetic material used for the analyses covered in this review is
typically recovered from skeletal material; however, rootless hair has also been shown to yield degraded DNA,64–66 including in commercial
forensics applications.67 DNA is extracted and purified from bone or tooth powder that has been drilled or ground from a particular skeletal
element. The resulting DNA extract contains all DNA extracted from the sample, including microbial DNA and other non-endogenous DNA
(contamination) from individuals or animals who may have come into contact with the remains of interest. Extracted DNA is then converted
into libraries by adding adapter sequences, which allow it to be sequenced. Included in this library preparation step is the addition of indices
(short oligos) to the DNA library molecule, which work as a tag to identify which DNA sequences are associated with which library. Depending
on the data quality and types of questions that are being addressed, enrichment of specific loci may also be performed prior to sequencing.
While this provides a brief overview, there are different considerations that must be made for each step.
Because ancient remains are scarce and are important to relatives and descendant communities as well as to the historical
and biological record, many studies have focused on reducing the amount of bone material used while maximizing the amount of DNA
recovered.68–70 Although the petrous portion of the temporal bone has been found to be a rich source of DNA in both forensic and aDNA
studies,71 its importance for understanding hominin evolution and the invasive sampling it requires mean that teeth and
long bones are often substituted, depending on sample quality and availability.70 For the recovery of highly degraded DNA (<100
base pairs in length), different versions of an inorganic DNA extraction method26,72–74 are used in both aDNA and forensics, paired with either
double- or single-stranded (ss) DNA library preparation.53,75–78 Library preparation for forensics is typically amplicon based and sequencing
directly follows the completion of this step. For “younger” (<30,000 years) aDNA samples that have relatively better quality DNA, the double-stranded
library preparation method is typically coupled with a partial uracil-DNA glycosylase (UDG) treatment to repair DNA damage
while preserving the deamination on the terminal ends to allow for aDNA authentication.79 Decisions around library preparation methods and
whether whole-genome shotgun or targeted enrichment is performed impact how the data are analyzed downstream as will be discussed in
the following section.
Specific regions or types of DNA are sometimes targeted for enrichment (e.g., hybridization capture) in order to decrease sequencing costs
and/or restrict the type of genetic data that is produced.44,80–82 The regions of the genome that are targeted for analysis differ between forensic
and aDNA analyses. Forensic studies typically focus on common SNPs that are informative for individualization and may also include SNPs that
provide insights about genetic ancestry or phenotype (skin, hair, and eye color), or that can be used for identifying genetic relatives44,49,83 (Figure 3A).
Concerns around the genetic privacy of individuals whose data may be collected for forensic databases also motivate the use of SNP panels instead
of whole-genome sequencing (WGS) in forensic casework to minimize the collection of medically informative data.84 Ancient DNA capture arrays target
SNPs that are informative for evaluating genetic variation on a population rather than individual scale. One array commonly used in aDNA studies
is the “1240k” SNP capture array that targets 1.2 million single-nucleotide positions representing global genetic variation, as well as functional
SNPs and SNPs under selection;4,85 this array has been commercially available since 2021.86 An updated version of the array, known as the “Twist
Ancient DNA” assay,87 is able to enrich for 1.4 million SNPs, containing additional SNPs not present on the 1240k array. A larger set of 3.7
million SNPs adds additional SNP panels to the 1240k set that are informative for genetic variation observed in Neanderthals and Denisovans.85
In addition, the aDNA field generally promotes open data sharing.88 However, there are cases when open data sharing is discouraged, for
example, when working with Indigenous groups89 (see the Public Databases section for further discussion).

INITIAL BIOINFORMATIC PROCESSING


Figure 3. Comparison of aDNA and forensic genetics analyses and quality control measures
(A) The different types of conventional downstream analysis currently performed in aDNA and forensic genetics fields as well as (B) quality control measures for
monitoring contamination and confirming endogenous DNA.

In order to perform downstream analysis, raw sequence data must undergo preliminary bioinformatic processing (Figure 4). This generally
includes demultiplexing (assigning sequences to their specific libraries based on their assigned indices), trimming of adapter sequences,
removal of PCR duplicates, filtering based on length and quality metrics, and mapping of sequences to a reference genome. These preliminary
steps are performed in both fields. Subsequent NGS forensic genetics analysis can be split into three categories (amplicon-based
sequencing, enrichment or capture, and WGS), all of which are typically performed with commercial kits. Amplicon-based sequencing is widely
used within forensics because it maintains similarity to previous amplicon-based genetic workflows and compatibility with existing databases,
and because it is required for correctly identifying STR alleles, given the challenges in determining the start and end positions of these short repetitive
regions. Unless unique molecular identifiers are used, PCR duplicate removal is not performed for amplicon-based sequencing. For mtDNA
analysis, different commercial software packages are available that have specifically been designed to work with mapping mtDNA (a circular
reference) and improved calling of indels (e.g., QIAGEN’s CLC Genomics Workbench,44,90 AQME,91 and SoftGenetics’s GeneMarker HTS92)
that also allow users to visualize the pileup of reads. Nuclear DNA kits are often paired with commercial bioinformatic workflows that take in
sequencing data, perform demultiplexing, adapter trimming, and mapping, and then present the user with genotype calls (e.g., Verogen’s
FGx system,93 or Thermo Fisher Scientific’s HID Ion GeneStudio S5 System94,95). The development of bioinformatics workflows for sequencing
applications in forensics is rare, as forensic laboratories may not have the personnel (bioinformaticians) or the flexibility to develop such pipelines
when adhering to the information management and IT standards of government data security systems. Thus, commercial software packages
such as Parabon Fc Forensic Analysis Software Platform, QIAGEN’s CLC Genomics Workbench, and SoftGenetics’s NextGENe may be
used for analysis of WGS and capture data, each of which includes a deduplication step in addition to adapter trimming and mapping.
While these packages are helpful for reproducibility, it is often difficult for the general public to directly test how different data qualities impact specific workflow
elements via simulations or other testing, as these workflows are typically not open source.
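To make the demultiplexing step described above concrete, the following minimal Python sketch assigns reads to libraries by exact matching of their index pair. The index sequences, library names, and the `demultiplex` helper are hypothetical and serve only as an illustration; established demultiplexing tools typically also tolerate a limited number of index mismatches.

```python
from collections import defaultdict


def demultiplex(reads, index_to_library):
    """Group reads by library using exact matches of their (i7, i5) index pair.

    reads: iterable of (read_id, i7_index, i5_index, sequence) tuples.
    index_to_library: dict mapping (i7, i5) -> library name.
    Reads with an unexpected index pair are collected under 'undetermined'.
    """
    bins = defaultdict(list)
    for read_id, i7, i5, seq in reads:
        library = index_to_library.get((i7, i5), "undetermined")
        bins[library].append((read_id, seq))
    return dict(bins)


# Hypothetical index pairs and reads (illustrative values only)
index_to_library = {
    ("ACGTACG", "TTGCAAT"): "lib_A",
    ("GGATCCA", "CATGCTA"): "lib_B",
}
reads = [
    ("r1", "ACGTACG", "TTGCAAT", "ACCTGA"),
    ("r2", "GGATCCA", "CATGCTA", "TTGACA"),
    ("r3", "ACGTACG", "CATGCTA", "GGGTTT"),  # index combination that was not expected
]
demultiplexed = demultiplex(reads, index_to_library)
```

Keeping unassigned reads in a separate bin, rather than discarding them, makes it possible to monitor how many sequences carry unexpected index combinations.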
Initial bioinformatic analysis of aDNA data is conventionally performed as follows. After demultiplexing, raw sequencing FASTQ files are
trimmed to remove sequencing adapters,96 and the reads are mapped to the reference genome using tools like bwa97 or Bowtie98 with
relaxed parameters99 to produce BAM files. Duplicate reads are removed using tools such as Picard MarkDuplicates100 or Dedup.101 The
aDNA status is then authenticated based on deamination, represented by C-to-T substitutions on the 5′ end (and G-to-A substitutions on
the 3′ end in double-stranded libraries) (see Deamination in Figure 3B), using tools like mapDamage2.0102 or DamageProfiler,103 which
allow the user to both visualize and quantify the damage patterns in the data. Based on the damage profile, read trimming is performed to
remove the DNA damage, which accumulates at the terminal ends of the reads, with the number of bases trimmed depending on the type
and architecture of the library preparation. Two to three bases can be trimmed from a double-stranded UDG-half library,104 and eight to ten bases
from a double-stranded non-UDG library (assessed per library), while double-stranded full-UDG libraries are not trimmed. For
non-UDG ssDNA libraries, the number of terminal bases trimmed is typically determined in a library-dependent manner.
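The damage assessment and trimming logic described above can be sketched in a few lines of Python. This is a simplified illustration, not a reimplementation of mapDamage or DamageProfiler: the function names, the library labels, and the toy alignments are assumptions, reads are treated as gap-free alignments on the read strand, and the trimming values simply take the upper end of the ranges quoted in the text.

```python
# Bases to trim from each read end, by library type (upper end of the quoted ranges;
# in practice the value is chosen per library from its observed damage profile).
TRIM_BASES = {
    "ds_half_udg": 3,
    "ds_non_udg": 10,
    "ds_full_udg": 0,
}


def trim_read(seq, qual, library_type):
    """Remove potentially damaged terminal bases from both ends of a read."""
    n = TRIM_BASES[library_type]
    if n == 0:
        return seq, qual
    return seq[n:-n], qual[n:-n]


def five_prime_c_to_t(aligned_pairs, n_positions=5):
    """Per-position C-to-T frequency over the first n_positions of each read.

    aligned_pairs: iterable of (read_seq, ref_seq) strings of equal length,
    gap-free and written 5'->3' on the read strand.
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for read, ref in aligned_pairs:
        for i in range(min(n_positions, len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]


# Toy alignments: two of the four reads show a C-to-T change at the first position
pairs = [("TACGT", "CACGT"), ("CACGT", "CACGT"), ("TTCGA", "CTCGA"), ("CGGAT", "CGGAT")]
damage = five_prime_c_to_t(pairs)
trimmed_seq, trimmed_qual = trim_read("ACGTACGTACGT", "IIIIIIIIIIII", "ds_half_udg")
```

An elevated C-to-T frequency at the first read position that decays toward the interior is the pattern that authentication tools visualize; the per-position frequencies can then inform how many bases to trim.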

Identifying contamination
Contamination in the context of genetic analysis of an individual’s skeletal remains refers to the presence of any DNA that is not from that
individual. The presence of contamination is a concern for both forensic and aDNA analyses and each field has developed different tech-
niques for monitoring its presence (Figure 3B). Forensic laboratories maintain databases of DNA profiles from potential contamination sour-
ces (individuals who are involved in casework, have been in the laboratory, etc.) and previous casework which can be compared to genotyped
casework samples.35 Internal validation studies are used to set coverage thresholds for calling consensus alleles and determining that a profile
is predominantly from a single individual.105,106 This is also part of what drives the high-coverage thresholds in forensic analysis. Data that are
determined to represent multiple individuals are excluded from downstream analyses. Finally, in some forensics laboratories, when possible
at least two skeletal elements or bone powder aliquots per individual are processed and the DNA profiles between these extracts are required
to match.12 This independent replication of results also serves as a means to check for concordance between replicates, and to monitor for
sample switches that may have occurred during batched downstream processing. Both the forensic and aDNA fields carry contamination controls,
such as no-template controls (also known as negative controls) and reagent blanks (containing all reagents used in the experiment but no
sample),107 from the DNA extraction and library preparation steps through to sequencing to identify potential contamination from reagents or
handling in the laboratory.
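The profile comparisons described above (screening against an elimination database and checking concordance between replicate extracts) can be illustrated with a short Python sketch. The data layout, the function names, and the thresholds (`min_loci`, `min_concordance`) are hypothetical; accredited laboratories use validated software and thresholds for this purpose.

```python
def concordance(profile_a, profile_b):
    """Fraction of shared loci at which two profiles carry the same genotype.

    Profiles are dicts mapping locus name -> genotype (sorted tuple of alleles).
    Returns (fraction, n_compared); fraction is None when no loci overlap.
    """
    shared = [locus for locus in profile_a if locus in profile_b]
    if not shared:
        return None, 0
    matches = sum(profile_a[locus] == profile_b[locus] for locus in shared)
    return matches / len(shared), len(shared)


def screen_profile(sample, elimination_db, min_loci=3, min_concordance=1.0):
    """Names of elimination-database entries that match the sample profile.

    min_loci (at least 1) and min_concordance are illustrative thresholds.
    """
    hits = []
    for name, profile in elimination_db.items():
        fraction, n_compared = concordance(sample, profile)
        if n_compared >= min_loci and fraction >= min_concordance:
            hits.append(name)
    return hits


# Hypothetical SNP profiles: locus -> sorted tuple of alleles
elimination_db = {
    "analyst_1": {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "T"), "snp4": ("A", "A")},
    "analyst_2": {"snp1": ("G", "G"), "snp2": ("C", "T"), "snp3": ("T", "T"), "snp4": ("A", "C")},
}
casework_sample = {"snp1": ("G", "G"), "snp2": ("C", "T"), "snp3": ("T", "T"), "snp4": ("A", "C")}
replicate_extract = {"snp1": ("G", "G"), "snp2": ("C", "T"), "snp4": ("A", "C")}

possible_contaminators = screen_profile(casework_sample, elimination_db)
replicate_fraction, replicate_loci = concordance(casework_sample, replicate_extract)
```

In this toy example the casework profile matches one database entry, which would flag the result for review, while the replicate extract is fully concordant at the three loci it shares with the first extract.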


Figure 4. General bioinformatic processing steps in aDNA and forensic genetics
Elements recoverable from the figure: raw data (fastq) QA/QC; adaptor removal and read merging/trimming (fastq); read mapping and evaluation (bam),
comprising mapping, duplicates removal, contamination estimation, damage pattern evaluation, and damage removal; genotyping (vcf, geno/snp/ind);
imputation and phasing (vcf, haps/sample); and downstream analyses (PCA, ADMIXTURE, F-statistics, biological relatedness, selection scans,
IGG/profile matching, IBD sharing, and local ancestry inference).
Light blue color indicates steps that are more relevant for aDNA, and light gray indicates steps that are more relevant for forensics.

Methods for estimating contamination in aDNA studies are often based on the haploid DNA elements and rely on measuring heterozy-
gosity levels: X chromosome in males,108,109 or mtDNA (contamMix110,111 and schmutzi112) in both males and females. Other methods include
comparing the contamination in reads with and without aDNA damage,113 or evaluating the breakdown of linkage disequilibrium.114 Unam-
biguous sex determination can also be used as a metric for contamination assessment, except in instances of same-sex contamination, as well
as the unambiguous determination of mitochondrial and Y chromosome haplogroups. The methods listed previously that are based on con-
firming the presence of a single individual can also be used in HID casework as exemplified in a recent study on the analysis of hair from Lud-
wig van Beethoven.115

Limitations of contamination metrics


Each of the contamination evaluation methods described previously has its own limitations, the impacts of which are dependent on the data
that are being evaluated. Methods based on the coverage of the X chromosome are most effective when applied to male individuals for de-
tecting female contamination. Detecting male contamination is still possible, but requires determining if there are multiple X chromosomes
resulting in polymorphic loci. Methods that rely on mtDNA overcome this issue, but still require at least 3-fold coverage depth (e.g.,
schmutzi112) and an even coverage of the mitogenome. Other methods were designed for specific sample preparation and sequencing pa-
rameters (e.g., AuthentiCT113), such as single-stranded library preparation and paired-end sequencing. More generally, contamination esti-
mation methods tend to work more reliably on higher coverage data, and are not as accurate when used on low-coverage samples. Users
must consider which tools are most applicable to their data and may decide to use multiple methods. It should also be noted that contamination
estimates for mtDNA and nuclear DNA can differ, and it is therefore necessary to monitor the mt/nuclear (nc) DNA ratio.116 The mt/nc
ratio is known to vary between bone samples and within the same bone sample,117 and when it is high, nuclear
contamination extrapolated from mtDNA can be underestimated.116 Overall, for aDNA analysis, a contamination level of 5%
is often considered an upper threshold for inclusion in the downstream analyses.109,114
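The principle shared by the haploid-marker methods above is that a single individual should show only one allele at each site of a haploid element. The following Python sketch is a deliberately crude illustration of that idea and is not the model used by contamMix, schmutzi, or any X-chromosome method: it pools the fraction of reads that disagree with the majority base, so sequencing error and deamination inflate the estimate. The function name, the `min_depth` default, and the toy pileups are assumptions.

```python
from collections import Counter


def haploid_discordance(pileups, min_depth=3):
    """Pooled fraction of reads disagreeing with the majority base at haploid sites.

    pileups: iterable of strings, each holding the bases observed at one site.
    Sites covered by fewer than min_depth reads are skipped.
    Returns None when no site passes the depth filter.
    """
    discordant = 0
    total = 0
    for bases in pileups:
        if len(bases) < min_depth:
            continue
        majority_count = Counter(bases).most_common(1)[0][1]
        discordant += len(bases) - majority_count
        total += len(bases)
    return discordant / total if total else None


# Toy mtDNA pileups: one discordant read among 30 reads at sites passing the filter
pileups = ["AAAAAAAAAA", "CCCCCCCCCT", "GG", "TTTTTTTTTT"]
estimate = haploid_discordance(pileups)

# Upper threshold for inclusion quoted in the text for aDNA analyses
passes_threshold = estimate is not None and estimate <= 0.05
```

Because the estimate is a ratio of read counts, it becomes unstable when few reads are available, which mirrors the point above that contamination estimates are less accurate for low-coverage samples.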

Chromosomal sex determination


Sex determination based on evidence for the presence of different sex chromosomes is common practice in forensic and aDNA analysis for
both inferring biological sex and evaluating the presence of contamination. Within the forensic field, sex determination is sometimes performed
by testing for the two versions of the amelogenin gene (AMELX and AMELY), where the presence of both indicates a male and the presence of AMELX alone indicates a female. However,
this test is not always reliable, as deletions of the Y-linked copy in males have been observed,118,119 leading to the use of other regions of the Y
chromosome.120,121
In aDNA studies, different methods for chromosomal sex determination are used based on the data analyzed. The first and perhaps most
straightforward approach is to evaluate the ratio of average coverage across the X and Y chromosomes compared to the average coverage
across autosomal chromosomes,122 while normalizing for the target size of each chromosome.123 The expectation for this method (after
normalization) is that males with one X chromosome would have half the coverage on the X compared to females. This method has been
shown to work with at least 1,000 reads, but should only be applied to WGS data. When using capture data, different methods are used
to correct for the preferential enrichment of regions of the autosomal and sex chromosomes which impacts expectations around coverage
ratios. When limiting analysis to a set of “390k” array SNPs (a subset of the 1240k SNP panel), a ratio of reads mapping to the Y and X chromosomes
is calculated as Y/(Y + X).4 When expanding the analysis to the full 1240k SNP panel, these ratios are corrected by the number of
bases targeted on each chromosome.124 Another technique, which has been used for both low-coverage WGS and capture data, is to calculate
the ratio of the average coverage of the X chromosome to the summed average coverage of the X and autosomal chromosomes (X/(X + auto)).125 Care
should be taken when applying this technique to capture data, as it is already known not to work as expected with the 1240k SNP panel.
Sanity checks that compare the calculations with similarly processed data from individuals of known chromosomal sex can be helpful in these situations.
All methods mentioned will be impacted by the presence of contamination and are designed to distinguish between only two
possible outcomes, XX and XY. Alternative methods have been developed for identifying other karyotypes, which have
resulted in the identification of ancient individuals who may have had Klinefelter syndrome (XXY).126,127
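The ratios described in this section are simple to compute once read counts or coverages per chromosome are available. The Python sketch below shows the three calculations under stated assumptions: the function names are hypothetical, the target sizes are approximate round numbers for a whole-genome reference and must be replaced by the actual targeted bases for capture data, and no classification thresholds are applied because these need to be calibrated against individuals of known chromosomal sex.

```python
def ry_ratio(n_x, n_y):
    """Fraction of sex-chromosome reads that map to the Y chromosome: Y / (Y + X)."""
    if n_x + n_y == 0:
        raise ValueError("no reads mapped to the sex chromosomes")
    return n_y / (n_x + n_y)


def normalized_x_coverage(x_reads, auto_reads, x_target_bp, auto_target_bp):
    """X read density relative to autosomal read density.

    After normalizing by target size, values near 1.0 are expected for XX and
    values near 0.5 for XY, assuming no contamination.
    """
    return (x_reads / x_target_bp) / (auto_reads / auto_target_bp)


def x_fraction(cov_x, cov_auto):
    """Average X coverage relative to summed X and autosomal coverage: X / (X + auto)."""
    return cov_x / (cov_x + cov_auto)


# Approximate whole-genome target sizes in base pairs (assumed round values)
X_TARGET_BP = 156_000_000
AUTO_TARGET_BP = 2_875_000_000

# Hypothetical read counts from a low-coverage shotgun library
ry = ry_ratio(n_x=900, n_y=100)
x_rel = normalized_x_coverage(2_700, 100_000, X_TARGET_BP, AUTO_TARGET_BP)
x_frac = x_fraction(cov_x=0.05, cov_auto=0.10)
```

In this toy example the normalized X coverage is close to 0.5, which is consistent with a single X chromosome; intermediate values would suggest contamination or a karyotype other than XX or XY.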

Summary statistics
The points for making decisions around data quality and downstream processing differ between the forensic genetics and aDNA fields. In
forensic workflows, qPCR is performed on DNA extracts to quantify the amount (nanograms) of DNA present in the extract and to detect po-
tential PCR inhibition.128–130 This step is used for calculating input volumes for library preparation. Evaluation of contamination and coverage
estimates is then combined with the genotyping and analysis steps. This workflow is motivated by the large numbers of samples
(predominantly non-degraded), time constraints, and the requirement that validated workflows remain unchanged so that results can be compared directly
across laboratories and laboratory accreditation is maintained.
In contrast, many aDNA laboratories perform low-coverage WGS to evaluate the data quality and make decisions around how, and if, to
generate more data. In studies with large numbers of individuals that are presumed to have DNA of similar quality, a subset of skeletal ele-
ments may be evaluated for their DNA preservation before making decisions that are applied to the full set of skeletal remains. The set of
summary statistics used for this initial data quality evaluation typically includes the percentage of sequences that map to the reference genome
(in total and above a certain length cutoff), the duplication rate, deamination percentages, coverage of the reference genome, the complexity of each
library, and contamination. The percentage of mapped reads indicates how much human DNA (endogenous and contaminant) is present in a
DNA library relative to all sequences (typically restricted to reads above a minimum length of 25–35 base pairs). If a library contains a high percentage
of short DNA fragments, gel cuts (the physical separation from an agarose gel of DNA band(s) above a certain fragment length for downstream
analysis) may aid in decreasing sequencing costs.53 As this is a time-intensive and complex protocol, it is not recommended for routine use.
Duplication rates reflect the number of unique DNA molecules in a library: the more times the same original DNA
molecule is sequenced, the less likely it is that new, unsequenced molecules are still present in the library. High duplication rates can indicate that
increased sequencing depth is unlikely to result in increased genome coverage. Deamination rates are used to determine the aDNA status,
and an arbitrary cutoff of at least 10% observed C-to-T substitutions on terminal ends is used to indicate the presence of aDNA in non-UDG-
treated libraries, or C-to-T substitutions in only the two terminal bases of reads in UDG-half libraries. Coverage estimates begin to provide
information about how much data are sequenced from a given library; however, this is only informative for the portion of the library that was
sequenced. To determine how much data may still be available in the library, we recommend using complexity estimates. Complexity can be
estimated by first calculating the number of informative sequences present in the library (the percentage of mapped reads above a certain length
and quality threshold multiplied by the number of molecules present in the library as determined by qPCR).131 The number of informative
sequences is then multiplied by the average fragment length of the filtered reads and divided by the genome target size (i.e., 3 billion base pairs for
the human genome). This metric estimates the theoretical coverage that could be obtained from a library if every DNA molecule
were sequenced. Alternatively, library complexity can be calculated bioinformatically after sequencing using tools like Picard (GATK). Contamination
estimates (discussed previously) indicate what percentage of the previously calculated complexity is endogenous
DNA. These numbers can then be used to estimate the cost and feasibility of generating different genome coverages. It is at this point where
decisions are made as to whether to proceed with data generation and if WGS or SNP capture-based approaches should be pursued. While
WGS is the gold standard when it comes to the amount of data generated and the potential analyses available, SNP capture is a more cost-
effective approach when taking into account the low endogenous content of the aDNA data.
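The complexity calculation described above can be written out as a short worked example. The Python sketch below follows the formula in the text (informative sequences multiplied by average fragment length and divided by the genome target size); the function names and all input values are hypothetical, and applying the contamination estimate as a simple multiplicative correction is an assumption made for illustration.

```python
def informative_sequences(molecules_in_library, fraction_informative):
    """Number of library molecules expected to map and pass length/quality filters.

    molecules_in_library: molecule count, e.g. as determined by qPCR.
    fraction_informative: fraction of sequenced reads that mapped and passed filters.
    """
    return molecules_in_library * fraction_informative


def theoretical_coverage(n_informative, mean_fragment_length, target_size=3_000_000_000):
    """Coverage obtainable if every informative molecule in the library were sequenced."""
    return n_informative * mean_fragment_length / target_size


# Hypothetical library: 2 billion molecules, 1% informative reads, 50 bp mean length
n_informative = informative_sequences(2_000_000_000, 0.01)
coverage = theoretical_coverage(n_informative, mean_fragment_length=50)

# Assumed correction: scale by the endogenous fraction for a 5% contamination estimate
endogenous_coverage = coverage * (1 - 0.05)
```

Here the library could yield roughly 0.33-fold genome coverage at most, which indicates that deeper shotgun sequencing of this library alone could not produce a high-coverage genome and that capture or additional libraries would need to be considered.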

GENOTYPING
Forensic genetics
The term genotyping, while focused on allele determination at specific loci, has been used to refer to different segments of workflows from
sample preparation to allele determination using various methods (e.g., capillary electrophoresis). In this review, we will refer to genotyping as
the process by which sequencing data are used to determine alleles at specific loci.
Software coupled with amplicon-based NGS kits for forensic applications uses a binary or threshold genotyping approach where the
sequencing read pileup at each position is examined to determine if a locus is homozygous or heterozygous. Genotype calls are based
on predetermined analytical thresholds for allelic coverage and heterozygous balance and typically require relatively high coverages at
each locus (e.g., >650 reads for amplicon-based sequencing93). For degraded samples profiled with NGS, a 10X reporting threshold has
been used for both mtDNA12 and SNPs.48 Validation studies are key for setting reporting thresholds as outlined in SWGDAM guidelines,132
FBI Quality Assurance Standards,133 and ENFSI best practice manuals.134,135
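As an illustration of the binary (threshold) approach, the sketch below calls a genotype from a read pileup at one locus. It is not the logic of any commercial kit software: the function name is hypothetical, the default depth of 10 echoes the reporting threshold mentioned above, and the heterozygote balance cutoff of 0.3 is an arbitrary placeholder that would have to come from a validation study.

```python
from collections import Counter


def threshold_genotype(bases, min_depth=10, min_het_balance=0.3):
    """Call a genotype from the bases observed at one locus.

    bases: iterable of single-character base calls from the pileup.
    Returns a sorted tuple of two alleles, or None when the locus
    falls below the reporting depth.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # not reportable at this depth
    ranked = counts.most_common(2)
    major, major_n = ranked[0]
    if len(ranked) == 1:
        return (major, major)
    minor, minor_n = ranked[1]
    # Heterozygote balance: read count of the minor allele relative to the major
    if minor_n / major_n >= min_het_balance:
        return tuple(sorted((major, minor)))
    return (major, major)


print(threshold_genotype("AAAAAAAGGGGG"))  # balanced pileup of 12 reads
print(threshold_genotype("AGAGAG"))        # only 6 reads
```

The sketch ignores third alleles and stutter or noise filters, which real analytical thresholds also have to handle.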
Probabilistic genotyping offers an alternative to the binary approach that can include models evaluating multiple factors (e.g., number of
individuals, heterozygosity, amount of data) and incorporate prior knowledge based on available reference data (e.g., sequencing error, allele
frequency errors, patterns of linkage disequilibrium),136 to provide a probability that each genotyped SNP is correct. Genotype probabilities
can then be incorporated into downstream analyses and allow for more informed decisions on which SNPs to include in a final DNA profile.
The benefits of integrating probabilistic genotyping methods into forensic genetics have been recognized via validation studies and guide-
lines.137,138 While most of these studies are focused on STR and mixture analysis, the application of probabilistic genotyping to degraded,
single-source samples has in recent years begun to be explored for both identification of human remains and for identifying potential per-
petrators in criminal cases.45,139 Notably, the probabilistic genotyping method used in the study by Gordan et al., 202245 (ATLAS140) was
developed for aDNA data and allows the user to take into account deamination rates, which have been shown to be present in historical re-
mains.14 Due to the highly degraded quality of DNA in ancient studies, it is unsurprising that methods from this field may be useful for forensic
casework involving historical and/or degraded remains.
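To make the contrast with threshold calling concrete, the following sketch computes genotype posterior probabilities at a biallelic SNP. It assumes a symmetric sequencing error rate and a Hardy-Weinberg prior taken from a reference allele frequency; both are simplifying assumptions, the names are hypothetical, and the model does not account for deamination as damage-aware tools such as ATLAS do.

```python
import math


def genotype_posteriors(n_ref, n_alt, alt_freq, error=0.01):
    """Posterior probability of carrying 0, 1, or 2 alternate alleles.

    n_ref, n_alt: counts of reads supporting each allele.
    alt_freq: alternate allele frequency in a reference panel (0 < q < 1).
    error: assumed symmetric per-read error rate (0 < error < 0.5).
    """
    if not 0.0 < alt_freq < 1.0:
        raise ValueError("alt_freq must be strictly between 0 and 1")
    if not 0.0 < error < 0.5:
        raise ValueError("error must be strictly between 0 and 0.5")
    # Probability that a read shows the alternate allele, per genotype
    p_alt = {0: error, 1: 0.5, 2: 1.0 - error}
    q = alt_freq
    prior = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}
    log_post = {}
    for g in (0, 1, 2):
        log_lik = n_alt * math.log(p_alt[g]) + n_ref * math.log(1 - p_alt[g])
        log_post[g] = log_lik + math.log(prior[g])
    # Normalize in log space to avoid underflow at high depth
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / norm for g, v in log_post.items()}


# Three reads, all reference: a heterozygote cannot be excluded
print(genotype_posteriors(n_ref=3, n_alt=0, alt_freq=0.5))
```

With only three reference reads and an allele frequency of 0.5, roughly a fifth of the posterior mass remains on the heterozygous genotype, which is the allelic dropout uncertainty that a fixed threshold cannot express.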

Ancient DNA
Genotyping of aDNA is often split into two categories depending on the data quality and planned downstream analyses. The first is
pseudo-haploid genotyping, which involves randomly selecting a single read per position in place of calling a true diploid genotype. This is
typically performed when working with low-coverage data, genome-wide array data, or when analyzing a large number of genomes where the
majority are low coverage. Several software tools are available for performing pseudo-haploid genotyping, including pileupCaller and
bam-caller (Table S1). Each of these tools allows users to specify which SNPs should be genotyped and allows filtering based on coverage,
mapping quality, and base quality. They can either randomly select a read from the pileup or, given sufficient coverage, select the allele supported by
the majority of the reads. This step should be performed after additional end trimming to minimize impacts of deamination. For non-UDG or
partial-UDG ssDNA libraries, deamination or contamination impacts can be further minimized by limiting calls of C-to-T SNPs to the reverse
strands, and G-to-A to the forward strands only. In addition, after genotyping, one can quantify the observed number of transitions and trans-
versions (C>T, A>G, A>C, etc.) to determine if this ratio (also known as Ti/Tv) differs from the expected value of 2–2.1 for WGS data.141 How-
ever, this method will not work for capture arrays where the Ti/Tv ratios significantly deviate from the expectation.
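The pseudo-haploid calling logic described above can be sketched as follows. This is an illustrative re-implementation, not the code of pileupCaller or bam-caller: the pileup representation, the function names, and the quality defaults are assumptions, and the strand rule encodes the single-stranded library filter described in the text (C/T SNPs from reverse-strand reads only, G/A SNPs from forward-strand reads only).

```python
import random
from collections import Counter


def _strand_ok(ref, alt, strand):
    """Strand filter for single-stranded, non-UDG or partial-UDG libraries."""
    alleles = {ref, alt}
    if alleles == {"C", "T"}:
        return strand == "-"
    if alleles == {"G", "A"}:
        return strand == "+"
    return True


def pseudo_haploid_call(pileup, ref, alt, rng, min_baseq=30, min_mapq=30,
                        majority=False, single_stranded=False):
    """Pick one allele at a SNP from a pileup.

    pileup: list of (base, base_quality, mapping_quality, strand) tuples,
        with strand given as "+" or "-".
    Returns the chosen allele, or None if no read passes the filters.
    """
    usable = [
        base for base, bq, mq, strand in pileup
        if bq >= min_baseq and mq >= min_mapq and base in (ref, alt)
        and (not single_stranded or _strand_ok(ref, alt, strand))
    ]
    if not usable:
        return None
    if majority:
        ranked = Counter(usable).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return rng.choice(sorted(set(usable)))  # break ties at random
        return ranked[0][0]
    return rng.choice(usable)  # random draw of a single read


rng = random.Random(42)
reads = [("C", 35, 40, "+"), ("T", 35, 40, "-"), ("T", 10, 40, "-")]
print(pseudo_haploid_call(reads, "C", "T", rng, single_stranded=True))
```

In the example only the reverse-strand read with adequate base quality survives the filters, so the call does not depend on the random draw.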
Probabilistic genotyping is generally used for aDNA samples with better coverage. Several software tools can be used to determine
genotype likelihoods in ancient samples: snpAD,142 ATLAS,143 bcftools,144 GATK,145 ANGSD,108 and others (Table S1). A set of reference
genotypes from modern data is often employed to aid genotyping ancient samples, a commonly used one being the 1000 Genomes refer-
ence dataset.146 Again, trimming of termini is important prior to diploid genotype calling. However, tools like ATLAS143 are able to take into
account the aDNA damage when determining genotype likelihoods, so additional preprocessing is not necessary prior to genotyping.
Moreover, when using non-UDG data, genotyping can be restricted to transversions, which are not affected by aDNA deamination and
are thus more reliable for downstream analyses; analyses can also be restricted to reads carrying damage using PMDtools.147 Non-UDG ssDNA library data
further allow for processing reads separately, which can serve as an additional control.
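Both the Ti/Tv check and the restriction to transversions rest on classifying each SNP by its two alleles. A minimal sketch, with hypothetical function names and SNPs given as (reference, alternate) pairs:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for purine-purine or pyrimidine-pyrimidine changes."""
    a, b = a.upper(), b.upper()
    if a == b or not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid SNP: {a}>{b}")
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def titv_ratio(snps):
    """Ratio of transitions to transversions among (ref, alt) pairs."""
    ti = sum(is_transition(r, a) for r, a in snps)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


def transversions_only(snps):
    """Keep only SNPs that cannot be mimicked by C-to-T or G-to-A damage."""
    return [(r, a) for r, a in snps if not is_transition(r, a)]


snps = [("C", "T"), ("G", "A"), ("A", "C"), ("A", "G"), ("G", "T"), ("T", "C")]
print(titv_ratio(snps))
print(transversions_only(snps))
```

A ratio well above the expectation for the data type in a non-UDG library would be consistent with damage-derived transitions inflating the call set, although other causes are possible.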

Limitations and considerations


Limitations of genotyping can be examined from two different perspectives: genotype accuracy (error rate and allelic dropout) and the impact
of this accuracy on downstream analyses. Here, we focus on the former for currently used methods in forensic and aDNA work based on eval-
uations with simulation studies, which allow decoupling of laboratory and bioinformatic parameters. This excludes analysis pipelines paired
with commercial kits.
Low coverage is a known concern for genotyping as it can result in allelic dropout and increased stochasticity in allele sampling, compli-
cating differentiation between heterozygous and homozygous loci. A recent forensic case solving a 16-year-old double murder in Sweden48
encouraged pairing WGS with SNP panels as a check for these issues. Conventional forensic genotyping also does not take into account the
presence of damage patterns, which have been observed in DNA recovered from historical and forensic casework.14,148–150 Due to the high-
coverage values required for typical forensic casework, low rates of damage are negligible and are not expected to impact downstream an-
alyses, but may have an impact on low-coverage samples. Practices from the aDNA field of trimming ends of reads to remove damage pat-
terns or utilizing probabilistic genotypers that take postmortem damage into account may open up more degraded samples for HID analysis.
Genotyping of low-coverage aDNA data with pseudo-haploid calling in theory does not have a limit as, given coverage by at least one
read, an allele can always be selected. However, the presence of contamination decreases the chance that a randomly sampled read is
endogenous. Repeating analyses in parallel on putatively deaminated fragments only can serve as a sanity check for evaluating
whether certain signals are contamination driven. Another check is to use the f4-statistic, a summary statistic measuring correlations in allele
frequencies between four populations151 (see the downstream analyses section for explanation of f-statistics), in the form f4(all fragments,
deaminated fragments; set of test modern populations, outgroup). If there is no contamination, the resulting statistic should be statistically
indistinguishable from 0. Reference bias is also a concern and can be checked with an f4-statistic if there is a diploid version of the genotype as well
(i.e., in scenarios where high- and low-coverage data are co-analyzed). The f4-statistic can be used to detect reference bias in pseudo-haploid
genotypes when set in the form f4(diploid genotypes, pseudo-haploid genotypes; reference genome, outgroup). A significantly negative
f4-statistic would indicate attraction between the pseudo-haploid data and the reference (reference bias). In the case of archaic individuals,
pseudo-haploid data may be attracted to the outgroup via so-called “long-branch” attraction.
Reference bias continues to be a concern for probabilistic genotyping, which is relevant for both modern and ancient applications. This
bias was identified when researchers discovered that higher genotype probabilities are assigned to calls that are homozygous with respect to
the reference used for alignment, which is composed predominantly of sequence of European and African ancestry.152 Reference bias in
genotyping individuals from underrepresented populations continues to be assessed,153 and many studies focused on these groups start with an
evaluation of genotyping accuracy for identifying rare variants. Modern reference databases may not fully represent the genetic variation of
individuals involved in forensic casework or past populations studied in aDNA. Moreover, modern individuals represent already admixed
states, and thus exhibit different patterns of linkage disequilibrium and potentially shorter haplotypes compared to the more ancient
unadmixed sources. When deciding which methods to use, it is important to remember that different degrees of uncertainty can be tolerated for
genotyping in forensics and aDNA. In the case of aDNA, more relaxed quality control metrics and reliance on population-wide estimates
allow for the use of data of lower quality and quantity than in forensics, where the goal is identification of individuals. The potential
implications of making an error are also significantly greater in forensics where genetic evidence is used in court cases. However, there is also
potential for aDNA to directly impact present-day people, for example aDNA evidence can be used by Native American tribes to gain U.S.
federal recognition.154

IMPUTATION
Imputation, or filling in missing genotype information, is a common procedure in both modern and ancient datasets that allows the inclusion
of additional genetic information based on patterns of linkage disequilibrium across the genome.155,156 Reference datasets provide informa-
tion on what alleles are more likely to be inherited together. Imputation methods often involve a phasing step where maternal and paternal
chromosomes are separated into haplotypes (for review, see the study by De Marino et al.157). Phasing often relies on using a reference data-
set, or related trios (parents and offspring), as short read sequencing is not informative on the background of alleles. Common uses for data
after imputation and phasing include identity-by-descent (IBD) calling, local ancestry inference, selection scans, demographic modeling, and
other analyses. When performing imputation, reference panels of worldwide populations are regularly used (such as the 1000 Genomes Refer-
ence panel146); however, in cases with large numbers of test samples, the use of a reference set may not be necessary.158

Forensic genetics
Imputation is just beginning to be explored for forensic applications with low-coverage data.48,159 Direct-to-consumer testing companies,
such as FamilyTreeDNA160 and GEDmatch,161 have databases that have been used for IGG and typically type between 0.7 and 1.6 million
SNPs. Commercial forensic sequencing labs like Astrea Forensics use imputation in their pipeline for recovery of low-quality DNA for com-
parison to direct-to-consumer tests (Astrea Forensics, California, USA). Imputation has the potential to increase the ease of comparing DNA
profiles generated from these different platforms and also improve chances of identification from low-quality samples with partial DNA pro-
files by producing a more complete profile. It could even allow for matching between STR and SNP profiles.162 However, as the reference
panels used for imputation are predominantly derived from populations of European ancestry, questions have been raised about the accuracy
of using them for inferring SNP genotypes for individuals from underrepresented populations.163 A recent study evaluated (1) the accuracy of
two imputation programs (Beagle164 and Gencove) and (2) the impact of using currently available reference panels for samples from different
African populations.153 It was found that, at 4X coverage for the five African populations included in the study, 38% of common variants and
50% of rare variants could not be imputed, likely due to variance in genetic distance to the reference panels.153 Continued studies are
needed to explore potential biases introduced by available reference panels when applied to diverse populations for individualization
purposes.

Ancient DNA
Common imputation software used in aDNA studies includes Beagle,164 GeneImp,165 and GLIMPSE.166 Generally, imputation starts with pro-
ducing genotype likelihoods. These likelihoods are then used together with a panel of modern high-coverage populations to determine the
most likely genotypes based on linkage disequilibrium patterns observed in the reference data.167 The DNA coverage cutoff of 0.5X is often
used as the inclusion criterion for imputation, since samples with lower coverage are less likely to produce accurate genotype calls after impu-
tation.168–170 In the future, greater availability of high-coverage WGS ancient genomes may be able to overcome this issue by generating
curated high-quality reference datasets of ancient individuals only, and thus removing the need for modern reference datasets.
Although imputation is powerful, there are important limitations to consider when applying it to degraded DNA. These include damage
patterns, contamination, and whether one is using WGS or capture data. WGS data allow for imputation of a greater number of positions in the genome
and are less prone to ascertainment bias. With capture data, it is not possible to detect new variation or private variation present in a
population not used in the ascertainment. Using modern reference panels raises the same concerns as those outlined in the genotyping
section. In addition, due to the nature of the imputation procedure, homozygous reference genotypes are more likely to get high imputation
scores, while homozygous alternative and heterozygous genotypes often have lower post-imputation accuracies, which can result in a
reference bias after imputation.168 For ancient individuals, this limits the types of populations that can be successfully imputed. Despite this
limitation, a recent study found that with 1X coverage imputation can result in >99% genotype concordance with a minor allele frequency
threshold of 0.1 (ref. 171) with Beagle v4.0 (ref. 172). The accuracy of imputation can be assessed by downsampling high-coverage ancient or modern DNA
data.170 When evaluating imputation accuracy, it is important to remember that aDNA coverage and damage are often non-randomly distrib-
uted, and thus a randomly downsampled genome does not necessarily represent the true low-coverage state.
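A downsampling evaluation of this kind usually ends with a concordance calculation between the high-coverage genotypes and the genotypes imputed from the downsampled data. The sketch below stratifies concordance by the true genotype class, which makes the reference bias described above visible. The function name and the 0/1/2 genotype coding (count of alternate alleles, None for missing) are assumptions for illustration, and the example data are invented.

```python
def concordance_by_class(truth, imputed):
    """Per-class concordance between true and imputed genotypes.

    truth, imputed: equal-length sequences of genotypes coded as 0, 1, or 2
        alternate alleles, with None marking a missing call.
    Returns a dict mapping each true genotype class to the fraction of
    sites imputed correctly (None if the class was never observed).
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    totals = {0: 0, 1: 0, 2: 0}
    matches = {0: 0, 1: 0, 2: 0}
    for t, i in zip(truth, imputed):
        if t is None or i is None:
            continue  # skip sites missing in either call set
        totals[t] += 1
        matches[t] += int(t == i)
    return {g: (matches[g] / totals[g] if totals[g] else None) for g in totals}


truth = [0, 0, 0, 0, 1, 1, 2, 2, None]
imputed = [0, 0, 0, 0, 1, 0, 2, 1, 0]
print(concordance_by_class(truth, imputed))
```

Reporting the three classes separately avoids an overall concordance figure that is dominated by the abundant, easily imputed homozygous reference sites.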
A general consideration when using imputation both in forensics and aDNA is the quality of the reference dataset. Worldwide genetic
variation is not represented equally in publicly available reference datasets and thus comparison of imputed samples from various parts of
the world should be done with caution.173

DOWNSTREAM ANALYSES
General population genetics analyses
Ancestry-informative markers are commonly included in forensic sequencing kits with the aim of providing investigative leads and/or aiding in
improved accuracy for downstream population genetics analyses. The goal of such work is to evaluate the ancestry of an individual. Although
conceptually different, the term ancestry is often used interchangeably with race and ethnicity in various fields, including medical genetics,
pharmacogenetics, forensics, and others.174,175 Thus, guidelines around the communication of this information to law enforcement that
decouple genetic ancestry from race and ethnicity, particularly as categories differ country by country and through time, are an area of active
discussion and attention in the field due to concerns around the interpretation and dissemination of results.176,177
Principal component analysis (PCA) and ADMIXTURE178 or STRUCTURE179 analyses are among the most common population genetics
methods that are performed on modern and aDNA to better understand the population structure and broader genetic affinity among
individuals and populations. Projection PCA, wherein ancient samples are projected upon modern genetic variation, is commonly used to
overcome the low coverage and the presence of damage patterns in aDNA, and the SmartPCA software from EIGENSOFT v7.2.1
(http://www.hsph.harvard.edu/alkes-price/software/) is often used for this purpose.151 Briefly, ancient samples are merged with modern worldwide
populations, and a set of populations is chosen as a reference set upon which all other samples are projected. Multidimensional scaling (MDS) is
another dimension-reduction method that has been used to assess broad relationships between sets of individuals in a hypothesis-free
manner, similar to PCA. Ancient individuals with as low as 1% endogenous DNA have been assigned correctly to their geographic origin using
MDS.180 It is important to note that population genetics analyses like PCA and MDS are most informative when used in concert with other
methods like F-statistics, ADMIXTURE, and reference population sets. ADMIXTURE178 analysis is used to cluster individuals based on their
genetic ancestry. In an ADMIXTURE analysis, ancient samples can be analyzed using a modern reference, or without one if a sufficiently large
number of individuals from relevant population sources are available. Based on the admixture analysis and PCA, genetic ancestry and admix-
ture components are often formally tested using qpAdm and F-statistics.151,181

Biological relatedness
The identification of relatives is of interest in both forensic and aDNA studies. In forensic human identification casework, in the absence of a
direct match in available databases, kinship analysis with STRs (familial searching) has been used to identify potential perpetrators of violent
crimes or unknown remains, or to determine paternity. False negative and false positive rates of this analysis have been evaluated and found to be
impacted by the likelihood ratio cutoffs used, the type of relationship in question, and the individual’s genetic ancestry.182–184 One
commonly used software package for this analysis, Familias,185,186 uses sharing of alleles that are identical by descent (IBD) between genetic relatives
to identify first- to second-degree relatives.
To identify more distant genetic relatives, IGG utilizes SNPs either to calculate the total length of shared DNA segments (measured in centiMorgans,
cM) or to compute kinship coefficients based on allele sharing and pairwise differences. There are several informative, in-depth reviews and
studies on this approach and its application in forensic settings.187–190 For the first category, a threshold is used to differentiate between seg-
ments that are identical by state and IBD.189 The total number of shared cM used for IGG relationship estimates is largely based on tests using
European populations for relatively close familial relationships (1st to 3rd degree). The frequency with which more distant genetic relatives are
misidentified as closer genetic relatives across various population backgrounds based on the number of shared cM is unknown. The second
category relies on reference panels to determine allele frequencies and to control for potential population substructure. This approach is
commonly used in the medical genetics and aDNA fields (as described in the following section) and has the benefit of working on smaller
subsets of SNPs, as it is not dependent on identifying IBD tracks.189 Another category estimates IBD tracks between individuals and therefore
does not rely on allelic frequencies.191–193 Verogen has recently leveraged the likelihood approach to develop their ForenSeq Kintelligence
kit, which contains 10,230 SNPs identified as being maximally informative for identifying genetic relatives.194 The limitations of these
methods are still being explored. A recent study found that while incomplete SNP profiles (>50%) had a minimal impact on relative identi-
fication, 1%–5% of genotyping error resulted in reduced accuracy for the segment-based identification methods often used in IGG.195
The certainty of individual identification in forensic casework is often based on a likelihood ratio that expresses how much more likely a given DNA
profile is to be observed if it came from the individual in question than if it came from a random person in a specific population. This calculation utilizes allele
frequency data from select populations to determine likelihood ratios per population. SWGDAM guidelines indicate how to report and
describe these ratios.196
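For a complete single-source profile, the simplest version of this calculation is the reciprocal of the random match probability. The sketch below assumes Hardy-Weinberg proportions and independence between loci, and ignores population substructure corrections, dropout, and mixtures, all of which casework software must handle. The function names, locus names, and allele frequencies are invented for illustration.

```python
def genotype_frequency(allele_a, allele_b, freqs):
    """Expected genotype frequency at one locus under Hardy-Weinberg."""
    p, q = freqs[allele_a], freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def random_match_probability(profile, population_freqs):
    """Product of per-locus genotype frequencies (loci assumed independent).

    profile: dict mapping locus name to a pair of alleles.
    population_freqs: dict mapping locus name to {allele: frequency}.
    """
    rmp = 1.0
    for locus, (a, b) in profile.items():
        rmp *= genotype_frequency(a, b, population_freqs[locus])
    return rmp


def likelihood_ratio(profile, population_freqs):
    """LR of 'same source' versus 'random person from this population'."""
    return 1.0 / random_match_probability(profile, population_freqs)


profile = {"L1": ("A", "G"), "L2": ("C", "C")}
freqs = {"L1": {"A": 0.5, "G": 0.5}, "L2": {"C": 0.1, "T": 0.9}}
print(random_match_probability(profile, freqs), likelihood_ratio(profile, freqs))
```

Running the same profile against frequency tables from different populations yields one likelihood ratio per population, which is why the choice and quality of the frequency data matter for the reported value.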
Methods used to assess biological relatedness among aDNA samples include pairwise mismatch rate (PMR), ancIBD,197 READ,198 and
lcMLkin.199 PMR, lcMLkin, and READ provide pairwise relatedness information, while ancIBD197 can be used to find links between more distant
relatives based on the IBD sharing. PMR and READ can be used on genotype calls, while lcMLkin relies on genotype likelihoods, and ancIBD is
based on imputed and phased data. Most methods that are used to determine biological relatedness in aDNA data are only able to
determine biological kinship up to a second degree. Some, such as PMR and READ, do not separate between parent-offspring (PO) and full
siblings (FS), while others, lcMLkin and ancIBD, can be used to differentiate PO from FS, and identify more distant relatives.
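The pairwise mismatch rate itself is simple to compute from pseudo-haploid calls, as sketched below. The classification step is a deliberately simplified assumption: under a basic model the mismatch rate, normalized by a baseline for unrelated pairs from the same population, is expected to be about 0.5 for identical individuals, 0.75 for first-degree relatives, 0.875 for second-degree relatives, and 1 for unrelated pairs, and the cutoffs here are placed midway between those expectations. Published tools use their own calibrated thresholds and uncertainty estimates, and the function names are hypothetical.

```python
def pairwise_mismatch_rate(calls_a, calls_b):
    """PMR over sites called in both individuals.

    calls_a, calls_b: equal-length sequences of pseudo-haploid alleles,
        with None marking a missing call.
    Returns (mismatch_rate, number_of_overlapping_sites).
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call vectors must have the same length")
    overlap = mismatches = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        overlap += 1
        mismatches += int(a != b)
    if overlap == 0:
        raise ValueError("no overlapping sites")
    return mismatches / overlap, overlap


def classify_by_pmr(pmr, unrelated_baseline):
    """Rough relatedness class from a baseline-normalized PMR (illustrative cutoffs)."""
    ratio = pmr / unrelated_baseline
    if ratio < 0.625:
        return "identical individual or twin"
    if ratio < 0.8125:
        return "first degree"
    if ratio < 0.9375:
        return "second degree"
    return "unrelated or more distant"


a = list("ACGTAC") + [None]
b = list("ACGTTT") + ["A"]
print(pairwise_mismatch_rate(a, b))
print(classify_by_pmr(0.18, unrelated_baseline=0.24))
```

The number of overlapping sites is returned alongside the rate because estimates based on few sites are noisy, which is one reason low coverage can distort relatedness calls.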
More recent methods that have been developed to determine biological relatedness among individuals rely on imputed and phased data.
One of these methods is ancIBD, which assesses pairwise haplotype sharing in a set of samples. IBD-based methods can generally
differentiate between PO and FS and identify more distant relatedness, such as 4th- to 5th-degree, as well as avuncular relatedness in some cases.
Generally, a combination of several methods is used to estimate biological relatedness. Additionally, uniparental marker data (Y-chromosome and mtDNA
haplotypes), age at death, and archaeological context are used when building family trees. There are important caveats and limitations to
consider when estimating relatedness, such as consanguineous relationships, increased background relatedness due to a population
bottleneck, and sample coverage (lower coverage may inflate relatedness estimates). Determining mtDNA haplotypes and heteroplasmy for low-quality
data, including deconvoluting mixtures, is another area of overlap between aDNA and forensic genetics.40,149,200–203

Admixture and genetic introgression


F-statistics
F-statistics are a commonly used suite of methods in aDNA to test for various scenarios of admixture and population relationships.151,204 In
forensics, F-statistics have been used for quality control and STR evaluation and analysis,205,206 as well as the analysis of ancestry-informative
SNPs.207 These methods are based on either two-, three-, or four-population comparisons, and are called, respectively, f2, f3, and f4.151 The
f2-statistic measures the difference in allele frequencies between two populations. In comparison, f3 is a three-population test typically
represented as f3(A,B; C), where A, B, and C are each a population or individual. Depending on the configuration, it can be used to test if
population C can be modeled as an admixture of populations A and B, or it can test for shared drift between populations A and B
relative to the outgroup (C). In the case of admixture f3, the statistic is expected to be negative, while in the case of outgroup f3 it
is expected to be positive. Adding another population gives the f4-statistic, which is similar to the D-statistic208 (also known as the ABBA-BABA
statistic, developed to test for admixture in closely related populations). The f4-statistic can be used to test for admixture and tree-ness using
the formula f4(A,B; C,D) = (a-b)(c-d), averaged across SNPs, where a, b, c, and d are allele frequencies in populations A, B, C, and D. When there is no
gene flow between the pair (A, B) and the pair (C, D), the statistic is expected not to differ significantly from 0. As mentioned in the contamination section, f4 can also
be used to identify the presence of contamination within a dataset. The f-statistics calculations have been implemented in the software
ADMIXTOOLS151 and an R package, admixr,209 as well as treemix,210 each of which has primers available.
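The definitions above translate directly into code when per-SNP allele frequencies are available. The sketch below is a bare-bones illustration and not the ADMIXTOOLS implementation: it omits the finite-sample bias corrections those tools apply, and its block jackknife uses equal-sized runs of consecutive SNPs as a stand-in for genomic blocks. All function names are hypothetical, and f3 follows the f3(A,B; C) notation of the text with C as the target or outgroup.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def f2(a, b):
    """Mean squared allele frequency difference between two populations."""
    return _mean([(x - y) ** 2 for x, y in zip(a, b)])


def f3(a, b, c):
    """f3(A,B; C): mean of (c - a)(c - b) across SNPs."""
    return _mean([(z - x) * (z - y) for x, y, z in zip(a, b, c)])


def f4_per_snp(a, b, c, d):
    """Per-SNP terms (a - b)(c - d) of f4(A,B; C,D)."""
    return [(w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)]


def f4(a, b, c, d):
    return _mean(f4_per_snp(a, b, c, d))


def block_jackknife(values, n_blocks):
    """Estimate and standard error by leaving out one block at a time."""
    size = len(values) // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two non-empty blocks")
    values = values[: size * n_blocks]  # drop the remainder so blocks are equal
    total, n = sum(values), len(values)
    leave_one_out = [
        (total - sum(values[k * size:(k + 1) * size])) / (n - size)
        for k in range(n_blocks)
    ]
    centre = _mean(leave_one_out)
    var = (n_blocks - 1) / n_blocks * sum((x - centre) ** 2 for x in leave_one_out)
    return total / n, math.sqrt(var)


# Invented frequencies: C looks like an equal mixture of A and B
A, B, C = [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]
print(f3(A, B, C))
```

Dividing the f4 estimate by its jackknife standard error gives a Z-score, which is how significance of a deviation from 0 is typically judged.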
Another admixture modeling method qpAdm relies on the basic idea of the f4-statistic.4 The main difference lies in the ability of qpAdm to
estimate the admixture proportions in the target population. A limitation of qpAdm is the need for a set of reference populations/individuals
used as a tree scaffold to understand the relationship between the potential sources and the target.211 Choosing the outgroups
correctly then becomes crucial for being able to disentangle how the source populations are related to the target. The use and limitations
of qpAdm have been recently described.211

PUBLIC DATABASES
Both aDNA and forensic genetics use databases of modern populations to explore wider population genetics signatures and inform field-
specific questions. In addition, in the aDNA field, the publication of genetic data from ancient individuals is commonplace. Databases are
important resources that must be treated with care in relation to quality and accuracy of contributed data as well as considerations of privacy
and respect for the individuals who have contributed their data. There have been vigorous discussions on best practices in both fields that
include definitions of informed consent, acknowledgment of power dynamics, and potential implications of these databases on descendants
and descendant communities, or genetic relatives of individuals in these databases.88,89,212–216
Within the forensics field, databases are typically used in three different ways. One is to provide references for determining haplogroup
information for uniparental markers as exemplified by EMPOP (European DNA Profiling mtDNA Population Database)217 and YHRD (Y Chro-
mosome Haplotype Reference Database).218 A second is to provide allele frequency information for different loci in order to aid in calculations
for determining the certainty of an identification. These frequencies are also published for common forensic markers.219,220 The third is to
provide searchable DNA profiles for identifying remains of missing people or providing leads to identify potential persons of interest in crim-
inal cases. As of December 2022, EMPOP contains over 48,000 mitochondrial haplotypes that cover at least the hypervariable I region (over
4,200 complete mitochondrial genomes). This database has clear guidelines around nomenclature221 and quality control217,222 for the upload
of new profiles. YHRD contains more than 350,000 Y-STR profiles and aims to provide accurate allele frequencies for Y chromosome STRs,
although it contains Y-SNP data as well.223 Databases that provide autosomal frequency data vary by country and are typically divided based
on either genetic ancestry or race, although there are ongoing debates as to the use (and accuracy) of these divisions.224
Laws and access to databases for identifying individuals vary by country, but are typically well defined and stringent.225 The inclusion of
individuals in these databases also varies greatly and is an ongoing area of debate, with criteria for inclusion ranging from citizenship226
to arrestee status and category of criminal offense; the biases of these databases due to racial and socioeconomic disparities are also debated.227,228
International database sharing among law enforcement also presents logistical and ethical questions, with international agencies such as
INTERPOL creating protective measures for DNA profile searching.229,230 Outside of law enforcement, there are selected public databases
of genetic data that law enforcement can search depending on the scope of the case in question. For example, the genetic profiles of the 1.4
million users of the genetic genealogical database GEDmatch are accessible for searching for missing person identifications if users make
their profile public, but users can decide if their genetic data can also be used by law enforcement for searches related to violent crimes.231
However, there have been serious concerns that the profiles of GEDmatch users who opted out of sharing their genetic profiles with law
enforcement were still accessible to the police.232,233 Due to the ability of genetic genealogical analysis to identify deep family connections,234
investigators have estimated that a database similar to GEDmatch could be used to identify over 60% of individuals of European descent in
the United States,235 raising significant concerns about privacy.
There are multiple publicly available databases and repositories that are commonly used to house aDNA data, such as: the European
Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/), the Edmond Open Research Data Repository of the Max Planck Society
(MPS) (https://edmond.mpdl.mpg.de/), Allen Ancient DNA Resources (AADR),63 the Poseidon database (http://www.poseidon-adna.org/#/),
and others. AADR contains downloadable genotypes of ancient and present-day DNA data that cover various panels with the 1.2 million
SNPs described in the ‘‘sampling and laboratory work’’ section. The AADR database also includes rich annotation information for each sample
in the dataset, including the age of the sample, contamination metrics, coverage, sex, and geographic location, to name a few. The Edmond
database can be used by MPS members to upload any files associated with their publications, including genotypes. Another resource is the
Allen Ancient Genome Diversity Project/John Templeton Ancient DNA Atlas containing medium- and high-coverage shotgun sequencing
data for 216 individuals (https://reich.hms.harvard.edu/ancient-genome-diversity-project). The authors ask the community to observe the
Fort Lauderdale principles entitling the authors to be the first to present and publish the dataset. The Poseidon
(http://www.poseidon-adna.org/#/) framework features a decentralized repository of genotyping and sequence data. Poseidon is managed through the efforts
of the Department of Archaeogenetics of the Max Planck Institute for Evolutionary Anthropology. However, individual researchers are encour-
aged to submit packages of their genotyping data, with annotation files, and links to the BAM and FASTQ files. Finally, most aDNA studies
published make the data available via the raw FASTQ files and/or BAM files aligned to the reference genome. ENA accession numbers can
usually be found within the publication ‘‘Data Availability’’ statement.

STUDY DESIGN
Power analysis
An important consideration when designing a study is the number of samples and/or individuals needed to answer a given
question. This is relevant for both fields, as genetic analysis of skeletal remains typically requires destructive sampling.236 Cultures
and communities differ in their views on the implications of destroying remains, and these views should be weighed and
discussed as part of a study's initial design.236 Even where destructive sampling raises no cultural concerns, it can still
destroy morphological features, such as those of teeth or the petrous bone. The potential benefit of a large sample
size therefore has to be weighed against the consequences of this type of analysis. Power analysis is a way to assess the sample size
necessary for a given research question. For aDNA studies, sample size can refer to the number of individuals, geographic locations, or loci
covered. Demographic reconstructions,237 studies of natural selection,238 and single-locus analyses239 are examples of questions
that require a sufficient number of individuals and/or loci. For forensic casework related to HID, power analysis is
still relevant for determining how much data are needed to reach conclusions about an identification. In addition, methodological and
validation studies that evaluate different steps within the wet and dry lab workflows should include power analyses in
the study design. It is also important to consider confounding factors, for example, how to evaluate the accuracy of
genotyping and imputation methods in isolation. For methodological development and evaluation of bioinformatic tools, simulations are a powerful
and necessary component.
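As a minimal sketch of a simulation-based power analysis, the Python example below estimates how often a study with a given number of individuals per group would detect an allele-frequency difference at a single locus. Everything in it is an illustrative assumption rather than a method taken from the studies cited above: the function names (`two_proportion_z`, `simulated_power`), the pseudo-haploid sampling of one allele per individual, the pooled two-proportion z-test, and the example frequencies of 0.2 and 0.4.

```python
import math
import random


def two_proportion_z(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (k1 / n1 - k2 / n2) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(n_per_group, freq_a, freq_b, alpha=0.05, n_sims=2000, seed=1):
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Assumed pseudo-haploid data: one randomly sampled allele per individual.
        k_a = sum(rng.random() < freq_a for _ in range(n_per_group))
        k_b = sum(rng.random() < freq_b for _ in range(n_per_group))
        if two_proportion_z(k_a, n_per_group, k_b, n_per_group) < alpha:
            hits += 1
    return hits / n_sims
```

Calling `simulated_power(n, 0.2, 0.4)` for a range of `n` gives a rough curve of power against sample size, which can then be weighed against the cost of additional destructive sampling. A real study would replace the toy sampling model with one that reflects its own coverage, damage, and contamination levels.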

SUMMARY AND OUTLOOK


The development of new methods has enabled researchers to trace ancient populations and solve decades-old cold cases. Each year seems
to bring a new study that pushes the limits of what was previously thought possible for genetic research from degraded DNA. In addition,
the number of aDNA and forensic genetics labs integrating DNA sequencing continues to increase, expanding the size of both fields and
the number of people performing this type of work. We hope that this review serves as both a primer for those new to working with sequencing
data from degraded DNA and a marker of the current guidelines and limitations of different types of analyses. Because these guidelines and
limitations continue to change, we encourage new and established members of the field to join and stay active in international and regional
field-specific organizations such as the International Society for Biomolecular Anthropology, American Association of Anthropological Genetics,
ISFG (which has language-specific working groups), SWGDAM, and ENFSI. Where no studies exist on the limitations of a method for incomplete
data showing the signs of degradation and potential contamination commonly seen in forensics and aDNA, we also recommend performing
simulations and/or downsampling tests to understand the power of any resulting associations from lower quality data. While this
review has been limited to human genetic analysis from skeletal remains, the same methodological advancements have opened
up new areas of study, including ancient pathogens and sediment DNA, that face similar (and added) challenges. As exploration of these new
areas continues, we hope this review also serves to highlight the overlaps between forensics and ancient genetics and motivates future
collaborations between these disciplines.
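The downsampling tests recommended above can be sketched in a few lines of Python. The example below is a toy illustration under stated assumptions, not a validated workflow: the simulated sites, the `naive_call` thresholds (a minimum depth of 3 and alternate-allele fractions of 0.2 and 0.8), and all function names are hypothetical. It thins the reads at each site to a chosen fraction, re-calls the genotype, and reports how many sites remain callable and how often the call matches the full-data call.

```python
import random


def simulate_sites(n_sites, depth, seed=0):
    """Toy data: each site carries 0, 1, or 2 alternate alleles and `depth` reads."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        alt_copies = rng.choice((0, 1, 2))
        reads = ["alt" if rng.random() < alt_copies / 2 else "ref" for _ in range(depth)]
        sites.append(reads)
    return sites


def naive_call(reads, min_depth=3):
    """Toy genotype call from 'ref'/'alt' read alleles; None if coverage is too low."""
    if len(reads) < min_depth:
        return None
    alt_fraction = reads.count("alt") / len(reads)
    if alt_fraction < 0.2:
        return "ref/ref"
    if alt_fraction > 0.8:
        return "alt/alt"
    return "ref/alt"


def downsampling_curve(sites, fractions, seed=1):
    """Return (fraction, sites_called, concordance_with_full_data) for each fraction."""
    rng = random.Random(seed)
    full_calls = [naive_call(reads) for reads in sites]
    rows = []
    for fraction in fractions:
        called = agree = 0
        for reads, full_call in zip(sites, full_calls):
            if full_call is None:
                continue  # never callable, so there is nothing to compare against
            # Keep each read independently with probability `fraction`.
            kept = [r for r in reads if rng.random() < fraction]
            call = naive_call(kept)
            if call is not None:
                called += 1
                agree += call == full_call
        concordance = agree / called if called else float("nan")
        rows.append((fraction, called, concordance))
    return rows
```

For example, `downsampling_curve(simulate_sites(500, 30), [1.0, 0.5, 0.1])` shows both the number of callable sites and the concordance dropping as reads are removed. With real data, the same loop would run over reads subsampled from a high-coverage sample and use the genotype caller under evaluation.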

SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2023.108066.

iScience 26, 108066, November 17, 2023 11



ACKNOWLEDGMENTS
We would like to thank Wolfgang Haak, Mateja Hajdinjak, Thiseas Lamnidis, Charla Marshall, Priya Moorjani, Rori Rohlfs, and Irina Velsko for
feedback and helpful conversations. Illustrations for Figures 2 and 3 were created by Petra Korlevic. E.I.Z. would like to thank the Miller
Institute for Basic Research in Science, University of California Berkeley for funding her work on this project.

AUTHOR CONTRIBUTIONS
A.C. and E.I.Z.: Conceptualization, writing – original draft preparation, and review & editing.

DECLARATION OF INTERESTS
The authors declare no competing interests.

Box 1. Glossary
Library (next-generation sequencing (NGS)): A collection of DNA fragments from an organism with synthetic DNA attached that allows them to be sequenced
and identified.
Library preparation: The addition of adapters to DNA molecules to allow them to adhere to the beads, flow cells, or chips used in NGS. This step often
includes adding indices that identify which DNA sequences come from the same library. Double-stranded (ds) DNA library preparation uses only dsDNA as a
template; how much the resulting loss of single-stranded (ss) DNA fragments matters depends on the extent of DNA degradation. ssDNA library preparation starts
with a denaturation step and uses ssDNA as a template, so theoretically all DNA molecules are converted into libraries. This is useful for degraded DNA
applications, but due to the time and consumable costs of existing protocols, it is less widely used. Uracil-DNA glycosylase (UDG) treatment removes uracil
residues from DNA, which are often observed in degraded DNA as a result of the deamination of cytosines.
Adapter sequences: Short DNA fragments (oligos) that are used in library preparation to prepare DNA molecules for sequencing. For example, in the case
of Illumina sequencing, adapter sequences are essential for binding to and generating clusters on the flow cell.
Reads: Sequences of base pairs corresponding to a fragment of DNA generated through sequencing.
Demultiplexing: Splitting reads from different libraries that were sequenced together into separate files per library. Demultiplexing is done based on the
barcodes or indices (unique sequences) attached to the DNA molecules in each library on the sequencing run.
Enrichment: The process of targeting and amplifying certain parts of the genome for downstream sequencing. Enrichment allows a study to focus on a specific
organism, gene, set of SNPs, etc., depending on the design of the capture used.
PCR duplicates: Identical reads that belong to the same PCR clone/template. These are typically identified based on start and/or end coordinates.
Human identification (HID): In this paper, we define HID casework to encompass unknown human skeletal remains from cold cases, disaster victim
identification (DVI), and historical identification cases.
Investigative genetic genealogy (IGG): The identification of an individual through distant relatives (typically 2-4th degree genetic relatives) by using SNP
profiles that are uploaded to large databases. The DNA profile is used to identify the genetic relative and then genealogists use other information (census
records, obituaries, etc.) to trace through family trees and identify the unknown individual.
Pileup: The alignment of all filtered reads to a reference sequence.
Ascertainment bias: Bias that results from the non-random selection of loci that are not representative of the full genetic diversity. A common source of
ascertainment bias lies in the selection of the SNPs to include on a certain SNP array. For this reason, arrays are not capable of detecting new variation or
private variation present in a population not used in the ascertainment.
Local ancestry inference: Decomposition of chromosomes into ancestral chunks in admixed populations.
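
To make the glossary entry on PCR duplicates concrete, the sketch below groups aligned reads by their mapping coordinates and keeps one representative per group. It is a simplified illustration under assumptions: the `Read` record, the inclusion of strand in the grouping key, and the choice to keep the read with the highest mean base quality are all hypothetical, and dedicated duplicate-removal tools apply more careful rules.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str
    mean_quality: float


def collapse_duplicates(reads):
    """Keep one read per (chrom, start, end, strand) group: the highest mean quality."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, read.end, read.strand)].append(read)
    # Dictionaries preserve insertion order, so groups come out in first-seen order.
    return [max(group, key=lambda r: r.mean_quality) for group in groups.values()]
```

Requiring both start and end coordinates to match is the stricter of the options mentioned in the definition. It suits short, fully sequenced fragments from degraded DNA, whereas keying on the start coordinate alone would collapse more reads.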

SUPPORTING CITATIONS
The following references appear in the supplemental information: 56–58,91–95,97,98,101–104,108,110–113,142–145,147,151,164–166,178,180,191,193,197–199,209,210,240–253.

REFERENCES
1. Kayser, M. (2015). Forensic DNA G., Nordenfelt, S., Harney, E., Stewardson, practices in an Early Neolithic tomb. Nature
Phenotyping: Predicting human K., et al. (2015). Massive migration from the 601, 584–587.
appearance from crime scene material for steppe was a source for Indo-European 7. Ning, C., Zhang, F., Cao, Y., Qin, L., Hudson,
investigative purposes. Forensic Sci. Int. languages in Europe. Nature 522, 207–211. M.J., Gao, S., Ma, P., Li, W., Zhu, S., Li, C.,
Genet. 18, 33–48. 5. Patterson, N., Isakov, M., Booth, T., Büster, et al. (2021). Ancient genome analyses shed
2. Phillips, C. (2015). Forensic genetic analysis L., Fischer, C.-E., Olalde, I., Ringbauer, H., light on kinship organization and mating
of bio-geographical ancestry. Forensic Sci. Akbari, A., Cheronet, O., Bleasdale, M., practice of Late Neolithic society in China.
Int. Genet. 18, 49–65. et al. (2022). Large-scale migration into iScience 24, 103352.
3. Ge, J., and Budowle, B. (2021). Forensic Britain during the Middle to Late Bronze 8. Mathieson, I., Lazaridis, I., Rohland, N.,
investigation approaches of searching Age. Nature 601, 588–594. Mallick, S., Patterson, N., Roodenberg, S.A.,
relatives in DNA databases. J. Forensic Sci. 6. Fowler, C., Olalde, I., Cummings, V., Armit, Harney, E., Stewardson, K., Fernandes, D.,
66, 430–443. I., Büster, L., Cuthbert, S., Rohland, N., Novak, M., et al. (2015). Genome-wide
4. Haak, W., Lazaridis, I., Patterson, N., Cheronet, O., Pinhasi, R., and Reich, D. patterns of selection in 230 ancient
Rohland, N., Mallick, S., Llamas, B., Brandt, (2022). A high-resolution picture of kinship Eurasians. Nature 528, 499–503.

12 iScience 26, 108066, November 17, 2023


iScience ll
Perspective OPEN ACCESS

9. Meyer, M., Kircher, M., Gansauge, M.-T., Li, 23. Hochmeister, M.N., Budowle, B., Jung, J., 38. Just, R.S., Irwin, J.A., and Parson, W. (2015).
H., Racimo, F., Mallick, S., Schraiber, J.G., Borer, U.V., Comey, C.T., and Dirnhofer, R. Mitochondrial DNA heteroplasmy in the
Jay, F., Prüfer, K., de Filippo, C., et al. (2012). (1991). PCR-based typing of DNA extracted emerging field of massively parallel
A high-coverage genome sequence from an from cigarette butts. Int. J. Leg. Med. 104, sequencing. Forensic Sci. Int. Genet. 18,
archaic Denisovan individual. Science 338, 229–233. 131–139.
222–226. 24. Hochmeister, M.N., Budowle, B., Borer, 39. Alonso, A., Barrio, P.A., Müller, P., Köcher,
10. Prüfer, K., Racimo, F., Patterson, N., Jay, F., U.V., Eggmann, U., Comey, C.T., and S., Berger, B., Martin, P., Bodner, M.,
Sankararaman, S., Sawyer, S., Heinze, A., Dirnhofer, R. (1991). Typing of Willuweit, S., Parson, W., Roewer, L., and
Renaud, G., Sudmant, P.H., de Filippo, C., deoxyribonucleic acid (DNA) extracted from Budowle, B. (2018). Current state-of-art of
et al. (2014). The complete genome compact bone from human remains. STR sequencing in forensic genetics.
sequence of a Neanderthal from the Altai J. Forensic Sci. 36, 1649–1661. Electrophoresis 39, 2655–2668.
Mountains. Nature 505, 43–49. 25. Dabney, J., Knapp, M., Glocke, I., 40. Marshall, C., and Parson, W. (2021).
11. Reich, D., Green, R.E., Kircher, M., Krause, J., Gansauge, M.-T., Weihmann, A., Nickel, B., Interpreting NUMTs in forensic genetics:
Patterson, N., Durand, E.Y., Viola, B., Briggs, Valdiosera, C., Garcı́a, N., Pääbo, S., Seeing the forest for the trees. Forensic Sci.
A.W., Stenzel, U., Johnson, P.L.F., et al. Arsuaga, J.-L., and Meyer, M. (2013). Int. Genet. 53, 102497.
(2010). Genetic history of an archaic hominin Complete mitochondrial genome sequence 41. Budowle, B., Schmedes, S.E., and Wendt,
group from Denisova Cave in Siberia. of a Middle Pleistocene cave bear F.R. (2017). Increasing the reach of forensic
Nature 468, 1053–1060. reconstructed from ultrashort DNA genetics with massively parallel sequencing.
12. Marshall, C., Sturk-Andreaggi, K., Daniels- fragments. Proc. Natl. Acad. Sci. USA 110, Forensic Sci. Med. Pathol. 13, 342–349.
Higginbotham, J., Oliver, R.S., Barritt-Ross, 15758–15763. 42. Børsting, C., and Morling, N. (2015). Next
S., and McMahon, T.P. (2017). Performance 26. Rohland, N., Glocke, I., Aximu-Petri, A., and generation sequencing and its applications
evaluation of a mitogenome capture and Meyer, M. (2018). Extraction of highly in forensic genetics. Forensic Sci. Int. Genet.
Illumina sequencing protocol using non- degraded DNA from ancient bones, teeth 18, 78–89.
probative, case-type skeletal samples: and sediments for high-throughput 43. Parsons, T.J., Huel, R.M.L., Bajunovic, Z., and
Implications for the use of a positive control sequencing. Nat. Protoc. 13, 2447–2461. Rizvic, A. (2019). Large scale DNA
in a next-generation sequencing procedure. 27. Damgaard, P.B., Margaryan, A., Schroeder, identification: The ICMP experience.
Forensic Sci. Int. Genet. 31, 198–206. https:// H., Orlando, L., Willerslev, E., and Allentoft, Forensic Sci. Int. Genet. 38, 236–244.
doi.org/10.1016/j.fsigen.2017.09.001. M.E. (2015). Improving access to 44. Gorden, E.M., Sturk-Andreaggi, K., and
13. Ambers, A., Bus, M.M., King, J.L., Jones, B., endogenous DNA in ancient bones and Marshall, C. (2021). Capture enrichment and
Durst, J., Bruseth, J.E., Gill-King, H., and teeth. Sci. Rep. 5, 11184. massively parallel sequencing for human
Budowle, B. (2020). Forensic genetic 28. Gamba, C., Hanghøj, K., Gaunitz, C., identification. Forensic Sci. Int. Genet. 53,
investigation of human skeletal remains Alfarhan, A.H., Alquraishi, S.A., Al-Rasheid, 102496.
recovered from the La Belle shipwreck. K.A.S., Bradley, D.G., and Orlando, L. (2016). 45. Gorden, E.M., Greytak, E.M., Sturk-
Forensic Sci. Int. 306, 110050. Comparing the performance of three Andreaggi, K., Cady, J., McMahon, T.P.,
14. Zavala, E.I., Thomas, J.T., Sturk-Andreaggi, ancient DNA extraction methods for high- Armentrout, S., and Marshall, C. (2022).
K., Daniels-Higginbotham, J., Meyers, K.K., throughput sequencing. Mol. Ecol. Resour. Extended kinship analysis of historical
Barrit-Ross, S., Aximu-Petri, A., Richter, J., 16, 459–469. https://doi.org/10.1111/1755- remains using SNP capture. Forensic Sci. Int.
Nickel, B., Berg, G.E., et al. (2022). Ancient 0998.12470. Genet. 57, 102636.
DNA Methods Improve Forensic DNA 29. Zavala, E.I., Rajagopal, S., Perry, G.H., 46. Sanchez, J.J., and Endicott, P. (2006).
 Parsons, T.J., and
Kruzic, I., Basic, Z., Developing multiplexed SNP assays with
Profiling of Korean War and World War II
Unknowns. Genes 13, 129. https://doi.org/ Holland, M.M. (2019). Impact of DNA special reference to degraded DNA
10.3390/genes13010129. degradation on massively parallel templates. Nat. Protoc. 1, 1370–1378.
sequencing-based autosomal STR, iiSNP, 47. Quintáns, B., Alvarez-Iglesias, V., Salas, A.,
15. Hofreiter, M., Sneberger, J., Pospisek, M.,
and mitochondrial DNA typing systems. Int. Phillips, C., Lareu, M.V., and Carracedo, A.
and Vanek, D. (2021). Progress in forensic
J. Leg. Med. 133, 1369–1380. (2004). Typing of mitochondrial DNA coding
bone DNA analysis: Lessons learned from
30. Kemp, B.M., and Smith, D.G. (2005). Use of region SNPs of forensic and anthropological
ancient DNA. Forensic Sci. Int. Genet. 54,
bleach to eliminate contaminating DNA interest using SNaPshot minisequencing.
102538.
from the surface of bones and teeth. Forensic Sci. Int. 140, 251–257.
16. Capelli, C., Tschentscher, F., and Pascali,
Forensic Sci. Int. 154, 53–61. 48. Tillmar, A., Fagerholm, S.A., Staaf, J.,
V.L. (2003). ‘‘Ancient’’ protocols for the crime
31. Korlevic, P., and Meyer, M. (2019). Sjölund, P., and Ansell, R. (2021). Getting the
scene?: Similarities and differences between
Pretreatment: Removing DNA conclusive lead with investigative genetic
forensic genetics and ancient DNA analysis.
Contamination from Ancient Bones and genealogy – A successful case study of a 16
Forensic Sci. Int. 131, 59–64.
Teeth Using Sodium Hypochlorite and year old double murder in Sweden. Forensic
17. Briggs, A.W., Stenzel, U., Johnson, P.L.F., Phosphate. Methods Mol. Biol. 15–19. Sci. Int. Genet. 53, 102525. https://doi.org/
Green, R.E., Kelso, J., Prüfer, K., Meyer, M., 32. Hajdinjak, M., Fu, Q., Hübner, A., Petr, M., 10.1016/j.fsigen.2021.102525.
Krause, J., Ronan, M.T., Lachmann, M., and Mafessoni, F., Grote, S., Skoglund, P., 49. Peck, M.A., Koeppel, A.F., Gorden, E.M.,
Pääbo, S. (2007). Patterns of damage in Narasimham, V., Rougier, H., Crevecoeur, I., Bouchet, J.L., Heaton, M.C., Russell, D.A.,
genomic DNA sequences from a et al. (2018). Reconstructing the genetic Reedy, C.R., Neal, C.M., and Turner, S.D.
Neandertal. Proc. Natl. Acad. Sci. USA 104, history of late Neanderthals. Nature 555, (2022). Internal Validation of the ForenSeq
14616–14621. 652–656. Kintelligence Kit for Application to Forensic
18. Lindahl, T. (1993). Instability and decay of 33. Velsko, I., Skourtanioti, E., and Brandt, G. Genetic Genealogy. Preprint at bioRxiv.
the primary structure of DNA. Nature 362, (2020). Ancient DNA Extraction from https://doi.org/10.1101/2022.10.28.514056.
709–715. Skeletal Material V1. https://doi.org/10. 50. Yates, J.A.F., Aron, F., Neumann, G.U.,
19. Hebsgaard, M.B., Phillips, M.J., and 17504/protocols.io.baksicwe. Velsko, I., Skourtanioti, E., Orfanou, E.,
Willerslev, E. (2005). Geologically ancient 34. Fulton, T.L., and Shapiro, B. (2019). Setting Fagernas, Z., et al. (2020). A–Z of Ancient
DNA: fact or artefact? Trends Microbiol. 13, Up an Ancient DNA Laboratory. Methods DNA Protocols for Shotgun Illumina Next
212–220. Mol. Biol. 1963, 1–13. Generation Sequencing.
20. Pääbo, S., Poinar, H., Serre, D., Jaenicke- 35. Scientific Working Group on DNA Analysis 51. Stahl, R., Warinner, C., Velsko, I., Orfanou,
Despres, V., Hebler, J., Rohland, N., Kuch, Methods (SWGDAM) (2017). Contamination E., Aron, F., and Brandt, G. Illumina Double-
M., Krause, J., Vigilant, L., and Hofreiter, M. Prevention and Detection Guidelines for Stranded DNA Dual Indexing for Ancient
(2004). Genetic analyses from ancient DNA. Forensic DNA Laboratories. DNA V2. 10.17504/protocols.io.bvt8n6rw
Annu. Rev. Genet. 38, 645–679. 36. Higuchi, R., Bowman, B., Freiberger, M., 52. Aron, F., Neumann, G.U., and Brandt, G.
21. Hagelberg, E., Sykes, B., and Hedges, R. Ryder, O.A., and Wilson, A.C. (1984). DNA (2022). Half-UDG treated double-stranded
(1989). Ancient bone DNA amplified. Nature sequences from the quagga, an extinct ancient DNA library preparation for Illumina
342, 485. member of the horse family. Nature 312, sequencing v1. https://doi.org/10.17504/
22. Pääbo, S., Gifford, J.A., and Wilson, A.C. 282–284. protocols.io.bmh6k39e.
(1988). Mitochondrial DNA sequences from 37. Pääbo, S. (1985). Molecular cloning of 53. Gansauge, M.-T., Aximu-Petri, A., Nagel, S.,
a 7000-year old brain. Nucleic Acids Res. 16, Ancient Egyptian mummy DNA. Nature 314, and Meyer, M. (2020). Manual and
9775–9787. 644–645. automated preparation of single-stranded

iScience 26, 108066, November 17, 2023 13


ll iScience
OPEN ACCESS Perspective

DNA libraries for the sequencing of DNA ossicles as an alternative optimal source of mtDNA of Different Age and Origin. Genes
from ancient biological remains and other ancient DNA. Genome Res. 30, 427–436. 8, 237. https://doi.org/10.3390/
sources of highly degraded DNA. Nat. 69. Harney, É., Cheronet, O., Fernandes, D.M., genes8100237.
Protoc. 15, 2279–2300. Sirak, K., Mah, M., Bernardos, R., Adamski, 83. Tillmar, A., Sturk-Andreaggi, K., Daniels-
54. Llamas, B., Valverde, G., Fehren-Schmitz, L., N., Broomandkhoshbacht, N., Callan, K., Higginbotham, J., Thomas, J.T., and
Weyrich, L.S., Cooper, A., and Haak, W. Lawson, A.M., et al. (2021). A minimally Marshall, C. (2021). The FORCE Panel: An
(2017). From the field to the laboratory: destructive protocol for DNA extraction All-in-One SNP Marker Set for Confirming
Controlling DNA contamination in human from ancient teeth. Genome Res. 31, Investigative Genetic Genealogy Leads and
ancient DNA research in the high- 472–483. for General Forensic Applications. Genes
throughput sequencing era. STAR: Sci. 70. Parker, C.E., Bos, K.I., Haak, W., and Krause, 12, 1968. https://doi.org/10.3390/
Technol. Archaeol. Res. 3, 1–14. J. (2021). Optimized Bone Sampling genes12121968.
55. Orlando, L., Allaby, R., Skoglund, P., Der Protocols for the Retrieval of Ancient DNA 84. Schneider, P.M. (1997). Basic issues in
Sarkissian, C., Stockhammer, P.W., Ávila- from Archaeological Remains. J. Vis. Exp. forensic DNA typing. Forensic Sci. Int.
Arcos, M.C., Fu, Q., Krause, J., Willerslev, E., https://doi.org/10.3791/63250. 88, 17–22.
Stone, A.C., and Warinner, C. (2021). 71. Pinhasi, R., Fernandes, D., Sirak, K., Novak, 85. Fu, Q., Hajdinjak, M., Moldovan, O.T.,
Ancient DNA analysis. Nat. Rev. Methods M., Connell, S., Alpaslan-Roodenberg, S., Constantin, S., Mallick, S., Skoglund, P.,
Primers 1, 14–26. Gerritsen, F., Moiseyev, V., Gromov, A., Patterson, N., Rohland, N., Lazaridis, I.,
56. Fellows Yates, J.A., Lamnidis, T.C., Borry, Raczky, P., et al. (2015). Optimal Ancient Nickel, B., et al. (2015). An early modern
M., Andrades Valtueña, A., Fagernäs, Z., DNA Yields from the Inner Ear Part of the human from Romania with a recent
Clayton, S., Garcia, M.U., Neukamm, J., and Human Petrous Bone. PLoS One 10, Neanderthal ancestor. Nature 524, 216–219.
Peltzer, A. (2021). Reproducible, portable, e0129102. 86. MyBaits Expert Human Affinities Daicel
and efficient ancient genome reconstruction 72. Rohland, N., and Hofreiter, M. (2007). Arbor Biosciences. https://arborbiosci.com/
with nf-core/eager. PeerJ 9, e10947. Comparison and optimization of ancient genomics/targeted-sequencing/mybaits/
57. Schubert, M., Ermini, L., Der Sarkissian, C., DNA extraction. Biotechniques 42, 343–352. mybaits-expert/mybaits-expert-human-
Jónsson, H., Ginolhac, A., Schaefer, R., 73. Loreille, O.M., Diegoli, T.M., Irwin, J.A., affinities/.
Martin, M.D., Fernández, R., Kircher, M., Coble, M.D., and Parsons, T.J. (2007). High 87. Rohland, N., Mallick, S., Mah, M., Maier, R.,
McCue, M., et al. (2014). Characterization of efficiency DNA extraction from bone by Patterson, N., and Reich, D. (2022). Three
ancient and modern genomes by SNP total demineralization. Forensic Sci. Int. Reagents for In-Solution Enrichment of
detection and phylogenomic and Genet. 1, 191–195. Ancient Human DNA at More than a Million
metagenomic analysis using PALEOMIX. 74. Amory, S., Huel, R., Bilic, A., Loreille, O., and SNPs. Preprint at bioRxiv. https://doi.org/
Nat. Protoc. 9, 1056–1082. Parsons, T.J. (2012). Automatable full 10.1101/2022.01.13.476259.
58. Neuenschwander, S., Cruz Dávalos, D.I., demineralization DNA extraction procedure 88. Alpaslan-Roodenberg, S., Anthony, D.,
Anchieri, L., Sousa da Mota, B., Bozzi, D., from degraded skeletal remains. Forensic Babiker, H., Bánffy, E., Booth, T., Capone, P.,
Rubinacci, S., Delaneau, O., Rasmussen, S., Sci. Int. Genet. 6, 398–406. Deshpande-Mukherjee, A., Eisenmann, S.,
and Malaspinas, A.-S. (2023). Mapache: a 75. Meyer, M., and Kircher, M. (2010). Illumina Fehren-Schmitz, L., Frachetti, M., et al.
flexible pipeline to map ancient DNA. sequencing library preparation for highly (2021). Ethics of DNA research on human
Bioinformatics 39, btad028. https://doi.org/ multiplexed target capture and sequencing. remains: five globally applicable guidelines.
10.1093/bioinformatics/btad028. Cold Spring Harb. Protoc. 2010, prot5448. Nature 599, 41–46.
59. HOME Swgdam. https://www.swgdam. https://doi.org/10.1101/pdb.prot5448. 89. Kowal, E., Weyrich, L.S., Argüelles, J.M.,
org/. 76. Troll, C.J., Kapp, J., Rao, V., Harkins, K.M., Bader, A.C., Colwell, C., Cortez, A.D., Davis,
Cole, C., Naughton, C., Morgan, J.M., J.L., Figueiro, G., Fox, K., Malhi, R.S., et al.
60. ENFSI (2016). ENFSI | European Network of
Shapiro, B., and Green, R.E. (2019). A (2023). Community Partnerships Are
Forensic Science Institutes. https://enfsi.eu/.
ligation-based single-stranded library Fundamental to Ethical Ancient DNA
61. ISFG. https://www.isfg.org/.
preparation method to analyze cell-free Research. Hum. Genet. Genom. Adv.
62. Federal Bureau of Investigation (2020). DNA and synthetic oligos. BMC Genom. 100161. https://doi.org/10.1016/j.xhgg.
Quality Assurance Standards for Forensic 20, 1023. 2022.100161.
DNA Testing Laboratories. 77. Fortes, G.G., and Paijmans, J.L.A. (2015). 90. Parson, W., Huber, G., Moreno, L., Madel,
63. Mallick, S., Micco, A., Mah, M., Ringbauer, Analysis of Whole Mitogenomes from M.-B., Brandhagen, M.D., Nagl, S., Xavier,
H., Lazaridis, I., Olalde, I., Patterson, N., and Ancient Samples. Methods Mol. Biol. 1347, C., Eduardoff, M., Callaghan, T.C., and Irwin,
Reich, D. (2023). The Allen Ancient DNA 179–195. J.A. (2015). Massively parallel sequencing of
Resource (AADR): A Curated Compendium 78. Sproul, J.S., and Maddison, D.R. (2017). complete mitochondrial genomes from hair
of Ancient Human Genomes. Preprint at Sequencing historical specimens: successful shaft samples. Forensic Sci. Int. Genet.
bioRxiv. https://doi.org/10.1101/2023.04. preparation of small specimens with low 15, 8–15.
06.535797. amounts of degraded DNA. Mol. Ecol. 91. Sturk-Andreaggi, K., Peck, M.A., Boysen, C.,
64. Loreille, O., Tillmar, A., Brandhagen, M.D., Resour. 17, 1183–1201. Dekker, P., McMahon, T.P., and Marshall,
Otterstatter, L., and Irwin, J.A. (2022). 79. Rohland, N., Harney, E., Mallick, S., C.K. (2017). AQME: A forensic mitochondrial
Improved DNA Extraction and Illumina Nordenfelt, S., and Reich, D. (2015). Partial DNA analysis tool for next-generation
Sequencing of DNA Recovered from Aged uracil–DNA–glycosylase treatment for sequencing data. Forensic Sci. Int. Genet.
Rootless Hair Shafts Found in Relics screening of ancient DNA. Philos. Trans. R. 31, 189–197.
Associated with the Romanov Family. Genes Soc. Lond. B Biol. Sci. 370, 20130624. 92. Holland, M.M., Pack, E.D., and McElhoe,
13, 202. https://doi.org/10.3390/ 80. Burbano, H.A., Hodges, E., Green, R.E., J.A. (2017). Evaluation of GeneMarker HTS
genes13020202. Briggs, A.W., Krause, J., Meyer, M., Good, for improved alignment of mtDNA MPS
65. Brandhagen, M.D., Loreille, O., and Irwin, J.M., Maricic, T., Johnson, P.L.F., Xuan, Z., data, haplotype determination, and
J.A. (2018). Fragmented Nuclear DNA is the et al. (2010). Targeted investigation of the heteroplasmy assessment. Forensic Sci. Int.
Predominant Genetic Material in Human Neandertal genome by array-based Genet. 28, 90–98.
Hair Shafts. Genes 9, 640. https://doi.org/ sequence capture. Science 328, 723–725. 93. Jäger, A.C., Alvarez, M.L., Davis, C.P.,
10.3390/genes9120640. 81. Avila-Arcos, M.C., Cappellini, E., Romero- Guzmán, E., Han, Y., Way, L., Walichiewicz,
66. Gutierrez, R., LaRue, B., and Houston, R. Navarro, J.A., Wales, N., Moreno-Mayar, P., Silva, D., Pham, N., Caves, G., et al.
(2021). Novel extraction chemistry and J.V., Rasmussen, M., Fordyce, S.L., Montiel, (2017). Developmental validation of the
alternative amplification strategies for use R., Vielle-Calzada, J.-P., Willerslev, E., and MiSeq FGx Forensic Genomics System for
with rootless hair shafts. J. Forensic Sci. 66, Gilbert, M.T.P. (2011). Application and Targeted Next Generation Sequencing in
1929–1936. comparison of large-scale solution-based Forensic DNA Casework and Database
67. Harkins Kincaid, K. (2020). Solve Cold Cases DNA capture-enrichment methods on Laboratories. Forensic Sci. Int. Genet.
with DNA from Rootless Hair Using Genetic ancient DNA. Sci. Rep. 1, 74. 28, 52–70.
Genealogy (The ISHI Report). 82. Eduardoff, M., Xavier, C., Strobl, C., Casas- 94. Børsting, C., Fordyce, S.L., Olofsson, J.,
68. Sirak, K., Fernandes, D., Cheronet, O., Vargas, A., and Parson, W. (2017). Mogensen, H.S., and Morling, N. (2014).
Harney, E., Mah, M., Mallick, S., Rohland, N., Optimized mtDNA Control Region Primer Evaluation of the Ion Torrent HID SNP
Adamski, N., Broomandkhoshbacht, N., Extension Capture Analysis for Forensically 169-plex: A SNP typing assay developed for
Callan, K., et al. (2020). Human auditory Relevant Samples and Highly Compromised human identification by second generation

14 iScience 26, 108066, November 17, 2023


iScience ll
Perspective OPEN ACCESS

sequencing. Forensic Sci. Int. Genet. 12, mitochondrial consensus calling for ancient 128. Ewing, M.M., Thompson, J.M., McLaren,
144–154. DNA. Genome Biol. 16, 224. R.S., Purpero, V.M., Thomas, K.J.,
95. Seo, S.B., King, J.L., Warshauer, D.H., Davis, 113. Peyrégne, S., and Peter, B.M. (2020). Dobrowski, P.A., DeGroot, G.A., Romsos,
C.P., Ge, J., and Budowle, B. (2013). Single AuthentiCT: a model of ancient DNA E.L., and Storts, D.R. (2016). Human DNA
nucleotide polymorphism typing with damage to estimate the proportion of quantification and sample quality
massively parallel sequencing for human present-day DNA contamination. Genome assessment: Developmental validation of
identification. Int. J. Leg. Med. 127, Biol. 21, 246. the PowerQuant() system. Forensic Sci. Int.
1079–1086. 114. Nakatsuka, N., Harney, É., Mallick, S., Mah, Genet. 23, 166–177.
96. Martin, M. (2011). Cutadapt removes M., Patterson, N., and Reich, D. (2020). 129. Vernarecci, S., Ottaviani, E., Agostino, A.,
adapter sequences from high-throughput ContamLD: estimation of ancient nuclear Mei, E., Calandro, L., and Montagna, P.
sequencing reads. EMBnet. j. 17, 10–12. DNA contamination using breakdown of (2015). Quantifiler Trio Kit and forensic
97. Li, H. (2013). Aligning Sequence Reads, linkage disequilibrium. Genome Biol. samples management: a matter of
Clone Sequences and Assembly Contigs 21, 199. degradation. Forensic Sci. Int. Genet.
with BWA-MEM. Preprint at arXiv. https:// 115. Begg, T.J.A., Schmidt, A., Kocher, A., 16, 77–85.
doi.org/10.48550/arXiv.1303.3997. Larmuseau, M.H.D., Runfeldt, G., Maier, 130. Pineda, G.M., Montgomery, A.H.,
98. Langmead, B., Trapnell, C., Pop, M., and P.A., Wilson, J.D., Barquera, R., Maj, C., Thompson, R., Indest, B., Carroll, M., and
Salzberg, S.L. (2009). Ultrafast and memory- Szolek, A., et al. (2023). Genomic analyses of Sinha, S.K. (2014). Development and
efficient alignment of short DNA sequences hair from Ludwig van Beethoven. Curr. Biol. validation of InnoQuant, a sensitive human
to the human genome. Genome Biol. 33, 1431–1447.e22. DNA quantitation and degradation
10, R25. 116. Furtwängler, A., Reiter, E., Neumann, G.U., assessment method for forensic samples
99. Schubert, M., Ginolhac, A., Lindgreen, S., Siebke, I., Steuri, N., Hafner, A., Lösch, S., using high copy number mobile elements
Thompson, J.F., Al-Rasheid, K.A.S., Anthes, N., Schuenemann, V.J., and Krause, Alu and SVA. Forensic Sci. Int. Genet. 13,
Willerslev, E., Krogh, A., and Orlando, L. J. (2018). Ratio of mitochondrial to nuclear 224–235.
(2012). Improving ancient DNA read DNA affects contamination estimates in 131. Glocke, I., and Meyer, M. (2017). Extending
mapping against modern reference ancient DNA analysis. Sci. Rep. 8, 1–8. the spectrum of DNA sequences retrieved
genomes. BMC Genom. 13, 178. 117. Green, R.E., Briggs, A.W., Krause, J., Prüfer, from ancient bones and teeth. Genome Res.
100. Broad Institute (2019). ‘‘Picard Toolkit’’ K., Burbano, H.A., Siebauer, M., Lachmann, 27, 1230–1237.
(Broad Institute, GitHub Repository). M., and Pääbo, S. (2009). The Neandertal 132. SWGDAM (2017). SWGDAM Interpretation
101. Peltzer, A., Jäger, G., Herbig, A., Seitz, A., genome and ancient DNA authenticity. Guidelines for Autosomal STR Typing by
Kniep, C., Krause, J., and Nieselt, K. (2016). EMBO J. 28, 2494–2502. Forensic DNA Testing Laboratories.
EAGER: efficient ancient genome reconstruction. Genome Biol. 17, 60.
102. Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P.L.F., and Orlando, L. (2013). mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684.
103. Neukamm, J., Peltzer, A., and Nieselt, K. (2021). DamageProfiler: fast damage pattern calculation for ancient DNA. Bioinformatics 37, 3652–3653.
104. eager. Introduction. https://nf-co.re/eager/2.4.7.
105. DNA Analysis Methods (SWGDAM) (2015). Guidelines for the Validation of Probabilistic Genotyping Systems.
106. SWGDAM (2021). Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories.
107. Wilson, M.R., DiZinno, J.A., Polanskey, D., Replogle, J., and Budowle, B. (1995). Validation of mitochondrial DNA sequencing for forensic casework analysis. Int. J. Leg. Med. 108, 68–74.
108. Korneliussen, T.S., Albrechtsen, A., and Nielsen, R. (2014). ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinf. 15, 356.
109. Huang, Y., and Ringbauer, H. (2022). hapCon: Estimating Contamination of Ancient Genomes by Copying from Reference Haplotypes. Bioinformatics 38, 3768–3777. https://doi.org/10.1093/bioinformatics/btac390.
110. Fu, Q., Mittnik, A., Johnson, P.L.F., Bos, K., Lari, M., Bollongino, R., Sun, C., Giemsch, L., Schmitz, R., Burger, J., et al. (2013). A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553–559.
111. Fu, Q., Li, H., Moorjani, P., Jay, F., Slepchenko, S.M., Bondarev, A.A., Johnson, P.L.F., Aximu-Petri, A., Prüfer, K., de Filippo, C., et al. (2014). Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449.
112. Renaud, G., Slon, V., Duggan, A.T., and Kelso, J. (2015). Schmutzi: estimation of contamination and endogenous
118. Steinlechner, M., Berger, B., Niederstätter, H., and Parson, W. (2002). Rare failures in the amelogenin sex test. Int. J. Leg. Med. 116, 117–120.
119. Kao, L.-G., Tsai, L.-C., Lee, J.C., and Hsieh, H.-M. (2007). Controversial cases of human gender identification by amelogenin test. Forensic Sci. J 6, 69–71.
120. Drobnic, K. (2006). A new primer set in a SRY gene for sex identification. Int. Congr. Ser. 1288, 268–270.
121. Kayser, M. (2017). Forensic use of Y-chromosome DNA: a general overview. Hum. Genet. 136, 621–635.
122. Skoglund, P., Storå, J., Götherström, A., and Jakobsson, M. (2013). Accurate sex identification of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 40, 4477–4482.
123. Mittnik, A., Wang, C.-C., Svoboda, J., and Krause, J. (2016). A Molecular Approach to the Sexing of the Triple Burial at the Upper Paleolithic Site of Dolní Věstonice. PLoS One 11, e0163019.
124. Fu, Q., Posth, C., Hajdinjak, M., Petr, M., Mallick, S., Fernandes, D., Furtwängler, A., Haak, W., Meyer, M., Mittnik, A., et al. (2016). The genetic history of Ice Age Europe. Nature 534, 200–205.
125. Meyer, M., Arsuaga, J.-L., de Filippo, C., Nagel, S., Aximu-Petri, A., Nickel, B., Martínez, I., Gracia, A., Bermúdez de Castro, J.M., Carbonell, E., et al. (2016). Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531, 504–507.
126. Moilanen, U., Kirkinen, T., Saari, N.-J., Rohrlach, A.B., Krause, J., Onkamo, P., and Salmela, E. (2022). A woman with a sword? – weapon grave at Suontaka Vesitorninmäki, Finland. Eur. J. Archaeol. 25, 42–60.
127. Roca-Rada, X., Tereso, S., Rohrlach, A.B., Brito, A., Williams, M.P., Umbelino, C., Curate, F., Deveson, I.W., Souilmi, Y., Amorim, A., et al. (2022). A 1000-year-old case of Klinefelter’s syndrome diagnosed by integrating morphology, osteology, and genetics. Lancet 400, 691–692.
133. FBI (2020). Quality Assurance Standards for Forensic DNA Testing Laboratories.
134. ENFSI (2022). Best Practice Manual for Human Forensic Biology and DNA Profiling ENFSI-DNA-BPM-03.
135. ENFSI (2010). Recommended Minimum Criteria for the Validation of Various Aspects of the DNA Profiling Process.
136. Nielsen, R., Paul, J.S., Albrechtsen, A., and Song, Y.S. (2011). Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 12, 443–451.
137. Haned, H., Gill, P., Lohmueller, K., Inman, K., and Rudin, N. (2016). Validation of probabilistic genotyping software for use in forensic DNA casework: Definitions and illustrations. Sci. Justice 56, 104–108.
138. Bright, J.-A., Taylor, D., McGovern, C., Cooper, S., Russell, L., Abarno, D., and Buckleton, J. (2016). Developmental validation of STRmix, expert software for the interpretation of forensic DNA profiles. Forensic Sci. Int. Genet. 23, 226–239. https://doi.org/10.1016/j.fsigen.2016.05.007.
139. Nielsen, M.B., Andersen, M.M., Eriksen, P.S., Mogensen, H.S., and Morling, N. (2022). Probabilistic SNP genotyping at low DNA concentrations. Forensic Sci. Int.: Genet. Suppl. Series 8, 151–152.
140. Hofmanová, Z., Kreutzer, S., Hellenthal, G., Sell, C., Diekmann, Y., Díez-Del-Molino, D., van Dorp, L., López, S., Kousathanas, A., Link, V., et al. (2016). Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl. Acad. Sci. USA 113, 6886–6891.
141. Wang, J., Raskin, L., Samuels, D.C., Shyr, Y., and Guo, Y. (2015). Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 31, 318–323.
142. Prüfer, K. (2018). snpAD: an ancient DNA genotype caller. Bioinformatics 34, 4165–4171.
143. Link, V., Kousathanas, A., Veeramah, K., Sell, C., Scheu, A., and Wegmann, D. (2017). ATLAS: Analysis Tools for Low-Depth and Ancient Samples. Preprint at bioRxiv. https://doi.org/10.1101/105346.


144. Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993.
145. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303.
146. 1000 Genomes Project Consortium, Auton, A., Brooks, L.D., Durbin, R.M., Garrison, E.P., Kang, H.M., Korbel, J.O., Marchini, J.L., McCarthy, S., McVean, G.A., and Abecasis, G.R. (2015). A global reference for human genetic variation. Nature 526, 68–74.
147. Skoglund, P., Northoff, B.H., Shunkov, M.V., Derevianko, A.P., Pääbo, S., Krause, J., and Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl. Acad. Sci. USA 111, 2229–2234.
148. Holland, C.A., McElhoe, J.A., Gaston-Sanchez, S., and Holland, M.M. (2021). Damage patterns observed in mtDNA control region MPS data for a range of template concentrations and when using different amplification approaches. Int. J. Leg. Med. 135, 91–106.
149. Rathbun, M.M., McElhoe, J.A., Parson, W., and Holland, M.M. (2017). Considering DNA damage when interpreting mtDNA heteroplasmy in deep sequencing data. Forensic Sci. Int. Genet. 26, 1–11.
150. Gorden, E.M., Sturk-Andreaggi, K., and Marshall, C. (2018). Repair of DNA damage caused by cytosine deamination in mitochondrial DNA of forensic case samples. Forensic Sci. Int. Genet. 34, 257–264.
151. Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y., Genschoreck, T., Webster, T., and Reich, D. (2012). Ancient admixture in human history. Genetics 192, 1065–1093.
152. Carneiro, M.O., Russ, C., Ross, M.G., Gabriel, S.B., Nusbaum, C., and DePristo, M.A. (2012). Pacific biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genom. 13, 375.
153. Martin, A.R., Atkinson, E.G., Chapman, S.B., Stevenson, A., Stroud, R.E., Abebe, T., Akena, D., Alemayehu, M., Ashaba, F.K., Atwoli, L., et al. (2021). Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations. Am. J. Hum. Genet. 108, 656–668.
154. Imbler, S. (2022). New DNA Analysis Supports an Unrecognized Tribe’s Ancient Roots in California (The New York Times).
155. Li, Y., Willer, C., Sanna, S., and Abecasis, G. (2009). Genotype Imputation. Annu. Rev. Genom. Hum. Genet. 10, 387–406. https://doi.org/10.1146/annurev.genom.9.081307.164242.
156. Browning, S.R. (2008). Missing data imputation and haplotype phase inference for genome-wide association studies. Hum. Genet. 124, 439–450.
157. De Marino, A., Mahmoud, A.A., Bose, M., Bircan, K.O., Terpolovsky, A., Bamunusinghe, V., Bohn, S., Khan, U., Novkovic, B., and Yazdi, P.G. (2022). A comparative analysis of current phasing and imputation software. PLoS One 17, e0260177.
158. Davies, R.W., Flint, J., Myers, S., and Mott, R. (2016). Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965–969.
159. Cady, J., and Greytak, E.M. (2022). Whole-genome sequencing of degraded DNA for investigative genetic genealogy. Forensic Sci. Int.: Genet. Suppl. Series 8, 20–22.
160. DNA Testing for Ancestry & Genealogy. http://familytreedna.org.
161. DNA and genealogy tools to grow your family tree (2022). GEDmatch - Comprehensive Solutions for Genetic Genealogy and Family Tree Research. http://gedmatch.com.
162. Kim, J., and Rosenberg, N.A. (2022). Record-matching of STR Profiles with Fragmentary Genomic SNP Data. Preprint at bioRxiv. https://doi.org/10.1101/2022.09.01.505545.
163. Wojcik, G.L., Fuchsberger, C., Taliun, D., Welch, R., Martin, A.R., Shringarpure, S., Carlson, C.S., Abecasis, G., Kang, H.M., Boehnke, M., et al. (2018). Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 8, 3255–3267.
164. Browning, B.L., and Browning, S.R. (2016). Genotype Imputation with Millions of Reference Samples. Am. J. Hum. Genet. 98, 116–126.
165. Spiliopoulou, A., Colombo, M., Orchard, P., Agakov, F., and McKeigue, P. (2017). GeneImp: Fast Imputation to Large Reference Panels Using Genotype Likelihoods from Ultralow Coverage Sequencing. Genetics 206, 91–104.
166. Rubinacci, S., Ribeiro, D.M., Hofmeister, R.J., and Delaneau, O. (2021). Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat. Genet. 53, 120–126.
167. Kabisch, M., Hamann, U., and Lorenzo Bermejo, J. (2017). Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure. BMC Genom. 18, 798. https://doi.org/10.1186/s12864-017-4208-2.
168. Hui, R., D’Atanasio, E., Cassidy, L.M., Scheib, C.L., and Kivisild, T. (2020). Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes. Sci. Rep. 10, 18542.
169. Childebayeva, A., Rohrlach, A.B., Barquera, R., Rivollat, M., Aron, F., Szolek, A., Kohlbacher, O., Nicklisch, N., Alt, K.W., Gronenborn, D., et al. (2022). Population Genetics and Signatures of Selection in Early Neolithic European Farmers. Mol. Biol. Evol. 39, msac108. https://doi.org/10.1093/molbev/msac108.
170. Sousa da Mota, B., Rubinacci, S., Cruz Dávalos, D.I., G Amorim, C.E., Sikora, M., Johannsen, N.N., Szmyt, M.H., Włodarczak, P., Szczepanek, A., Przybyła, M.M., et al. (2023). Imputation of ancient human genomes. Nat. Commun. 14, 3660.
171. Ausmees, K., Sanchez-Quinto, F., Jakobsson, M., and Nettelblad, C. (2022). An empirical evaluation of genotype imputation of ancient DNA. G3 (Bethesda) 12, jkac089. https://doi.org/10.1093/g3journal/jkac089.
172. Browning, S.R., and Browning, B.L. (2007). Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097.
173. Rubinacci, S., Hofmeister, R.J., Sousa da Mota, B., and Delaneau, O. (2023). Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat. Genet. 55, 1088–1090. https://doi.org/10.1038/s41588-023-01438-3.
174. Lu, C., Ahmed, R., Lamri, A., and Anand, S.S. (2022). Use of race, ethnicity, and ancestry data in health research. PLOS Glob. Public Health 2, e0001060.
175. Bonham, V.L., Green, E.D., and Pérez-Stable, E.J. (2018). Examining How Race, Ethnicity, and Ancestry Data Are Used in Biomedical Research. JAMA 320, 1533–1534.
176. Skinner, D. (2020). Forensic genetics and the prediction of race: What is the problem? BioSocieties 15, 329–349.
177. Gannett, L. (2014). Biogeographical ancestry and race. Stud. Hist. Philos. Biol. Biomed. Sci. 47, 173–184.
178. Alexander, D.H., Novembre, J., and Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664.
179. Pritchard, J.K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics 155, 945–959.
180. Malaspinas, A.-S., Tange, O., Moreno-Mayar, J.V., Rasmussen, M., DeGiorgio, M., Wang, Y., Valdiosera, C.E., Politis, G., Willerslev, E., and Nielsen, R. (2014). bammds: a tool for assessing the ancestry of low-depth whole-genome data using multidimensional scaling (MDS). Bioinformatics 30, 2962–2964.
181. Reich, D., Thangaraj, K., Patterson, N., Price, A.L., and Singh, L. (2009). Reconstructing Indian population history. Nature 461, 489–494.
182. Ge, J., and Budowle, B. (2020). How many familial relationship testing results could be wrong? PLoS Genet. 16, e1008929.
183. Rohlfs, R.V., Fullerton, S.M., and Weir, B.S. (2012). Familial identification: population structure and relationship distinguishability. PLoS Genet. 8, e1002469.
184. Fortier, A.L., Kim, J., and Rosenberg, N.A. (2020). Human-Genetic Ancestry Inference and False Positives in Forensic Familial Searching. G3 (Bethesda) 10, 2893–2902.
185. Kling, D., Tillmar, A.O., and Egeland, T. (2014). Familias 3 - Extensions and new functionality. Forensic Sci. Int. Genet. 13, 121–127.
186. Egeland, T., Mostad, P.F., Mevåg, B., and Stenersen, M. (2000). Beyond traditional paternity and identification cases. Selecting the most probable pedigree. Forensic Sci. Int. 110, 47–59.
187. Kling, D., and Tillmar, A. (2019). Forensic genealogy—A comparison of methods to infer distant relationships based on dense SNP data. Forensic Sci. Int. Genet. 42, 113–124.
188. Kling, D. (2019). On the use of dense sets of SNP markers and their potential in relationship inference. Forensic Sci. Int. Genet. 39, 19–31.
189. Kling, D., Phillips, C., Kennett, D., and Tillmar, A. (2021). Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci. Int. Genet. 52, 102474.
190. Greytak, E.M., Moore, C., and Armentrout, S.L. (2019). Genetic genealogy for cold case and active investigations. Forensic Sci. Int. 299, 103–113.
191. Conomos, M.P., Reiner, A.P., Weir, B.S., and Thornton, T.A. (2016). Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 98, 127–148.
192. Goudet, J., Kay, T., and Weir, B.S. (2018). How to estimate kinship. Mol. Ecol. 27, 4121–4135.
193. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., and Chen, W.-M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873.
194. Snedecor, J., Fennell, T., Stadick, S., Homer, N., Antunes, J., Stephens, K., and Holt, C. (2022). Fast and accurate kinship estimation using sparse SNPs in relatively large database searches. Forensic Sci. Int. Genet. 61, 102769.
195. Turner, N., Scholz, J., and Acevedo Evaluating the impact of dropout and genotyping error on SNP-based kinship analysis with forensic samples. Front. Genet.
196. SWGDAM (2018). Recommendations of the SWGDAM Ad Hoc Working Group on Genotyping Results Reported as Likelihood Ratios (Federal Bureau of Investigation’s Scientific Working Group on DNA Analysis Methods (SWGDAM)).
197. Ringbauer, H., Huang, Y., Akbari, A., Mallick, S., Patterson, N., and Reich, D. (2023). ancIBD - Screening for Identity by Descent Segments in Human Ancient DNA. Preprint at bioRxiv. https://doi.org/10.1101/2023.03.08.531671.
198. Monroy Kuhn, J.M., Jakobsson, M., and Günther, T. (2018). Estimating genetic kin relationships in prehistoric populations. PLoS One 13, e0195491.
199. Lipatov, M., Sanjeev, K., Patro, R., and Veeramah, K.R. (2015). Maximum Likelihood Estimation of Biological Relatedness from Low Coverage Sequencing Data. Preprint at bioRxiv. https://doi.org/10.1101/023374.
200. Santibanez-Koref, M., Griffin, H., Turnbull, D.M., Chinnery, P.F., Herbert, M., and Hudson, G. (2019). Assessing mitochondrial heteroplasmy using next generation sequencing: A note of caution. Mitochondrion 46, 302–306.
201. Churchill, J.D., Stoljarova, M., King, J.L., and Budowle, B. (2018). Massively parallel sequencing-enabled mixture analysis of mitochondrial DNA samples. Int. J. Leg. Med. 132, 1263–1272.
202. Li, M., Schönberg, A., Schaefer, M., Schroeder, R., Nasidze, I., and Stoneking, M. (2010). Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am. J. Hum. Genet. 87, 237–249.
203. Vohr, S.H., Gordon, R., Eizenga, J.M., Erlich, H.A., Calloway, C.D., and Green, R.E. (2017). A phylogenetic approach for haplotype analysis of sequence data from complex mitochondrial mixtures. Forensic Sci. Int. Genet. 30, 93–105.
204. Peter, B.M. (2016). Admixture, Population Structure, and F-Statistics. Genetics 202, 1485–1501.
205. Gouy, A., and Zieger, M. (2017). STRAF-A convenient online tool for STR data evaluation in forensic genetics. Forensic Sci. Int. Genet. 30, 148–151.
206. Buckleton, J., Curran, J., Goudet, J., Taylor, D., Thiery, A., and Weir, B.S. (2016). Population-specific FST values for forensic STR markers: A worldwide survey. Forensic Sci. Int. Genet. 23, 91–100.
207. He, G., Liu, J., Wang, M., Zou, X., Ming, T., Zhu, S., Yeh, H.-Y., Wang, C., Wang, Z., and Hou, Y. (2021). Massively parallel sequencing of 165 ancestry-informative SNPs and forensic biogeographical ancestry inference in three southern Chinese Sinitic/Tai-Kadai populations. Forensic Sci. Int. Genet. 52, 102475.
208. Durand, E.Y., Patterson, N., Reich, D., and Slatkin, M. (2011). Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252.
209. Petr, M., Vernot, B., and Kelso, J. (2019). admixr—R package for reproducible analyses using ADMIXTOOLS. Bioinformatics 35, 3194–3195.
210. Pickrell, J.K., and Pritchard, J.K. (2012). Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967.
211. Harney, É., Patterson, N., Reich, D., and Wakeley, J. (2021). Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture. Genetics 217. https://doi.org/10.1093/genetics/iyaa045.
212. Wagner, J.K., Colwell, C., Claw, K.G., Stone, A.C., Bolnick, D.A., Hawks, J., Brothers, K.B., and Garrison, N.A. (2020). Fostering Responsible Research on Ancient DNA. Am. J. Hum. Genet. 107, 183–195.
213. Tsosie, K.S., Begay, R.L., Fox, K., and Garrison, N.A. (2020). Generations of genomes: advances in paleogenomics technology and engagement for Indigenous people of the Americas. Curr. Opin. Genet. Dev. 62, 91–96.
214. Ávila-Arcos, M.C., de la Fuente Castro, C., Nieves-Colón, M.A., and Raghavan, M. (2022). Recommendations for Sustainable Ancient DNA Research in the Global South: Voices From a New Generation of Paleogenomicists. Front. Genet. 13, 880170.
215. Budowle, B., and Sajantila, A. (2023). Revisiting informed consent in forensic genomics in light of current technologies and the times. Int. J. Leg. Med. 137, 551–565.
216. Katsanis, S.H., Snyder, L., Arnholt, K., and Mundorff, A.Z. (2018). Consent process for US-based family reference DNA samples. Forensic Sci. Int. Genet. 32, 71–79.
217. Parson, W., and Dür, A. (2007). EMPOP—A forensic mtDNA database. Forensic Sci. Int. Genet. 1, 88–92.
218. Roewer, L., Krawczak, M., Willuweit, S., Nagy, M., Alves, C., Amorim, A., Anslinger, K., Augustin, C., Betz, A., Bosch, E., et al. (2001). Online reference database of European Y-chromosomal short tandem repeat (STR) haplotypes. Forensic Sci. Int. 118, 106–113.
219. Moretti, T.R., Moreno, L.I., Smerick, J.B., Pignone, M.L., Hizon, R., Buckleton, J.S., Bright, J.-A., and Onorato, A.J. (2016). Population data on the expanded CODIS core STR loci for eleven populations of significance for forensic DNA analyses in the United States. Forensic Sci. Int. Genet. 25, 175–181.
220. Kidd, K.K., Soundararajan, U., Rajeevan, H., Pakstis, A.J., Moore, K.N., and Ropero-Miller, J.D. (2018). The redesigned Forensic Research/Reference on Genetics-knowledge base, FROG-kb. Forensic Sci. Int. Genet. 33, 33–37.
221. Parson, W., Gusmão, L., Hares, D.R., Irwin, J.A., Mayr, W.R., Morling, N., Pokorak, E., Prinz, M., Salas, A., Schneider, P.M., et al. (2014). DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing. Forensic Sci. Int. Genet. 13, 134–142.
222. Zimmermann, B., Röck, A., Huber, G., Krämer, T., Schneider, P.M., and Parson, W. (2011). Application of a west Eurasian-specific filter for quasi-median network analysis: Sharpening the blade for mtDNA error detection. Forensic Sci. Int. Genet. 5, 133–137.
223. Willuweit, S., and Roewer, L. (2015). The new Y Chromosome Haplotype Reference Database. Forensic Sci. Int. Genet. 15, 43–48.
224. Oldt, R.F., and Kanthaswamy, S. (2020). Expanded CODIS STR allele frequencies – Evidence for the irrelevance of race-based DNA databases. Leg. Med. 42, 101642. https://doi.org/10.1016/j.legalmed.2019.101642.
225. Edwards. (1993). DNA Identification Act of 1993. https://www.govinfo.gov/app/details/BILLS-103s497is.
226. Joly, Y., Marrocco, G., and Dupras, C. (2019). Risks of compulsory genetic databases. Science 363, 938–940.
227. Chow-White, P.A., and Duster, T. (2011). Do Health and Forensic DNA Databases Increase Racial Disparities? PLoS Med. 8, e1001100. https://doi.org/10.1371/journal.pmed.1001100.
228. Wickenheiser, R.A. (2022). Expanding DNA database effectiveness. Forensic Sci. Int. Synerg. 4, 100226.
229. Amankwaa, A.O. (2020). Trends in forensic DNA database: transnational exchange of DNA data. Forensic Sci. Res. 5, 8–14.
230. Triverio, S.C., and Crespillo Márquez, M. (2022). The need for cross-border exchange of genetic data for criminal investigation purposes in Latin America: implementation challenges. Spanish J. Leg. Med. 48, 158–165.
231. GEDmatch GEDmatch & Community Safety
232. Murphy, H. (2020). Why a Data Breach at a Genealogy Site Has Privacy Experts Worried (The New York Times).
233. Edge, M.D., and Coop, G. (2020). Attacks on genetic privacy via uploads to genealogical databases. Elife 9, e51810. https://doi.org/10.7554/eLife.51810.
234. Taylor, D., Buckleton, J., and Evett, I. (2015). Testing likelihood ratios produced from complex DNA profiles. Forensic Sci. Int. Genet. 16, 165–171.
235. Erlich, Y., Shor, T., Pe’er, I., and Carmi, S. (2018). Identity inference of genomic data using long-range familial searches. Science 362, 690–694.
236. Fox, K., and Hawks, J. (2019). Use ancient remains more wisely. Nature 572, 581–583.
237. Mourier, T., Ho, S.Y.W., Gilbert, M.T.P., Willerslev, E., and Orlando, L. (2012). Statistical guidelines for detecting past population shifts using ancient DNA. Mol. Biol. Evol. 29, 2241–2251.
238. Malaspinas, A.-S. (2016). Methods to characterize selective sweeps using time serial samples: an ancient DNA perspective. Mol. Ecol. 25, 24–41.
239. Klunk, J., Vilgalys, T.P., Demeure, C.E., Cheng, X., Shiratori, M., Madej, J., Beau, R., Elli, D., Patino, M.I., Redfern, R., et al. (2022). Evolution of immune genes is associated with the Black Death. Nature 611, 312–319.
240. Renaud, G., Hanghøj, K., Willerslev, E., and Orlando, L. (2017). gargammel: a sequence simulator for ancient DNA. Bioinformatics 33, 577–579.
241. Huang, W., Li, L., Myers, J.R., and Marth, G.T. (2012). ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594.
242. Henriksen, R.A., Zhao, L., and Korneliussen, T.S. (2023). NGSNGS: next-generation simulator for next-generation sequencing data. Bioinformatics 39, btad041. https://doi.org/10.1093/bioinformatics/btad041.
243. Renaud, G., Stenzel, U., and Kelso, J. (2014). leeHom: adaptor trimming and merging for Illumina sequencing reads. Nucleic Acids Res. 42, e141.
244. Li, H. seqtk Toolkit for processing sequences in FASTA/Q formats. GitHub
245. Schubert, M., Lindgreen, S., and Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes 9, 88.
246. Lindgreen, S. (2012). AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes 5, 337.
247. Chen, S., Zhou, Y., Chen, Y., and Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890.
248. Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760.
249. Langmead, B., and Salzberg, S.L. (2012). Fast Gapped-Read Alignment With Bowtie 2. Nat. Methods 9, 357–359.
250. Danecek, P., Bonfield, J.K., Liddle, J., Marshall, J., Ohan, V., Pollard, M.O., Whitwham, A., Keane, T., McCarthy, S.A., Davies, R.M., and Li, H. (2021). Twelve years of SAMtools and BCFtools. GigaScience 10, giab008. https://doi.org/10.1093/gigascience/giab008.
251. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M., and Lee, J.J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7.
252. Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A., and Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909.
253. Ringbauer, H., Novembre, J., and Steinrücken, M. (2021). Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Nat. Commun. 12, 1–11.
