
Forensic Science International: Genetics 33 (2018) 117–128


Review article

A review of bioinformatic methods for forensic DNA analyses


Yao-Yuan Liu a,⁎, SallyAnn Harbison b
a Forensic Science Program, School of Chemical Sciences, University of Auckland, 38 Princes Street, Auckland, 1010, New Zealand
b Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland, 1142, New Zealand

ARTICLE INFO

Keywords: Bioinformatics; Forensic science; Massively parallel sequencing

ABSTRACT

Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used.

1. Introduction

In recent years, the forensic community has been exploring the use of massively parallel sequencing (MPS) for DNA analysis [1–7]. As the cost of MPS platforms became more affordable, useful information existing in parts of DNA previously overlooked or deemed uneconomical to examine could be investigated. Børsting and Morling have provided a detailed introduction to MPS forensic analysis [8]. An area of great focus in forensic MPS studies is data processing, where methods and techniques are being explored for the analyses of the large amounts of data produced during an MPS run. Forensic science has inherited bioinformatic workflows and methods used in other fields of DNA science [9–12]. These workflows were further developed and/or stimulated the production of new tools for forensic science [13–17].

Current DNA markers of forensic interest are short tandem repeats (STRs), single nucleotide polymorphisms (SNPs) and polymorphisms in the whole mitochondrial genome (mtDNA) [18,19]. These markers provide different types of information, and as the characteristics of each marker type are different, specific pipelines and tools have been developed for each marker type.

Other types of forensic DNA markers have also been described, such as indels (for example [20–22]), epigenetic methylation markers (for example [23–27]) and Alu repetitive elements (for example [28–31]). In addition, metagenomic approaches to forensic evidence [32–34] as well as other non-human genetic evidence (for a review see [35]) are gaining attention as MPS technology facilitates their adoption. For the convenience of the analyst, manufacturers of MPS platforms have packaged analysis tools with their machines in the form of plugins within analytical software frameworks. These include the MiSeq Reporter/ForenSeq Universal Analysis Software bundled with the MiSeq (Illumina; San Diego, Ca., USA, https://www.illumina.com), and the Torrent Suite bundled with the Ion Torrent PGM (Thermo Fisher Scientific, South San Francisco, CA, USA, https://www.thermofisher.com).

Bioinformatic pipelines provided by the manufacturers of MPS platforms offer convenience at the expense of the variety and flexibility of a hand-assembled pipeline. Manufacturer-provided solutions are indeed useful and have been widely used by the forensic science community. In this review, they are described along with open-source pipelines in their relevant marker sections. In this rapidly changing environment, sequence analysis software remains in continued development and versions of the software described in published papers may be outdated by the time of reading.

This review describes the main marker systems currently used, or considered to be used, in forensic bioinformatics and introduces bioinformatic solutions for data processing and analyses. Bioinformatics tools most relevant to forensic practice are presented, and review articles are provided for further information where these are considered helpful; however, it is not possible to include all the possible tools


Corresponding author.
E-mail address: alex.liu@esr.cri.nz (Y.-Y. Liu).

https://doi.org/10.1016/j.fsigen.2017.12.005
Received 13 September 2017; Received in revised form 30 November 2017; Accepted 10 December 2017
Available online 12 December 2017
1872-4973/ © 2017 Elsevier B.V. All rights reserved.
Y.-Y. Liu, S. Harbison Forensic Science International: Genetics 33 (2018) 117–128

existing in the sphere of biological computation. Forensic DNA databases will also be introduced.

In addition to population sequence databases such as the genome/exome Aggregation Databases (gnomAD, http://gnomad.broadinstitute.org, and ExAC, http://exac.broadinstitute.org/), and the 1000 Genomes Project [36,37], the forensic community now has access to marker-specific databases such as EMPOP [38] for mtDNA, ALFRED [39] and STRidER [40] for SNP and STR allele frequencies respectively, and STRSeq [41] for STR sequence variants.

This review does not cover the subsequent interpretations of STR profiles, such as the resolution of the components of a mixture or the determination of a match.

2. Data preparation

Before the amplified DNA fragments are sequenced, short sequences are appended to the DNA during library preparation to facilitate the sequencing process [8]. As such, the raw, digital form of a DNA read contains the sequence of interest as well as the additional appended sequences. These include the PCR primers that flank the sequence of interest, a barcode (index) sequence for recognition of a sample’s origin, and adapter sequences at the ends required to attach and immobilize each library to a solid surface for sequencing. These sequence additions are later removed to reduce the sequence of interest to its original state. MPS platforms such as the Ion Torrent PGM and the Illumina MiSeq produce adapter-trimmed, demultiplexed FASTQ [42] files. This means each sequence read is sorted and assigned by barcode to its sample of origin. Each such FASTQ file contains the DNA sequences and their quality scores.

Data preparation, or data preprocessing, is the first stage in an MPS data analysis pipeline. This includes quality assessment/control steps, and for some applications, an alignment step (Fig. 1).

For a review on general tools for variant analysis on MPS data, Pabinger et al. [43] provide a survey of several quality assessment, alignment, and variant identification tools not covered in this review.

In most formats, the DNA data can be visualized and explored with several software options. Examples of these are the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/home), Geneious (https://www.geneious.com/), and NextGENe (http://www.softgenetics.com/NextGENe.php).

2.1. Data quality assessment and control

Since MPS instruments cannot identify individual bases with absolute certainty, quality scores are assigned to each base by the sequencer as a metric of the probability of an incorrect base call: the lower the probability, the more confident the call. These scores matter because the inclusion of poor quality data may lead to noise or systemic errors during analysis and affect the results.

Quality control includes the assessment of the data quality and subsequent removal of poor quality information.

Visualization of the data quality may be useful to check the success of the run and overall base quality trends to identify any potential problems with the data. FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc), developed by the Babraham Bioinformatics group, provides statistics and graphical displays of the per base sequence quality, read lengths and other helpful metrics from FASTQ and BAM/SAM files (described below). SolexaQA is another tool for quality analysis and visualization, which is also packaged with quality control tools such as a read trimmer and a length-based filter [44]. This tool accepts FASTQ files.

Quality control tools such as quality trimmers and quality filters remove poor quality bases using Phred quality scores [42]. Read trimming, the removal of low-quality sequence tracts from a read, is useful for removing poor quality bases from the ends of reads, while read filtering is useful for removing entire reads of overall poor quality. The FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit) contains both a quality filter and a quality trimmer along with many other tools.

As base quality drops towards the end of a sequence read, lower quality bases at the 3′ end of reads are removed by default in the Torrent Suite Software using a sliding window approach, where the window size is 30 bases and the trimming threshold is a Q score of 15. Reads that are short (less than eight base pairs) or adapter dimers are also removed as part of read filtering (https://ts-pgm.epigenetic.ru/ion-docs/Technical-Note—Filtering-and-Trimming_6455370.html). Illumina data, on the contrary, are not quality trimmed or filtered by default, and these steps would need to be manually included in the analysis pipeline if necessary.

2.2. The general-purpose pipeline: align and variant call

For the majority of sequence studies, alignment to a reference genome and variant calling form the core of a general workflow for genetic variant examination.

Pipelines that adopt the alignment approach [2,7,45] include an alignment step in which reads are mapped/aligned to a reference genome, creating an alignment file. Note that the International Society

Fig. 1. Sequences added to DNA for sequencing, and their bioinformatic removal.
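
As a rough illustration of the quality scores and sliding-window trimming described in Section 2.1, the following Python sketch converts Phred scores to error probabilities and trims the 3′ end of a read. The window size of 30 bases and the threshold of Q15 are the Torrent Suite defaults quoted in the text. The function names, the Phred+33 encoding, and the exact trimming rule (cut at the start of the first window whose mean quality falls below the threshold) are assumptions made for illustration and are not taken from any vendor's implementation.

```python
def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, quals, window=30, threshold=15):
    """Trim the 3' end with a simple sliding-window rule (illustrative only).

    Windows are scanned from the 5' end. At the first window whose mean
    quality is below `threshold`, the read is cut at the start of that
    window. Reads shorter than one window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(0, len(quals) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < threshold:
            return seq[:start], quals[:start]
    return seq, quals
```

Under this rule a Q15 threshold corresponds to an error probability of about 0.03 per base. A production trimmer would typically also apply a minimum read length after trimming, as the read filtering step described above does.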


for Forensic Genetics (ISFG) recommends using the most recent build of the human reference genome, hg38, which includes the mitochondrial reference standard, rCRS. Variant calling is then performed on the alignment file to determine the genotype of each base position of interest. This approach has been used in examining SNPs, mtDNA, and STRs, but other strategies exist, such as the sequence-search approach for STRs and SNPs.

Alignment pipelines mainly consist of a short read aligner and a variant caller. Although seemingly simple in construction, different combinations of these components can yield varying results [12,46,47].

For more information on alignment-variant calling pipelines, Cornish et al. [46] and Hwang et al. [47] have provided detailed comparisons of pipelines constructed from different tools.

2.2.1. Alignment

Read aligners map reads to a reference genome. There are many aligners available, such as Bowtie 2 [48], Novoalign (http://novocraft.com), and BWA-MEM [49]. Bowtie and Burrows-Wheeler Aligner (BWA) are very efficient [50] and have been widely used.

Some aligners are better suited to data produced on particular platforms due to the platforms’ characteristic data patterns. For example, BWA is suitable for Illumina data, which contain mainly substitution errors, and TMAP (https://github.com/iontorrent/TS/tree/master/Analysis/TMAP) is more suitable for Ion Torrent reads, which contain predominantly indel errors [51].

For more information on aligners, Fonseca et al. [52] performed a survey of over 60 aligners and present an overview of their characteristics.

The output of read aligners is an alignment file in SAM (Sequence Alignment/Map) format [53]. The SAM/BAM formats are the industry standard for alignments. SAM files contain human-readable text, with each row within the file detailing an alignment of a read to the reference genome. For faster processing by computers, SAM is converted to BAM (Binary Alignment/Map), a binary form of the same information. This is done using SAMtools, the software package for parsing and manipulating SAM/BAM alignments [53].

BAM files can be sorted and indexed to further improve reading efficiency by downstream tools. BAM file sorting orders the reads based on their position in the reference, and is done using the Picard Tools (https://broadinstitute.github.io/picard/) SortSam module or the sort function in SAMtools. Indexing the BAM file with the SAMtools index function creates an index file (.bai), which allows for more efficient reading of the BAM file.

Local realignment can be done to realign misaligned reads, which may produce bioinformatic artefacts that resemble SNPs and indels. The Genome Analysis Toolkit (GATK) contains the modules RealignerTargetCreator and IndelRealigner for this purpose [11,54]. Tools for STR local realignment are described in Section 4.3.

2.2.2. Variant calling

Variant callers identify the differences between the sample and a reference sequence. They typically take alignment SAM/BAM files as input and generate a Variant Call Format (VCF) file [55] (its binary counterpart is the BCF file), which is used for variant calling and downstream analyses and interpretation. BCFtools is a software toolset for variant calling which reads and writes VCF and BCF files. The GATK and Freebayes are also common variant callers.

Variant calling with BCFtools (included as part of the SAMtools package) involves calling the mpileup command, which computes the genotype likelihood at each genomic position and produces a BCF file. Then the bcftools command is called to calculate the genotype and call the variants, producing a VCF file. Variant filtering, subject to analysis, is applied by running the vcfutils.pl script to remove false positives or low-quality calls.

The GATK contains a Haplotype caller (which replaced the Unified Genotyper) which performs a local re-assembly, calculates haplotype likelihood, and then determines the most likely genotype of SNPs and indels.

3. SNPs

3.1. Current state

SNPs are versatile markers, capable of providing additional information from forensic samples such as ancestry and externally visible characteristics, alongside STR profiling which is the current standard for human identification.

In forensic science, the single nucleotide changes that usefully inform ancestry or phenotype are typically SNPs, common to at least 1% of the population being studied, whilst those found in flanking regions are more often described as novel and are therefore considered rare and should be called SNVs. Over time as more forensic MPS studies are conducted, some SNVs may be found to be more properly termed SNPs, at least in some populations. For the purposes of this review (and most often in the literature) all of these single nucleotide differences are referred to as SNPs.

Genome-wide association studies and the HapMap project [56] generated an immense amount of SNP data. Based on the type of information that can be inferred, forensically-relevant SNPs were selected and classed into four categories: identity-informative SNPs (IISNPs) for individualization, lineage-informative SNPs (LISNPs) for kinship haplotyping, ancestry-informative SNPs (AISNPs) for biogeographical ancestry inference, and phenotypic-informative SNPs (PISNPs) for external phenotype prediction [57].

A wide array of techniques and technologies exist for SNP genotyping and these are well described in Sobrino’s review [58].

To date, forensically relevant SNPs have been mainly genotyped using the SNaPshot technique, which employs electrophoresis and fluorescence detection of SNPs [58]. In brief, a primer targeting the SNP anneals adjacent to the SNP position and is then extended by a DNA polymerase to incorporate a fluorescently labelled dNTP at the SNP position. The fluorescent colour is then detected by capillary electrophoresis (CE) to identify the SNP [59].

A wide range of SNaPshot assays are used for routine SNP testing and these are described in Mehta’s review [60]. Other SNP genotyping methods such as the Genplex system from Applied Biosystems have been used (for example [61]); however, these methods are limited by the number of SNPs that can be tested. MPS offers a much larger number of SNPs that can be genotyped concurrently compared with earlier assays, as well as the ability to run multiple samples together, detection of every nucleotide in the amplicon, and the pooling of multiple SNaPshot panels [62].

There is an ongoing effort in forensic SNP panel development, usually done by scrutinising freely available online data such as the 1000 Genomes Project, assessing SNP performance from different panels, combining them into a panel and testing against relevant populations; the variety of such panels spans populations, information types, and autosomal and sex chromosomes [63–70].

The replacement of STRs by SNPs as the dominant forensic marker has also been debated [71]. These single-base markers make the extraction and amplification from degraded DNA samples easier and are less prone to stochastic effects. SNPs also have lower mutation rates than STRs and may associate with phenotypic traits, thus making them useful for a variety of applications. As they are non-tandem-repeat markers, the lack of stutter artefacts vastly improves mixture interpretation, including an assessment of the number of potential contributors. The main disadvantages of SNPs are the higher number of loci required in order to match the discriminatory power of STR assays and the difficulty in mixture analysis due to the limited polymorphism (usually bi-allelic) of the SNP alleles [71]. These disadvantages may be overcome using MPS techniques and analysing larger numbers of SNPs.

Microhaplotypes are loci with two or more SNPs that occur in a


short segment of DNA within the length of an MPS read, and collectively define a multiallelic locus [72,73]. These microhaplotype loci can provide forensic information for identification, lineage and biogeographic ancestry. These markers can be used for mixture deconvolution, and because they are unaffected by PCR stutter they may be more robust than the STRs. Kidd et al. have identified a collection of microhaplotype loci and estimated their allele frequencies in 83 different populations, demonstrating their superiority over individual SNPs, while fulfilling the forensic objectives of STRs for discrimination between individuals [74].

SNP reporting is not complicated by nomenclature issues, but strandedness can cause issues in interpretation, where one allele is reported for each chromosome and passed to an analysis tool. Most online prediction tools, such as Snipper [75], are sensitive to the input sequence of the bases, and the input of the SNP on the wrong strand effectively functions as a base miscall and leads to a compromised analysis. The analyst should be aware of this when using SNP analysis tools. It is notable that the Ion Torrent reports all SNPs in the forward strand, while the MiSeq FGx reports SNPs consistent with dbSNP. Illumina has devised a method for strand designation of ambiguous and non-ambiguous SNPs (https://www.illumina.com/documents/products/technotes/technote_topbot.pdf).

3.2. Workflows and pipelines

3.2.1. Data processing

Processing of SNP MPS data is traditionally done using alignment pipelines such as those described in Section 2.2.1, mainly consisting of an aligner and a variant caller. However, sequencer-specific software solutions have become popular and have been used in many published studies [70,76,77].

Pipelines used in Ion PGM forensic SNP studies with the Torrent Suite v3.0 involve the alignment of reads to hg19 using the Torrent Mapping Alignment Program (TMAP), which produces a BAM alignment, and subsequent variant calling is done using the Torrent Variant Caller v3.0 with default parameters, which produces a VCF file [62].

The HID_SNP_Genotyper is another Torrent Suite plugin based on the Torrent Variant Caller. It performs SNP calling and outputs a VCF file and a text file containing the SNP genotypes. A HID Genotyper Report is generated that summarizes the data and analysis in a graphical user interface which can be viewed on the Torrent Server. This plugin is used in many studies [7,63,70,76,78].

On the MiSeq FGx, the bundled analysis software is called the ForenSeq Universal Analysis Software (UAS). UAS performs alignment, allele calling, and genotyping, and generates a web-based report that can be viewed on the server in a graphical user interface. The UAS has been used in some studies [77,79,80].

Although primarily STR analysis tools, STRait Razor [15] and MyFLq [14] can be configured to extract SNP regions from a FASTQ file without the need for alignment. The sequence-search approach used in these tools for STR extraction can be applied to SNPs, which involves scanning the sequence data file for flanking sequences adjacent to the SNPs. These tools are described in Section 4.2.

3.2.2. Databases

After SNPs are identified, a number of databases and online analysis tools can be used for interpretation of the genotypes. Many of these interactive tools make predictions based on population datasets. For example, Snipper (http://mathgene.usc.es/snipper/) [75] is an online SNP-Indel classification tool where users can choose a marker set, the population groups, and a classifier algorithm. The user manually strings together a collection of ancestry-informative SNP genotypes in a given order and feeds the string into Snipper. The classification gives a prediction of the population group and a percentage of possible admixture. For example, using the test example SNP string on the site, the results suggest that the individual is most likely European (89.24%).

The multiple-individuals toolset in Snipper (http://mathgene.usc.es/snipper/analysismultipleprofiles.html) performs ancestry classification and accepts input data from multiple individuals in a spreadsheet format. The user is freed from the fixed SNP input order and can add SNP information for a customized analysis. HIrisPlex [65] is a validated set of SNP markers used to predict hair and eye colour. The HIrisPlex website (http://hirisplex.erasmusmc.nl/) contains tools that can be used to make predictions; the hair and eye colour predictions from SNP alleles are produced using a Multinomial Logistic Regression model trained on population data [81,82].

For SNP information, STRbase [83] contains a page on forensic SNPs with a collection of reference materials. It includes a collection of SNPs and available SNP assays, information on the varieties of SNPs, typing technologies, and general SNP information.

The Allele Frequency Database (ALFRED) is an online and freely accessible database containing allele frequencies of anthropologically defined populations [39]. Its emphasis on population-based allele frequencies sets it apart from databases serving as general catalogues of polymorphisms such as dbSNP [84]. ALFRED data is used by the online tool Forensic Research/Reference on Genetics knowledge base (FROG-kb), which aims to make allele frequency data more accessible and useful in a forensic setting. FROG-kb contains tools to compare user data to allele frequencies in populations, with a focus on identification, ancestry, and phenotype SNP types [85].

Ideally, global population sets could be used for interpretation, but in practice, it is most likely that laboratories will develop independent, local databases reflective of local populations and in keeping with national ethical and data-sharing legislation. Of note, reference datasets for the major MPS ancestry SNP sets are available through the Snipper website (http://mathgene.usc.es/snipper/). Accessible, appropriately validated online or on-instrument resources and tools will become an important consideration for laboratories.

4. STRs

4.1. Current state

Short tandem repeats (STRs) are the standard marker type currently used in forensic DNA profiling. STR alleles are typed by length using CE methodology.

The methodology of STR typing by CE is well developed [86], and many commercial STR PCR amplification kits are available, such as the GlobalFiler™ PCR amplification kit from ThermoFisher Scientific [87], the PowerPlex® 21 system from Promega Corporation [88], and the Investigator 24 plex kit from Qiagen [89].

As MPS is becoming more widely adopted, STR typing by MPS is considered to be feasible and has distinct advantages over CE, such as the ability to detect STR sequence variations and the larger number of DNA markers that can be multiplexed together.

Recently, MPS has been used in numerous studies of STRs including the choice and evaluation of multiplex systems [5,90–92], the development of statistical methods for interpretation [93], proof of concept for simultaneous analysis of STRs with mRNAs for tissue identification [94], improved resolution of parental cases [79], and the investigation of STR sequence variations [95–98].

Concurrent with MPS studies of STRs were innovations in STR data analysis methods and software tools. These include lobSTR [99], STRait Razor [15,100,101], TSSV [102], the MyFLq framework [14], STRinNGS [16], SEQ Mapper [103], and FDStools [13].

In the study of STRs, the nomenclature of STRs has been the subject of ongoing discussions. These discussions follow from reports of the extensive sequence variation found within STR repeats, flanking region polymorphisms, and requirements for compatibility between and within laboratories for DNA databasing of DNA sequence profiles.

As MPS generates sequence information as well as length, there has


been a search for a format that best conveys the sequence information with all its variations, while maintaining compatibility with current CE databases.

One proposed format contains the combined elements of STR locus name, CE allele length, chromosome coordinates of the repeat motif in the reference genome, the sequence with repeats in concatenated bracket form, and variations outside of the STR [104]. An example from this paper is as follows:

D13S317-CE12−chr13-GRCh37-g.82.722.160:82.722.223−TATC[13] AATC[1] ATCT[3]−x.136G>A

However, complications exist beyond the formulation of an STR nomenclature. An example of the challenges in the creation of a new system is the historical designation of STRs for use in CE, which was not fully consistent, examples being the strand for repeat region designation and the inclusion of incomplete repeat sequences towards the counting of repeat units in an allele [104]. This requires further work in re-defining the STR regions.

The International Society for Forensic Genetics (ISFG) has a compilation of suggestions and insights into the matter [105], and more recently, the STR Sequencing Project (STRSeq) was initiated to collect and curate STR sequence data to facilitate the standardization of STR sequence nomenclature [41].

Suggestions include the use of a comprehensive format for information and backward-compatibility with CE, as well as a simple format for easier communication. STR records in databases should be in the form of complete sequence strings instead of shortened nomenclatures, so issues that arise from notation may be mitigated. To increase discrimination, flanking region variation in the genome should also be included in the database.

4.2. Workflows and pipelines

Following pre-processing and quality assessment of the raw MPS data, STRs are detected, identified, and extracted.

Many methods of STR detection and identification have been devised, giving rise to many pipelines based on different approaches (Table 1). A few studies have performed data analysis using more than one pipeline to test concordance and reliability [95,106].

To streamline STR data analysis, MPS machine manufacturers have assembled and incorporated STR analysis pipelines into their software solutions. Sequencer software such as the MiSeq ForenSeq UAS and the Torrent Suite HID_STR_Genotyper plugin has been used in a range of STR studies [96,107,108].

4.2.1. Alignment approach

One of the earliest STR detection and identification approaches was to align the reads to a reference genome and extract the STRs by their locations on the alignment.

Bornman et al. developed a method to align reads to a FASTA reference containing just the STR loci [2]. This took advantage of the fact that forensic STR analyses were traditionally accomplished using amplicon sequencing instead of whole genome sequencing.

These reference files contain a concatenated STR sequence for each locus and their upstream and downstream flanking sequences. The alignment was done using Bowtie, using ungapped alignment to keep the repeat sequences of the reads intact. Picard (http://broadinstitute.github.io/picard/) and BedTools were used to convert the BAM files to a Browser Extensible Data (BED) format that contains the start and end positions of each read, then a custom script in R was used to filter out reads that do not span the entirety of the STR and its flanks. Alleles were called based on the number of reads, using a heuristic decision model. The components of the pipeline are interchangeable with those of a similar nature, such as swapping the tool Bowtie for BWA for the alignment process [106].

lobSTR [99] is an STR extraction tool, a self-contained pipeline that detects the presence of STRs on reads by repeat stretches. Once the STR stretch is found, STR locus flanks are aligned to the left and right side of the stretch to identify the locus and reveal the number of repeat units. A statistical model is used to call the loci alleles.

This approach was much faster and more precise than ungapped alignment with short-read aligners [99]. However, the shortcomings identified were its restriction to simple repeat loci with 2–6-mer motifs, which makes complex repeats difficult to analyse [100], and its lack of awareness of homopolymer runs [9]. lobSTR has since been updated to version 4.0.6. The changes made to the tool can be found at: http://lobstr.teamerlich.org/changelog.html. The approach introduced by lobSTR spawned investigations into two intertwined details of its workings: the correct alignment of the reads to the reference and accurate estimation of repeat numbers.

Due to sequence variations that may differ from the reference allele in the lobSTR database, and sequencing errors in STR sequences, initial alignment may not place reads in the optimal locations, leading to miscalled variants. Realignment tools can be used to revise misaligned reads at STR loci.
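
Both the bracketed repeat notation shown in Section 4.1 and the counting of repeat units in an STR stretch come down to run-length encoding of motif-sized chunks. The following Python sketch illustrates this for the repeat region of the D13S317 example. It assumes a fixed motif length and a reading frame that starts at the first base of the region. The function names are hypothetical, and real tools must additionally handle frame shifts, indels and sequencing errors, which this sketch does not attempt.

```python
def bracket_notation(repeat_region, motif_length=4):
    """Collapse a repeat region into bracketed form, e.g. 'TATC[13] AATC[1]'.

    The region is read in consecutive chunks of `motif_length` bases and
    identical neighbouring chunks are run-length encoded. Trailing bases
    that do not fill a whole chunk are reported as a final partial unit.
    """
    runs = []  # list of [unit, count] pairs, in order of appearance
    for i in range(0, len(repeat_region), motif_length):
        unit = repeat_region[i:i + motif_length]
        if runs and runs[-1][0] == unit:
            runs[-1][1] += 1
        else:
            runs.append([unit, 1])
    return " ".join(f"{unit}[{count}]" for unit, count in runs)


def count_repeat_units(repeat_region, motif_length=4):
    """Number of complete motif-sized units in the region."""
    return len(repeat_region) // motif_length


# Repeat region built from the bracketed example in Section 4.1.
region = "TATC" * 13 + "AATC" + "ATCT" * 3
print(bracket_notation(region))  # TATC[13] AATC[1] ATCT[3]
```

Note that the unit count returned here is purely sequence-based. As the text points out, historical CE designations do not always count every unit in the sequenced region, so this number need not equal the CE allele name.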

Table 1
Freely available STR analysis tools.

Tool name | Intended platforms | Input formats | Output formats | Reference
lobSTR v4 | Linux; Mac; Curoverse (online) | FASTQ/FASTA; BAM | BAM alignment file; VCF file; a text file describing the most likely alleles | [99]
RepeatSeq | Linux | BAM; FASTA + region file | VCF file or two custom output formats | [9]
STRviper | Most platforms, as it is written in Java | BAM/SAM | STRV file containing allele calls, convertible to VCF (with scripts) | [111]
CoalescentSTR | Statistical method available in paper | - | - | [112]
STRait Razor v1.0-v2s | Linux | FASTQ/FASTA | Allsequences.txt (all found sequences with read coverage); rawSTRcalls.txt (read counts of known alleles) | [101]
STRait Razor v3 | Windows; Mac; Linux | FASTQ/FASTA | Allsequences.txt | [15]
MyFLq | Linux; Illumina BaseSpace (online); tool webpage (online) | FASTQ | XML files with alleles found and figures | [14,116]
STRinNGS | Linux | FASTQ; BAM | PDF file with STR profile; a results file to be analysed with R; VCF | [16]
SEQ Mapper | Online tool | FASTA | Web summary report | [103]
TSSV | Linux | FASTQ/FASTA + reference alleles in CSV format | Overview report of general statistics and identified alleles | [102]
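
Several of the tools listed in Table 1 (for example STRait Razor, MyFLq and TSSV) take FASTQ reads and locate STRs by their flanking sequences rather than by alignment to a reference. The following Python sketch illustrates that general idea only and is not the implementation of any listed tool. It assumes strict four-line FASTQ records, uses exact flank matching where the real tools use approximate matching or alignment, and assumes the flanks abut the repeat region directly; the flank sequences, repeat unit length and reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (second) line of each four-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip().upper()


def between_flanks(read: str, left: str, right: str) -> Optional[str]:
    """Return the bases between the first left flank and the next right flank."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = read.find(right, start)
    return None if end == -1 else read[start:end]


def allele_name(region: str, unit: int) -> str:
    """Name an allele by length: whole repeat units, plus '.n' for n extra bases."""
    full, extra = divmod(len(region), unit)
    return f"{full}.{extra}" if extra else str(full)


def tally_alleles(lines: Iterable[str], left: str, right: str, unit: int) -> Counter:
    """Count reads per (length-based allele name, extracted sequence)."""
    counts: Counter = Counter()
    for read in fastq_sequences(lines):
        region = between_flanks(read, left, right)
        if region is not None:
            counts[(allele_name(region, unit), region)] += 1
    return counts


# Hypothetical locus definition and reads, invented for this example.
LEFT, RIGHT, UNIT = "AGGCTTCA", "CCTGAAGT", 4
reads = [
    "TT" + LEFT + "GATA" * 5 + RIGHT + "AC",
    "TT" + LEFT + "GATA" * 5 + RIGHT + "AC",
    LEFT + "GATA" * 4 + "GAT" + RIGHT,
    "ACGTACGTACGT",  # no flanks, so the read is ignored
]
fastq = [
    line
    for i, r in enumerate(reads)
    for line in (f"@read{i}", r, "+", "I" * len(r))
]
counts = tally_alleles(fastq, LEFT, RIGHT, UNIT)
for (name, sequence), n in counts.most_common():
    print(name, sequence, n)
```

Keeping the extracted sequence alongside the length-based name preserves sequence variation that a length-only call would hide, at the cost of reporting same-length variants as separate entries.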


ReviSTER [109] is an automated pipeline for local realignment. It first maps all the reads to the reference, then generates a temporary reference from the consensus. The temporary reference is used as the reference for realignment. The reads mapped to the temporary reference are then moved back to the original reference sequence.
STR-realigner [110] is another realignment tool for STR regions. The tool uses a dynamic programming algorithm for alignment and takes into account the STR repeat pattern, allowing size change in the STR regions. In Kojima’s study, it performed better than ReviSTER and the GATK IndelRealigner.
RepeatSeq [9] is an STR extraction tool that contains a realignment procedure. It performs realignment after first mapping reads to a reference and then uses a Bayesian statistical model to estimate the most probable alleles based on the reference. In a comparison with the initial lobSTR analyses, RepeatSeq was found to assign genotypes to repeats at a much higher rate, but did not make calls for some of the loci called by lobSTR, most likely due to the disparity in the numerical requirements for allele calling and the superior ability of lobSTR to detect sequences that vary greatly from the reference.
STRviper [111] and CoalescentSTR [112] are also concerned with repeat number estimation. Both implement novel statistical models for repeat number estimation based on the difference between the actual insert size and the insert size inferred from aligned, paired-end reads. STRviper has been shown to outperform its predecessors, lobSTR and RepeatSeq [111], but was later outperformed by CoalescentSTR [112].
The creators of lobSTR later developed a new tool, haplotype inference and phasing for STRs (HipSTR) [113]. Designed for Illumina sequence data, HipSTR learns the stutter characteristics of each STR and uses this information to realign STR-containing reads and mitigate the effects of PCR stutter. In a benchmark for genotyping accuracy, HipSTR outperformed RepeatSeq and its predecessor, lobSTR.

4.2.2. STR detection by flanking sequence

A computational difficulty with the alignment approach to STR detection is the mapping of reads to multiple locations, also called multi-reads. In other words, sequence similarity between repeat motifs of different STRs may result in the same read mapping to two different loci. As the similarity between two repeat sequences increases, the confidence in any read placement within the repeat decreases [50]. Since the source DNA is virtually never identical to the reference, the alignment tool may map a read to a location it deems most similar to the reference based on given parameters, rather than the correct location.
Sequence-search tools were developed in response to the limitations of pipelines that identified STRs only by sequence, which are confounded by STR repeat complexity, novel alleles, and sequence variation. Instead of finding STRs by alignment to a reference by sequence or feature, sequence-search tools find STRs by the repeat region flanks, which bypasses the difficulties that arise from STR sequence variation and complexity.
STR extraction tools for sequence-search pipelines are characterized by the use of an STR information database and a string matching algorithm. The data do not need to be aligned to a reference sequence.
The STR database contains the flanking sequences for each STR locus and sometimes sequences of the STRs themselves. Such databases can usually be curated by the user, allowing for the tweaking and addition of sequences of interest. The information in the database is used by the string matching algorithm, which scans through the data file to find matching sequences. Usually, an approximate string matching algorithm is used so that a degree of mismatch is allowed to account for sequence variation. Upon finding a match, the read is copied or extracted from the data file and collected as reads for the allele.
The STRait Razor tool uses a plain-text database containing flanking sequences, a short substring of the STR repeat motif and a dictionary of allele lengths for each STR locus [100]. Up to version 2s [101,114], an approximate match program, TRE-AGREP, was used to find and extract reads containing the flanking sequences. In version 3.0, a method of accomplishing approximate matching using an exact match algorithm with a prefix tree replaces TRE-AGREP for substring search [15].
The flanks are then trimmed off the STRs, and reads not containing the short substring of the repeat motif are filtered. The alleles are named by sequence length and converted into the numerical allele name using the allele length dictionary. By defining alleles by length only, its capture of STR sequences is unaffected by sequence variation.
In version 2s [101], configuration files were added for all commercially available MPS multiplexes including SNPs, along with a database for the conversion of STR sequences to recommended nomenclature and a tool for visualizing the output data.
It is notable that the algorithm remained mostly unchanged up to version 2s, whereas a new algorithm was used in version 3.0. We found version 3.0 dramatically faster, typically taking a few seconds to complete processing of a test FASTQ file, whereas v2s took a couple of minutes. At stock settings, version 3.0 extracted a marginally lower number of reads per locus.
STRait Razor has been used for multiplex system evaluation [5], sequence variation [95], in a pipeline automated with Bpipe by Gettings et al. [115] and flank analyses [96].
MyFLq [14] also extracts STRs from FASTQ files. The tool maintains a MySQL database of known alleles, which is used to calculate a consensus left and right flank for the STR locus. Reads are assigned to a locus by the presence of PCR flanks; the flanks are then removed and the sequence is compared with the reference alleles in the allele database. Based on the degree of the match, the reads are assigned flags to assist manual interpretation of the alleles. Homopolymer stretches of the sequence are ‘compressed’ into a base during matching if no initial match is found in the database. It is also available as an application with a graphical user interface [116].
Other sequence-searching examples include STRinNGS and SEQ Mapper. STRinNGS [16] takes FASTQ or unaligned BAM files as input, and identifies STRs by flanking sequences. In addition to the STR sequence, STRinNGS also analyses flanking region variations by aligning against hg19. SEQ Mapper [103] utilizes an allele library like MyFLq and runs multiple searches for the presence of STR primers, flanks, and STRs in different combinations. The multiple searches yield useful diagnostic information that may lead to the discovery of flank or sequence variations.
The dilemma with sequence-search strategies lies mainly in the thresholds used for the approximate match algorithm and the variability present in the sequences. Setting a low, hard threshold may cause more variable flanks to be missed, while raising the threshold and allowing more substitutions and insertions/deletions may lead to false-positive flank matches. Furthermore, if the flanks are insufficiently different from one another there is a potential for ambiguous matches. When multiple flanks, genuine or otherwise, are found on one sequence, this may cause the extraction of multiple alleles from one read.
TSSV [102] functions in a similar manner to STRait Razor [100], but instead of an approximate match algorithm, it uses semi-global pairwise alignment to find STR flanks in reads. STRs are detected by the alignment of STR loci flanks onto a read, and the set of flanks with the highest alignment score (most similar to the aligned region) denotes the locus. Once a flank pair is aligned, the STR in between is extracted, classified as a known or new allele, and reported in a text output file. By using alignment, a flank only locates to one position on the read, circumventing the multiple flanks problem. However, the best scoring flank alignment is not guaranteed to be the correct STR flank region and STRs missing flanking region sequences are unlikely to be detected; a comparison with lobSTR (publication version) revealed that TSSV is better for recognizing complex repeats, while lobSTR is better for partial reads [102].
Most of the above tools are concerned with the extraction of the STR alleles, leaving the separation of minor alleles from stutter artefacts to

the analyst. FDSTools is a software solution taking a stutter modelling approach to identify stutter sequences [13]. In the tool, a database of reference samples is consulted to generate a model for stutter, which is applied to the alleles for profile correction.

4.3. Statistical analyses and databases

STRbase, an abbreviation of Short Tandem Repeat DNA Internet DataBase, is a major online STR database [83]. It contains a collection of STRs and allele sequences, a variety of DNA marker information, training materials, and software tools. STRbase STR information has been referenced in the creation of custom STR reference files for alignment [2] and tools [14,16,100], as well as primer design [117].
STRidER [40] is an online allele frequency population database based on allele numbers, which can be queried or downloaded.
The STR Sequencing Project (STRSeq) [41] is a collaborative effort between several laboratories to facilitate the sequence description of STR alleles. The project has produced a catalogue of STR sequence diversity, guidelines on STR nomenclature, and STR sequences available online as GenBank records.

5. mtDNA

5.1. Current state

mtDNA can be used to reveal lineage and is useful in situations where nuclear DNA is unavailable or highly degraded, or where additional lineage information is required [19].
Traditionally, mtDNA was analysed using Sanger sequencing [118–121], focusing on the noncoding control region (CR), thought to contain more sequence variation due to a higher mutation rate. The analysis trend has moved from the specific regions of variation, hypervariable segments I and II (HVS-I & HVS-II) and the hypervariable segment III (HVS-III), to the entire control region [122,123] and subsequently the whole mtDNA genome [124].
The move from the analyses of the hypervariable regions alone to the whole mtDNA genome was facilitated by attempts to reduce analysis errors. Errors such as ‘artificial recombination’ caused by the mixing of independently-amplified hypervariable segments from different individuals, low-quality raw data leading to sequence interpretation mistakes, and clerical errors led to the development of guidelines to ensure that high-quality sequence was generated using Sanger sequencing [122,125]. The discovery of useful polymorphisms outside the HVS (and control region), and the availability of MPS platforms, which offer much higher production of information at lower costs than traditional Sanger methods, make MPS a more suitable option.
The ‘artificial recombination’ problem was ameliorated by the sequencing of the entire CR as one large amplicon, preventing mixture of HVS segments. This led to the development of methods for targeted MPS of the control region [126]. SNaPshot assays for population-specific SNPs in the control region have also been developed for population screening [127] and are well-suited for degraded samples. A midi-sized enrichment technique used for the CR was later applied to the enrichment of the entire mtGenome [128].
Sequencing of the whole mtGenome revealed that sequence variations can exist in the mtDNA when the derived CR haplotypes are identical [129,130]. These SNPs outside the hypervariable regions provide additional resolution for mtDNA analysis.
Whole mtDNA sequencing allows analysis at maximum resolution, and recent studies using the MPS approach have successfully generated haplotypes from blood [124,131] and single hair shaft samples [128]. Heteroplasmy, the presence of more than one mtDNA haplotype at a nucleotide position, can be detected at lower levels using MPS due to its high output [132,133]. The increased ability to assess low-level/minor components also allows the study of the impact of DNA damage on the detection of heteroplasmic variations [45].

5.2. Workflows and pipelines

According to the guidelines for mitochondrial DNA typing set in 2014 by the International Society for Forensic Genetics, consensus sequences should be determined using dedicated software to align raw reads to the rCRS. Variants should be recorded as the differences compared to the rCRS, and confirmed by a second independent analysis of raw data [125]. mtDNA data analysis traditionally uses the alignment approach. The reads are aligned to the rCRS using short read aligners, then the base positions differing from the rCRS are extracted from the alignment as variants into a VCF file.
The VCF file contains the SNP variations required for the calculation of a haplotype in a format supported by many downstream analysis tools. MitoSAVE is an Excel workbook-based tool that carries out such haplotype conversion. Taking a VCF file as input, it produces an mtDNA haplotype that follows standardized forensic conventions [134]. This removes the need for manual translation of a VCF output file into a haplotype, thus reducing clerical error.
Instead of manually running the tools for data processing, one could use an automated pipeline. MToolBox is one example: it takes BAM, SAM and FASTQ files as input and outputs a VCF file and haplogroup assignments [135]. It contains a mapper/remapper that aligns reads to either the rCRS or RSRS, the GATK indel realigner, a variant caller for generation of VCF, and a classifier for haplogroup assignment.
Analysis workflows that have been polished and packaged are offered as commercial analysis software. Commercial software, such as NextGENe and CLC Genomics Workbench, contains tools capable of quality filtering, alignment and generation of mutation reports/variant annotation, and has seen frequent use in mtDNA studies [45,136,137]. AQME [17] is a toolkit for use in the user-defined mtDNA analysis pipeline within the CLC Genomics Workbench to further facilitate the automated mtDNA analysis of sequence data and the production of a forensic profile. GeneMarker HTS is a specialized mtDNA analysis software designed for MPS/NGS data [138].
Sequencer onboard software is another example of a pre-packaged pipeline. MiSeq Reporter is the onboard software of the MiSeq sequencer. It uses the BWA aligner [139] for whole-genome alignment to the rCRS, then the GATK [54] for variant calling. Variants obtained then undergo reassignment to annotate areas of length heteroplasmy and repeats [124,134,140]. Data could also be downloaded from the MiSeq Reporter software to be processed by alternative means [137].
On the Ion Torrent, the variant caller plugin, which contains the aligner TMAP (Torrent Mapping Alignment Program), generates a VCF file from the sequence data stored on the server. The Ion Torrent software has also been used in studies [136,141].

5.3. Databases

Haplogroup determination follows the acquisition of SNPs. A number of mtDNA databases and tools are available online to facilitate the process.
PhyloTree is a regularly updated online phylogenetic tree of all known mtDNA haplogroups [142], currently comprising over 5400 haplogroups (Build 17). Displayed on a web page, the SNP mutations are shown at each haplogroup branch. For most terminal haplogroups, GenBank mtDNA sequences are provided for reference. PhyloTree is a tree diagram and does not provide any query functions; users find a haplotype by tracing a path through the branching SNPs.
HaploGrep is an online mtDNA haplogroup classifier that uses phylogenetic information from PhyloTree to determine the haplogroup of mtDNA profiles [143,144]. File formats such as VCF and FASTA are accepted as input and the haplotype is automatically calculated. HaploGrep 2 produces a lineage diagram of the mtDNA haplotype showing the mutations leading to the particular type. There are also tests that

can be run within the framework to detect recombination, phantom mutations and haplogroup discordance.
There are many other online mtDNA haplogroup classifiers that are based on PhyloTree. MitoTool [145] is available both as an online query system and as a downloadable tool; it supports queries by sequence or variant and generates a haplotype by alignment against rCRS and RSRS. The tool accepts FASTA files containing the mtDNA sequence as input and allows queries of specific regions such as the control region, or the entire mtGenome. HmtDB [146] is a database containing human mtDNA annotated with population and variability data generated with the MToolBox tool [135]. The database offers queries by various criteria such as geographical origin, haplogroups, haplotype-defining SNPs, subject age group, and tissue type. mtDNAmanager [147] offers a web interface to analyse quality, query haplotypes and store human mtDNA sequences. It is limited to control region sequences. A list of more mtDNA tools is available online (https://isogg.org/wiki/MtDNA_tools).
The majority of haplogrouping tools base assignments on haplogroup-defining mutations, ignoring the extra information existing in the form of private mutations. The sole reliance and focus on the haplogroup-defining mutations without considering other distinctive information may lead to incorrect haplogroup assignment. EMMA [148] was designed to take private mutations into account during haplotype estimation. Two databases are used for the estimation: one comprising full mtGenome sequences obtained from GenBank and the other being the virtual haplotypes, each with haplogroup-defining mutations, derived from PhyloTree. PhyloTree was used to draft haplotype estimates, while the whole mtGenome sequences were used to calculate mutational stability that is applied to the haplotyping calculations.
EMPOP [38] is a forensic mtDNA database that holds high-quality and directly accessible data. EMPOP is the main hub for mtDNA analyses and provides a number of tools for analysis and interpretation. A string search algorithm, SAM, was devised for EMPOP database queries; it does not suffer bias when the query and database entry are annotated differently [149]. The query function finds mtDNA sequences within the database by SNPs. The matches are shown by origin and metapopulation with frequency values and graphed on an interactive world map. The population function allows users to search for published population data by fields such as accession number, geographic affiliation, metapopulation and author. EMPOP does not currently support input in VCF or FASTA format.
EMPOP also hosts three mtDNA analysis and interpretation tools: the haplogroup browser, EMPcheck, and NETWORK (http://empop.online/tools). The haplogroup browser (http://empop.online/hg_tree_browser) is a searchable database of PhyloTree haplotypes populated with the mtDNA sequences from EMPOP by EMMA [148]. When a haplogroup or haplotype is chosen, the locations where it occurs are plotted on an interactive world map. Haplogroup-defining SNPs could also be entered individually to find the haplogroup.
The EMPcheck tool (http://empop.online/empcheck) is used for the validation of the EMPOP format file (.emp) to ensure that the standards are met before submission of data to the database.
The NETWORK tool (http://empop.online/network) is used to examine the quality of mtDNA data by drawing a quasi-median (QM) network, a web-like representation of the haplotypes that branch out from a root haplotype based on mutation events, with virtual haplotypes (quasi-medians) joining the haplotypes. The QM network can be used to confirm mutations or discover phantom mutations by connecting two samples and examining the shape of the graph. The QMs are the primary points of interest, and the appearance of complex reticulations due to a QM may indicate an error, which should be confirmed by checking the raw data.

6. Discussion and conclusions

As each of the three marker types (mtDNA, STRs, and SNPs) may contribute valuable information to a forensic case, laboratories may wish to have the bioinformatic facility to examine all three in a routine manner. MPS, now affordable, is a compelling platform that can be used to analyse all three of these markers. MPS offers higher throughput and a larger amount of data than Sanger sequencing methods, and the sequence data generated are more informative than the length data generated by capillary electrophoresis.
This paper provides an overview of the processing of raw sequence data into file formats accepted by a variety of analysis tools, common workflows and tools, and the relevant databases that aid in interpretation.
Whole mtDNA is currently analysed post-alignment, while STRs and SNPs can be analysed with or without alignment. As a result, pipelines and tools for the unified processing of SNPs and STRs have begun to emerge [14,101].
Many tools have been introduced, especially for STRs. Although each tool was designed with theoretical strengths and weaknesses, it is difficult to state definitively that one tool or pipeline functions better than another without a quantitative and qualitative comparison of their output. A future study comprehensively comparing the available tools would steer the forensic community towards the optimal software solution.
As sequencers make base-calling errors, databases and population data have proved useful for deriving statistical values that could be used for error correction. Currently, mtDNA analysis appears to be the most well-established of the three markers in terms of analysis protocols and databases. STRs as an MPS marker still require further work in setting new nomenclature standards and repeat definitions.
In conclusion, many bioinformatic tools exist for the forensic analysis of each marker system. Numerous as they already are, tools and algorithms are constantly being developed in search of lower processing times and novel algorithmic paradigms.

Acknowledgements

This work was supported by the ESR Strategic Science Investment Fund. We would like to thank Lisa Melia and Bethany Forsythe for their helpful comments in reviewing this paper. This work was conducted by Yao-Yuan Liu as part of the requirements for a Doctor of Philosophy degree for The University of Auckland, New Zealand.

References

[1] A.D. Ambers, J.D. Churchill, J.L. King, M. Stoljarova, H. Gill-king, M. Assidi, M. Abu-elmagd, A. Buhmeida, B. Budowle, More comprehensive forensic genetic marker analyses for accurate human remains identification using massively parallel DNA sequencing, BMC Genomics 17 (2016), http://dx.doi.org/10.1186/s12864-016-3087-2.
[2] D.M. Bornman, M.E. Hester, J.M. Schuetter, M.D. Kasoji, A. Minard-Smith, C.A. Barden, S.C. Nelson, G.D. Godbold, C.H. Baker, B. Yang, J.E. Walther, I.E. Tornes, P.S. Yan, B. Rodriguez, R. Bundschuh, M.L. Dickens, B.A. Young, S.A. Faith, Short-read, high-throughput sequencing technology for STR genotyping, Biotech Rapid Dispatches 1–6 (2012).
[3] A. Juras, M. Chyleński, M. Krenz-Niedbała, H. Malmström, E. Ehler, Ł. Pospieszny, S. Łukasik, J. Bednarczyk, J. Piontek, M. Jakobsson, M. Dabert, Investigating kinship of Neolithic post-LBK human remains from Krusza Zamkowa, Poland using ancient DNA, Forensic Sci. Int. Genet. 26 (2017) 30–39, http://dx.doi.org/10.1016/j.fsigen.2016.10.008.
[4] E. Tasker, B. LaRue, C. Beherec, D. Gangitano, S. Hughes-Stamm, Analysis of DNA from post-blast pipe bomb fragments for identification and determination of ancestry, Forensic Sci. Int. Genet. 28 (2017) 195–202, http://dx.doi.org/10.1016/j.fsigen.2017.02.016.
[5] X. Zeng, J.L. King, M. Stoljarova, D.H. Warshauer, B.L. LaRue, A. Sajantila, J. Patel, D.R. Storts, B. Budowle, High sensitivity multiplex short tandem repeat loci analyses with massively parallel sequencing, Forensic Sci. Int. Genet. 16 (2015) 38–47, http://dx.doi.org/10.1016/j.fsigen.2014.11.022.
[6] B. Budowle, D.H. Warshauer, S.B. Seo, J.L. King, C. Davis, B. LaRue, Deep sequencing provides comprehensive multiplex capabilities, Forensic Sci. Int. Genet. Suppl. Ser. 4 (2013) e334–e335, http://dx.doi.org/10.1016/j.fsigss.2013.10.170.
[7] S.B. Seo, J.L. King, D.H. Warshauer, C.P. Davis, J. Ge, B. Budowle, Single nucleotide polymorphism typing with massively parallel sequencing for human identification, Int. J. Legal Med. 127 (2013) 1079–1086, http://dx.doi.org/10.

1007/s00414-013-0879-7. K. Turteltaub, M.A. Batzer, Inference of human geographic origins using alu in-
[8] C. Børsting, N. Morling, Next generation sequencing and its applications in for- sertion polymorphisms, Forensic Sci. Int. 153 (2005) 117–124, http://dx.doi.org/
ensic genetics, Forensic Sci. Int. Genet. 18 (2015) 78–89, http://dx.doi.org/10. 10.1016/j.forsciint.2004.10.017.
1016/j.fsigen.2015.02.002. [32] S.R. Tridico, D.C. Murray, J. Addison, K.P. Kirkbride, M. Bunce, Metagenomic
[9] G. Highnam, C. Franck, A. Martin, C. Stephens, A. Puthige, D. Mittelman, Accurate analyses of bacteria on human hairs: a qualitative assessment for applications in
human microsatellite genotypes from high-throughput resequencing data using forensic science, Investig. Genet. 5 (2014) 16, http://dx.doi.org/10.1186/s13323-
informed error profiles, Nucleic Acids Res. 41 (2013) 1–7, http://dx.doi.org/10. 014-0016-5.
1093/nar/gks981. [33] J.T. Hampton-Marcell, J.V. Lopez, J.A. Gilbert, The human microbiome: an
[10] S. Seneca, K. Vancampenhout, R. Van Coster, J. Smet, W. Lissens, A. Vanlander, emerging tool in forensics, Microb. Biotechnol. 10 (2017) 228–230, http://dx.doi.
B. De Paepe, A. Jonckheere, K. Stouffs, L. De Meirleir, Analysis of the whole mi- org/10.1111/1751-7915.12699.
tochondrial genome: translation of the Ion Torrent Personal Genome Machine [34] A.S. Khodakova, R.J. Smith, L. Burgoyne, D. Abarno, A. Linacre, Random whole
system to the diagnostic bench? Eur. J. Hum. Genet. (2014) 1–8, http://dx.doi. metagenomic sequencing for forensic discrimination of soils, PLoS One 9 (2014),
org/10.1038/ejhg.2014.49. http://dx.doi.org/10.1371/journal.pone.0104996.
[11] M.a. DePristo, E. Banks, R.E. Poplin, K.V. Garimella, J.R. Maguire, C. Hartl, [35] M. Arenas, F. Pereira, M. Oliveira, N. Pinto, A.M. Lopes, V. Gomes, A. Carracedo,
a.a. Philippakis, G. del Angel, M.a. Rivas, M. Hanna, A. McKenna, T.J. Fennell, A. Amorim, Forensic genetics and genomics: much more than just a human affair,
M.a. Kernytsky, Y.a. Sivachenko, K. Cibulskis, S.B. Gabriel, D. Altshuler, M.J. Daly, PLoS Genet. 13 (2017) 1–28, http://dx.doi.org/10.1371/journal.pgen.1006960.
A framework for variation discovery and genotyping using next- generation DNA [36] A. Auton, G.R. Abecasis, D.M. Altshuler, R.M. Durbin, D.R. Bentley, et al., A global
sequencing data, Nat. Genet. 43 (2011) 491–498, http://dx.doi.org/10.1038/ng. reference for human genetic variation, Nature 526 (2015) 68–74, http://dx.doi.
806.A. org/10.1038/nature15393.
[12] G. Highnam, J.J. Wang, D. Kusler, J. Zook, V. Vijayan, N. Leibovich, D. Mittelman, [37] P.H. Sudmant, T. Rausch, E.J. Gardner, R.E. Handsaker, A. Abyzov, et al., An in-
An analytical framework for optimizing variant discovery from personal genomes, tegrated map of structural variation in 2504 human genomes, Nature 526 (2015)
Nat. Commun. 6 (2015) 6275, http://dx.doi.org/10.1038/ncomms7275. 75–81, http://dx.doi.org/10.1038/nature15394.
[13] J. Hoogenboom, K.J. van der Gaag, R.H. de Leeuw, T. Sijen, P. de Knijff, [38] W. Parson, A. Dür, EMPOP-a forensic mtDNA database, Forensic Sci. Int. Genet. 1
J.F.J. Laros, FDSTools: a software package for analysis of massively parallel se- (2007) 88–92, http://dx.doi.org/10.1016/j.fsigen.2007.01.018.
quencing data with the ability to recognise and correct STR stutter and other PCR [39] H. Rajeevan, U. Soundararajan, J.R. Kidd, A.J. Pakstis, K.K. Kidd, ALFRED: an
or sequencing noise, Forensic Sci. Int. Genet. 27 (2017) 27–40, http://dx.doi.org/ allele frequency resource for research and teaching, Nucleic Acids Res. 40 (2012)
10.1016/j.fsigen.2016.11.007. 1–6, http://dx.doi.org/10.1093/nar/gkr924.
[14] C. Van Neste, M. Vandewoestyne, W. Van Criekinge, D. Deforce, F. Van [40] M. Bodner, I. Bastisch, J.M. Butler, R. Fimmers, P. Gill, L. Gusmão, N. Morling,
Nieuwerburgh, My-Forensic-Loci-queries (MyFLq) framework for analysis of for- C. Phillips, M. Prinz, P.M. Schneider, W. Parson, Recommendations of the DNA
ensic STR data generated by massive parallel sequencing, Forensic Sci. Int. Genet. Commission of the International Society for Forensic Genetics (ISFG) on quality
9 (2014) 1–8, http://dx.doi.org/10.1016/j.fsigen.2013.10.012.
control of autosomal Short Tandem Repeat allele frequency databasing (STRidER), Forensic Sci. Int. Genet. 24 (2016) 97–102, http://dx.doi.org/10.1016/j.fsigen.2016.06.008.
[15] A.E. Woerner, J.L. King, B. Budowle, Fast STR allele identification with STRait razor 3.0, Forensic Sci. Int. Genet. 30 (2017) 18–23, http://dx.doi.org/10.1016/j.fsigen.2017.05.008.
[16] S.L. Friis, A. Buchard, E. Rockenbauer, C. Børsting, N. Morling, Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs, Forensic Sci. Int. Genet. 21 (2016) 68–75, http://dx.doi.org/10.1016/j.fsigen.2015.12.006.
[17] K. Sturk-Andreaggi, M.A. Peck, C. Boysen, P. Dekker, T.P. McMahon, C.K. Marshall, AQME: a forensic mitochondrial DNA analysis tool for next-generation sequencing data, Forensic Sci. Int. Genet. 31 (2017) 189–197, http://dx.doi.org/10.1016/j.fsigen.2017.09.010.
[18] J.M. Butler, Advanced Topics in Forensic DNA Typing: Methodology, (2012), http://dx.doi.org/10.1016/B978-0-12-374513-2.00005-1.
[19] J.M. Butler, Advanced Topics in Forensic DNA Typing: Interpretation, (2015), http://dx.doi.org/10.1016/B978-0-12-405213-0.00015-4.
[20] M.M. Bus, O. Karas, M. Allen, Multiplex pyrosequencing of InDel markers for forensic DNA analysis, Electrophoresis 37 (2016) 3039–3045, http://dx.doi.org/10.1002/elps.201600255.
[21] J.F. Ferragut, R. Pereira, J.A. Castro, C. Ramon, I. Nogueiro, A. Amorim, A. Picornell, Genetic diversity of 38 insertion-deletion polymorphisms in Jewish populations, Forensic Sci. Int. Genet. 21 (2016) 1–4, http://dx.doi.org/10.1016/j.fsigen.2015.11.003.
[22] R. Klein, C. Neumann, R. Roy, Detection of insertion/deletion polymorphisms from challenged samples using the Investigator DIPplex® Kit, Forensic Sci. Int. Genet. 16 (2015) 29–37, http://dx.doi.org/10.1016/j.fsigen.2014.11.020.
[23] F. Kader, M. Ghai, DNA methylation and application in forensic sciences, Forensic Sci. Int. 249 (2015) 255–265, http://dx.doi.org/10.1016/j.forsciint.2015.01.037.
[24] H.Y. Lee, S.E. Jung, E.H. Lee, W.I. Yang, K.J. Shin, DNA methylation profiling for a confirmatory test for blood, saliva, semen, vaginal fluid and menstrual blood, Forensic Sci. Int. Genet. 24 (2016) 75–82, http://dx.doi.org/10.1016/j.fsigen.2016.06.007.
[25] D.S.B.S. Silva, J. Antunes, K. Balamurugan, G. Duncan, C.S. Alho, B. McCord, Developmental validation studies of epigenetic DNA methylation markers for the detection of blood, semen and saliva samples, Forensic Sci. Int. Genet. 23 (2016) 55–63, http://dx.doi.org/10.1016/j.fsigen.2016.01.017.
[26] A. Vidaki, C.D. López, E. Carnero-Montoro, A. Ralf, K. Ward, T. Spector, J.T. Bell, M. Kayser, Epigenetic discrimination of identical twins from blood under the forensic scenario, Forensic Sci. Int. Genet. 0 (2017) 67–80, http://dx.doi.org/10.1016/j.fsigen.2017.07.014.
[27] A. Vidaki, D. Ballard, A. Aliferi, T.H. Miller, L.P. Barron, D. Syndercombe Court, DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing, Forensic Sci. Int. Genet. 28 (2017) 225–236, http://dx.doi.org/10.1016/j.fsigen.2017.02.009.
[28] S. Chadli, M. Wajih, H. Izaabel, Analysis of alu insertion polymorphism in South Morocco (Souss): use of markers in forensic science, Open Forensic Sci. J. 2 (2009) 1–5, http://dx.doi.org/10.2174/1874402800902010001.
[29] A. Fazi, B. Gobeski, D. Foran, Development of two highly sensitive forensic sex determination assays based on human DYZ1 and alu repetitive DNA elements, Electrophoresis 35 (2014) 3028–3035, http://dx.doi.org/10.1002/elps.201400103.
[30] B.L. LaRue, S.K. Sinha, A.H. Montgomery, R. Thompson, L. Klaskala, J. Ge, J. King, M. Turnbough, B. Budowle, INNULs: a novel design amplification strategy for retrotransposable elements for studying population variation, Hum. Heredity 74 (2012) 27–35, http://dx.doi.org/10.1159/000343050.
[31] D.A. Ray, J.A. Walker, A. Hall, B. Llewellyn, J. Ballantyne, A.T. Christian,
[41] K.B. Gettings, L.A. Borsuk, D. Ballard, M. Bodner, B. Budowle, L. Devesse, J. King, W. Parson, C. Phillips, P.M. Vallone, STRSeq: a catalog of sequence diversity at human identification Short Tandem Repeat loci, Forensic Sci. Int. Genet. 31 (2017) 111–117, http://dx.doi.org/10.1016/j.fsigen.2017.08.017.
[42] P.J.A. Cock, C.J. Fields, N. Goto, M.L. Heuer, P.M. Rice, The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants, Nucleic Acids Res. 38 (2009) 1767–1771, http://dx.doi.org/10.1093/nar/gkp1137.
[43] S. Pabinger, A. Dander, M. Fischer, R. Snajder, M. Sperk, M. Efremova, B. Krabichler, M.R. Speicher, J. Zschocke, Z. Trajanoski, A survey of tools for variant analysis of next-generation genome sequencing data, Brief. Bioinform. 15 (2014) 256–278, http://dx.doi.org/10.1093/bib/bbs086.
[44] M.P. Cox, D.A. Peterson, P.J. Biggs, SolexaQA: at-a-glance quality assessment of Illumina second-generation sequencing data, BMC Bioinf. 11 (2010) 485, http://dx.doi.org/10.1186/1471-2105-11-485.
[45] M.M. Rathbun, J.A. McElhoe, W. Parson, M.M. Holland, Considering DNA damage when interpreting mtDNA heteroplasmy in deep sequencing data, Forensic Sci. Int. Genet. 26 (2017) 1–11, http://dx.doi.org/10.1016/j.fsigen.2016.09.008.
[46] A. Cornish, C. Guda, A comparison of variant calling pipelines using genome in a bottle as a reference, BioMed Res. Int. 2015 (2015), http://dx.doi.org/10.1155/2015/456479.
[47] S. Hwang, E. Kim, I. Lee, E.M. Marcotte, Systematic comparison of variant calling pipelines using gold standard personal exome variants, Sci. Rep. 5 (2015) 17875, http://dx.doi.org/10.1038/srep17875.
[48] B. Langmead, S.L. Salzberg, Fast gapped-read alignment with Bowtie 2, Nat. Methods 9 (2013) 357–359, http://dx.doi.org/10.1038/nmeth.1923.
[49] H. Li, Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv Prepr. arXiv. 0 (2013) 3. arXiv:1303.3997 [q-bio.GN].
[50] T.J. Treangen, S.L. Salzberg, Repetitive DNA and next-generation sequencing: computational challenges and solutions, Nat. Rev. Genet. 13 (2013) 36–46, http://dx.doi.org/10.1038/nrg3117.
[51] S. Caboche, C. Audebert, Y. Lemoine, D. Hot, Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data, BMC Genomics 15 (2014) 264, http://dx.doi.org/10.1186/1471-2164-15-264.
[52] N.A. Fonseca, J. Rung, A. Brazma, J.C. Marioni, Tools for mapping high-throughput sequencing data, Bioinformatics 28 (2012) 3169–3177, http://dx.doi.org/10.1093/bioinformatics/bts605.
[53] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, The sequence alignment/map format and SAMtools, Bioinformatics 25 (2009) 2078–2079, http://dx.doi.org/10.1093/bioinformatics/btp352.
[54] A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M.A. DePristo, The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, Genome Res. 20 (2010) 1297–1303, http://dx.doi.org/10.1101/gr.107524.110.
[55] P. Danecek, A. Auton, G. Abecasis, C.A. Albers, E. Banks, M.A. DePristo, R.E. Handsaker, G. Lunter, G.T. Marth, S.T. Sherry, G. McVean, R. Durbin, The variant call format and VCFtools, Bioinformatics 27 (2011) 2156–2158, http://dx.doi.org/10.1093/bioinformatics/btr330.
[56] R.A. Gibbs, J.W. Belmont, P. Hardenbol, T.D. Willis, F. Yu, H. Yang, L.-Y. Ch’ang, W. Huang, B. Liu, Y. Shen, The international HapMap project, Nature 426 (2003) 789–796.


[57] B. Budowle, A. Van Daal, Forensically relevant SNP classes, Biotechniques 44 (2008) 603–610, http://dx.doi.org/10.2144/000112806.
[58] B. Sobrino, M. Brión, A. Carracedo, SNPs in forensic genetics: a review on SNP typing methodologies, Forensic Sci. Int. 154 (2005) 181–194, http://dx.doi.org/10.1016/j.forsciint.2004.10.020.
[59] Applied Biosystems, SNaPshot® Multiplex System for SNP genotyping, (2012) 1–4. 10.1093/gbe/evt066.
[60] B. Mehta, R. Daniel, C. Phillips, D. McNevin, Forensically relevant SNaPshot® assays for human DNA SNP analysis: a review, Int. J. Legal Med. 131 (2017) 21–37, http://dx.doi.org/10.1007/s00414-016-1490-5.
[61] C. Phillips, R. Fang, D. Ballard, M. Fondevila, C. Harrison, F. Hyland, E. Musgrave-Brown, C. Proff, E. Ramos-Luis, B. Sobrino, A. Carracedo, M.R. Furtado, D.S. Court, P.M. Schneider, Evaluation of the Genplex SNP typing system and a 49plex forensic marker panel, Forensic Sci. Int. Genet. 1 (2007) 180–185, http://dx.doi.org/10.1016/j.fsigen.2007.02.007.
[62] R. Daniel, C. Santos, C. Phillips, M. Fondevila, R.A.H. Van Oorschot, Á. Carracedo, M.V. Lareu, D. McNevin, A SNaPshot of next generation sequencing for forensic SNP analysis, Forensic Sci. Int. Genet. 14 (2015) 50–60, http://dx.doi.org/10.1016/j.fsigen.2014.08.013.
[63] M. Eduardoff, T.E. Gross, C. Santos, M. de la Puente, D. Ballard, C. Strobl, C. Børsting, N. Morling, L. Fusco, C. Hussing, B. Egyed, L. Souto, J. Uacyisrael, D. Syndercombe Court, Á. Carracedo, M.V. Lareu, P. Schneider, W. Parson, C. Phillips, Inter-laboratory evaluation of the EUROFORGEN global ancestry-informative SNP panel by massively parallel sequencing using the Ion PGM™, Forensic Sci. Int. Genet. 23 (2016) 178–189, http://dx.doi.org/10.1016/j.fsigen.2016.04.008.
[64] C.X. Li, A.J. Pakstis, L. Jiang, Y.L. Wei, Q.F. Sun, H. Wu, O. Bulbul, P. Wang, L.L. Kang, J.R. Kidd, K.K. Kidd, A panel of 74 AISNPs: improved ancestry inference within Eastern Asia, Forensic Sci. Int. Genet. 23 (2016) 101–110, http://dx.doi.org/10.1016/j.fsigen.2016.04.002.
[65] S. Walsh, L. Chaitanya, L. Clarisse, L. Wirken, J. Draus-Barini, L. Kovatsi, H. Maeda, T. Ishikawa, T. Sijen, P. de Knijff, W. Branicki, F. Liu, M. Kayser, Developmental validation of the HIrisPlex system: DNA-based eye and hair colour prediction for forensic and anthropological usage, Forensic Sci. Int. Genet. 9 (2014) 150–161, http://dx.doi.org/10.1016/j.fsigen.2013.12.006.
[66] C. Santos, C. Phillips, M. Fondevila, R. Daniel, R.A.H. Van Oorschot, E.G. Burchard, M.S. Schanfield, L. Souto, J. Uacyisrael, M. Via, Á. Carracedo, M.V. Lareu, Pacifiplex: an ancestry-informative SNP panel centred on Australia and the Pacific region, Forensic Sci. Int. Genet. 20 (2016) 71–80, http://dx.doi.org/10.1016/j.fsigen.2015.10.003.
[67] K.K. Kidd, A.J. Pakstis, W.C. Speed, E.L. Grigorenko, S.L.B. Kajuna, N.J. Karoma, S. Kungulilo, J.J. Kim, R.B. Lu, A. Odunsi, F. Okonofua, J. Parnas, L.O. Schulz, O.V. Zhukova, J.R. Kidd, Developing a SNP panel for forensic identification of individuals, Forensic Sci. Int. 164 (2006) 20–32, http://dx.doi.org/10.1016/j.forsciint.2005.11.017.
[68] S. Zhang, Y. Bian, A. Chen, H. Zheng, Y. Gao, Y. Hou, C. Li, Developmental validation of a custom panel including 273 SNPs for forensic application using Ion Torrent PGM, Forensic Sci. Int. Genet. 27 (2017) 50–57, http://dx.doi.org/10.1016/j.fsigen.2016.12.003.
[69] A. Sarkar, M.R. Nandineni, Development of a SNP-based panel for human identification for Indian populations, Forensic Sci. Int. Genet. 27 (2017) 58–66, http://dx.doi.org/10.1016/j.fsigen.2016.12.002.
[70] V. Pereira, H.S. Mogensen, C. Børsting, N. Morling, Evaluation of the precision ID Ancestry Panel for crime case work: a SNP typing assay developed for typing of 165 ancestral informative markers, Forensic Sci. Int. Genet. 28 (2017) 138–145, http://dx.doi.org/10.1016/j.fsigen.2017.02.013.
[71] J.M. Butler, M.D. Coble, P.M. Vallone, STRs vs. SNPs: thoughts on the future of forensic DNA testing, Forensic Sci. Med. Pathol. 3 (2007) 200–205, http://dx.doi.org/10.1007/s12024-007-0018-1.
[72] K.K. Kidd, A.J. Pakstis, W.C. Speed, R. Lagace, J. Chang, S. Wootton, N. Ihuegbu, Microhaplotype loci are a powerful new type of forensic marker, Forensic Sci. Int. Genet. Suppl. Ser. 4 (2013) e123–e124, http://dx.doi.org/10.1016/j.fsigss.2013.10.063.
[73] K.K. Kidd, W.C. Speed, Criteria for selecting microhaplotypes: mixture detection and deconvolution, Investig. Genet. 6 (2015) 1, http://dx.doi.org/10.1186/s13323-014-0018-3.
[74] K.K. Kidd, W.C. Speed, A.J. Pakstis, D.S. Podini, R. Lagacé, J. Chang, S. Wootton, E. Haigh, U. Soundararajan, Evaluating 130 microhaplotypes across a global set of 83 populations, Forensic Sci. Int. Genet. 29 (2017) 29–37, http://dx.doi.org/10.1016/j.fsigen.2017.03.014.
[75] C. Santos, C. Phillips, A. Gomez-Tato, J. Alvarez-Dios, Á. Carracedo, M.V. Lareu, Inference of ancestry in forensic analysis II: analysis of genetic data, Methods Mol. Biol. U. S. (2016) 255–285, http://dx.doi.org/10.1007/978-1-4939-3597-0_19.
[76] K.B. Gettings, K.M. Kiesler, P.M. Vallone, Performance of a next generation sequencing SNP assay on degraded DNA, Forensic Sci. Int. Genet. 19 (2015) 1–9, http://dx.doi.org/10.1016/j.fsigen.2015.04.010.
[77] F.R. Wendt, J.D. Churchill, N.M.M. Novroski, J.L. King, J. Ng, R.F. Oldt, K.L. McCulloh, J.A. Weise, D.G. Smith, S. Kanthaswamy, B. Budowle, Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx™ forensic genomics system, Forensic Sci. Int. Genet. 24 (2016) 18–23, http://dx.doi.org/10.1016/j.fsigen.2016.05.008.
[78] M. Eduardoff, C. Santos, M. De La Puente, T.E. Gross, M. Fondevila, C. Strobl, B. Sobrino, D. Ballard, P.M. Schneider, A. Carracedo, M.V. Lareu, W. Parson, C. Phillips, Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™, Forensic Sci. Int. Genet. 17 (2015) 110–121, http://dx.doi.org/10.1016/j.fsigen.2015.04.007.
[79] Y. Ma, J.-Z. Kuang, T.-G. Nie, W. Zhu, Z. Yang, Next generation sequencing: improved resolution for paternal/maternal duos analysis, Forensic Sci. Int. Genet. 24 (2016) 83–85, http://dx.doi.org/10.1016/j.fsigen.2016.05.015.
[80] A.C. Jäger, M.L. Alvarez, C.P. Davis, E. Guzmán, Y. Han, L. Way, P. Walichiewicz, D. Silva, N. Pham, G. Caves, J. Bruand, F. Schlesinger, S.J.K. Pond, J. Varlaro, K.M. Stephens, C.L. Holt, Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories, Forensic Sci. Int. Genet. 28 (2017) 52–70, http://dx.doi.org/10.1016/j.fsigen.2017.01.011.
[81] F. Liu, K. van Duijn, J.R. Vingerling, A. Hofman, A.G. Uitterlinden, A.C.J.W. Janssens, M. Kayser, Eye color and the prediction of complex phenotypes from genotypes, Curr. Biol. 19 (2009) R192–R193, http://dx.doi.org/10.1016/j.cub.2009.01.027.
[82] S. Walsh, F. Liu, A. Wollstein, L. Kovatsi, A. Ralf, A. Kosiniak-Kamysz, W. Branicki, M. Kayser, The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA, Forensic Sci. Int. Genet. 7 (2013) 98–115, http://dx.doi.org/10.1016/j.fsigen.2012.07.005.
[83] C.M. Ruitberg, D.J. Reeder, J.M. Butler, STRBase: a short tandem repeat DNA database for the human identity testing community, Nucleic Acids Res. 29 (2001) 320–322, http://dx.doi.org/10.1093/nar/29.1.320.
[84] S.T. Sherry, dbSNP: the NCBI database of genetic variation, Nucleic Acids Res. 29 (2001) 308–311, http://dx.doi.org/10.1093/nar/29.1.308.
[85] H. Rajeevan, U. Soundararajan, A.J. Pakstis, K.K. Kidd, Introducing the forensic research/reference on genetics knowledge base, FROG-kb, Investig. Genet. 3 (2012) 18, http://dx.doi.org/10.1186/2041-2223-3-18.
[86] J.M. Butler, Short tandem repeat typing technologies used in human identity testing, Biotechniques 43 (2007), http://dx.doi.org/10.2144/000112582.
[87] D.Y. Wang, S. Gopinath, R.E. Lagacé, W. Norona, L.K. Hennessy, M.L. Short, J.J. Mulero, Developmental validation of the GlobalFiler® express PCR amplification kit: a 6-dye multiplex assay for the direct amplification of reference samples, Forensic Sci. Int. Genet. 19 (2015) 148–155, http://dx.doi.org/10.1016/j.fsigen.2015.07.013.
[88] M.G. Ensenberger, C.R. Hill, R.S. McLaren, C.J. Sprecher, D.R. Storts, Developmental validation of the PowerPlex® 21 system, Forensic Sci. Int. Genet. 9 (2014) 169–178, http://dx.doi.org/10.1016/j.fsigen.2013.12.005.
[89] M. Kraemer, A. Prochnow, M. Bussmann, M. Scherer, R. Peist, C. Steffen, Developmental validation of QIAGEN Investigator® 24plex QS Kit and Investigator® 24plex GO! Kit: two 6-dye multiplex assays for the extended CODIS core loci, Forensic Sci. Int. Genet. 29 (2017) 9–20, http://dx.doi.org/10.1016/j.fsigen.2017.03.012.
[90] J.D. Churchill, S.E. Schmedes, J.L. King, B. Budowle, Evaluation of the Illumina® beta version ForenSeq™ DNA signature prep kit for use in genetic profiling, Forensic Sci. Int. Genet. 20 (2016) 20–29, http://dx.doi.org/10.1016/j.fsigen.2015.09.009.
[91] X. Zeng, J. King, S. Hermanson, J. Patel, D.R. Storts, B. Budowle, An evaluation of the PowerSeq™ auto system: a multiplex short tandem repeat marker kit compatible with massively parallel sequencing, Forensic Sci. Int. Genet. 19 (2015) 172–179, http://dx.doi.org/10.1016/j.fsigen.2015.07.015.
[92] K.J. van der Gaag, R.H. de Leeuw, J. Hoogenboom, J. Patel, D.R. Storts, J.F.J. Laros, P. de Knijff, Massively Parallel sequencing of short tandem repeats: population data and mixture analysis results for the PowerSeq™ system, Forensic Sci. Int. Genet. 24 (2016) 86–96, http://dx.doi.org/10.1016/j.fsigen.2016.05.016.
[93] C. Van Neste, D. Deforce, F. Van Nieuwerburgh, Effect of multiple allelic drop-outs in forensic RMNE calculations, Forensic Sci. Int. Genet. 19 (2015) 243–249, http://dx.doi.org/10.1016/j.fsigen.2015.08.001.
[94] D. Zubakov, I. Kokmeijer, A. Ralf, N. Rajagopalan, L. Calandro, S. Wootton, R. Langit, C. Chang, R. Lagace, M. Kayser, Towards simultaneous individual and tissue identification: a proof-of-principle study on parallel sequencing of STRs, amelogenin, and mRNAs with the Ion Torrent PGM, Forensic Sci. Int. Genet. 17 (2015) 122–128, http://dx.doi.org/10.1016/j.fsigen.2015.04.002.
[95] K.B. Gettings, K.M. Kiesler, S.A. Faith, E. Montano, C.H. Baker, B.A. Young, R.A. Guerrieri, P.M. Vallone, Sequence variation of 22 autosomal STR loci detected by next generation sequencing, Forensic Sci. Int. Genet. 21 (2016) 15–21, http://dx.doi.org/10.1016/j.fsigen.2015.11.005.
[96] F.R. Wendt, J.L. King, N.M.M.M. Novroski, J.D. Churchill, J. Ng, R.F. Oldt, K.L. McCulloh, J.A. Weise, D.G. Smith, S. Kanthaswamy, B. Budowle, Flanking region variation of ForenSeq™ DNA signature prep kit STR and SNP loci in Yavapai Native Americans, Forensic Sci. Int. Genet. 28 (2017) 146–154, http://dx.doi.org/10.1016/j.fsigen.2017.02.014.
[97] N.M.M. Novroski, J.L. King, J.D. Churchill, L.H. Seah, B. Budowle, Characterization of genetic sequence variation of 58 STR loci in four major population groups, Forensic Sci. Int. Genet. 25 (2016) 214–226, http://dx.doi.org/10.1016/j.fsigen.2016.09.007.
[98] F. Casals, R. Anglada, N. Bonet, R. Rasal, K.J. van der Gaag, J. Hoogenboom, N. Solé-Morata, D. Comas, F. Calafell, Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations, Forensic Sci. Int. Genet. 30 (2017) 66–70, http://dx.doi.org/10.1016/j.fsigen.2017.06.006.
[99] M. Gymrek, D. Golan, S. Rosset, Y. Erlich, lobSTR: a short tandem repeat profiler for personal genomes, Genome Res. (2012) 1154–1162, http://dx.doi.org/10.1101/gr.135780.111.
[100] D.H. Warshauer, D. Lin, K. Hari, R. Jain, C. Davis, B. LaRue, J.L. King, B. Budowle, STRait razor: a length-based forensic STR allele-calling tool for use with second generation sequencing data, Forensic Sci. Int. Genet. 7 (2013) 409–417, http://dx.doi.org/10.1016/j.fsigen.2013.04.005.
[101] J.L. King, F.R. Wendt, J. Sun, B. Budowle, STRait Razor v2s: advancing sequence-


based STR allele reporting and beyond to other marker systems, Forensic Sci. Int. Genet. 29 (2017) 21–28, http://dx.doi.org/10.1016/j.fsigen.2017.03.013.
[102] S.Y. Anvar, K.J. Van Der Gaag, J.W.F. Van Der Heijden, M.H.A.M. Veltrop, R.H.A.M. Vossen, R.H. De Leeuw, C. Breukel, H.P.J. Buermans, J.S. Verbeek, P. De Knijff, J.T. Den Dunnen, J.F.J. Laros, TSSV: a tool for characterization of complex allelic variants in pure and mixed genomes, Bioinformatics 30 (2014) 1651–1659, http://dx.doi.org/10.1093/bioinformatics/btu068.
[103] J.C.I. Lee, B. Tseng, L.K. Chang, A. Linacre, SEQ Mapper: a DNA sequence searching tool for massively parallel sequencing data, Forensic Sci. Int. Genet. 26 (2017) 66–69, http://dx.doi.org/10.1016/j.fsigen.2016.10.006.
[104] K.J. van der Gaag, P. de Knijff, Forensic nomenclature for short tandem repeats updated for sequencing, Forensic Sci. Int. Genet. Suppl. Ser. 5 (2015) e542–e544, http://dx.doi.org/10.1016/j.fsigss.2015.09.214.
[105] W. Parson, D. Ballard, B. Budowle, J.M. Butler, K.B. Gettings, P. Gill, L. Gusmão, D.R. Hares, J.A. Irwin, J.L. King, P. De Knijff, N. Morling, M. Prinz, P.M. Schneider, C. Van Neste, S. Willuweit, C. Phillips, Massively parallel sequencing of forensic STRs: considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements, Forensic Sci. Int. Genet. 22 (2016) 54–63, http://dx.doi.org/10.1016/j.fsigen.2016.01.009.
[106] E.H. Kim, H.Y. Lee, I.S. Yang, S.E. Jung, W.I. Yang, K.J. Shin, Massively parallel sequencing of 17 commonly used forensic autosomal STRs and amelogenin with small amplicons, Forensic Sci. Int. Genet. 22 (2016) 1–7, http://dx.doi.org/10.1016/j.fsigen.2016.01.001.
[107] F. Guo, Y. Zhou, F. Liu, J. Yu, H. Song, H. Shen, B. Zhao, F. Jia, G. Hou, X. Jiang, Evaluation of the early access STR kit v1 on the Ion Torrent PGM™ platform, Forensic Sci. Int. Genet. 23 (2016) 111–120, http://dx.doi.org/10.1016/j.fsigen.2016.04.004.
[108] S. Lim, J.P. Youn, S.O. Moon, Y.H. Nam, S.B. Hong, D. Choi, M. Han, S.Y. Hwang, Characterization of human short tandem repeats (STRs) for individual identification using the Ion Torrent, BioChip J. 9 (2015) 164–172, http://dx.doi.org/10.1007/s13206-015-9210-7.
[109] H. Tae, K.W. McMahon, R.E. Settlage, J.H. Bavarva, H.R. Garner, ReviSTER: an automated pipeline to revise misaligned reads to simple tandem repeats, Bioinformatics 29 (2013) 1734–1741, http://dx.doi.org/10.1093/bioinformatics/btt277.
[110] K. Kojima, Y. Kawai, K. Misawa, T. Mimori, M. Nagasaki, STR-realigner: a realignment method for short tandem repeat regions, BMC Genomics 17 (2016) 991, http://dx.doi.org/10.1186/s12864-016-3294-x.
[111] M.D. Cao, E. Tasker, K. Willadsen, M. Imelfort, S. Vishwanathan, S. Sureshkumar, S. Balasubramanian, M. Boden, Inferring short tandem repeat variation from paired-end short reads, Nucleic Acids Res. 42 (2014), http://dx.doi.org/10.1093/nar/gkt1313.
[112] K. Kojima, Y. Kawai, N. Nariai, T. Mimori, T. Hasegawa, M. Nagasaki, Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree, BMC Genomics 17 (2016) 494, http://dx.doi.org/10.1186/s12864-016-2821-0.
[113] T. Willems, D. Zielinski, J. Yuan, A. Gordon, M. Gymrek, Y. Erlich, Genome-wide profiling of heritable and de novo STR variations, Nat. Methods 14 (2017) 590–592, http://dx.doi.org/10.1038/nmeth.4267.
[114] D.H. Warshauer, J.L. King, B. Budowle, STRait razor v2.0: the improved STR allele identification tool-razor, Forensic Sci. Int. Genet. 14 (2015) 182–186, http://dx.doi.org/10.1016/j.fsigen.2014.10.011.
[115] S.P. Sadedin, B. Pope, A. Oshlack, Bpipe: a tool for running and managing bioinformatics pipelines, Bioinformatics 28 (2012) 1525–1526, http://dx.doi.org/10.1093/bioinformatics/bts167.
[116] C. Van Neste, Y. Gansemans, D. De Coninck, D. Van Hoofstat, W. Van Criekinge, D. Deforce, F. Van Nieuwerburgh, Forensic massively parallel sequencing data analysis tool: implementation of MyFLq as a standalone web- and Illumina BaseSpace®-application, Forensic Sci. Int. Genet. 15 (2015) 2–7, http://dx.doi.org/10.1016/j.fsigen.2014.10.006.
[117] S. Zhang, Y. Bian, H. Tian, Z. Wang, Z. Hu, C. Li, Development and validation of a new STR 25-plex typing system, Forensic Sci. Int. Genet. 17 (2015) 61–69, http://dx.doi.org/10.1016/j.fsigen.2015.03.008.
[118] E. Lyons, M. Scheible, K. Andreaggi, J. Irwin, R. Just, A high-throughput Sanger strategy for human mitochondrial genome sequencing, BMC Genomics 14 (2013) 881, http://dx.doi.org/10.1186/1471-2164-14-881.
[119] S. Anderson, A.T. Bankier, B.G. Barrell, M.H. de Bruijn, A.R. Coulson, J. Drouin, I.C. Eperon, D.P. Nierlich, B.A. Roe, F. Sanger, P.H. Schreier, A.J. Smith, R. Staden, I.G. Young, Sequence and organization of the human mitochondrial genome, Nature 290 (1981) 457–465, http://dx.doi.org/10.1038/290457a0.
[120] R.M. Andrews, I. Kubacka, P.F. Chinnery, R.N. Lightowlers, D.M. Turnbull, N. Howell, Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA, Nat. Genet. 23 (1999) 147, http://dx.doi.org/10.1038/13779.
[121] B.C. Levin, H. Cheng, D.J. Reeder, A human mitochondrial DNA standard reference material for quality control in forensic identification, medical diagnosis, and mutation detection, Genomics 55 (1999) 135–146, http://dx.doi.org/10.1006/geno.1998.5513.
[122] W. Parson, H.J. Bandelt, Extended guidelines for mtDNA typing of population data in forensic science, Forensic Sci. Int. Genet. 1 (2007) 13–19, http://dx.doi.org/10.1016/j.fsigen.2006.11.003.
[123] M. Scheible, R. Just, K. Sturk-Andreaggi, J. Saunier, W. Parson, T. Parsons, M. Coble, J. Irwin, The mitochondrial landscape of African Americans: an examination of more than 2500 control region haplotypes from 22 U.S. locations, Forensic Sci. Int. Genet. 22 (2016) 139–148, http://dx.doi.org/10.1016/j.fsigen.2016.01.002.
[124] J.L. King, B.L. LaRue, N.M. Novroski, M. Stoljarova, S.B. Seo, X. Zeng, D.H. Warshauer, C.P. Davis, W. Parson, A. Sajantila, B. Budowle, High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq, Forensic Sci. Int. Genet. 12 (2014) 128–135, http://dx.doi.org/10.1016/j.fsigen.2014.06.001.
[125] W. Parson, L. Gusmão, D.R. Hares, J.A. Irwin, W.R. Mayr, N. Morling, E. Pokorak, M. Prinz, A. Salas, P.M. Schneider, T.J. Parsons, DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing, Forensic Sci. Int. Genet. 13 (2014) 134–142, http://dx.doi.org/10.1016/j.fsigen.2014.07.010.
[126] E.Y. Lee, H.Y. Lee, S.Y. Oh, S.E. Jung, I.S. Yang, Y.H. Lee, W.I. Yang, K.J. Shin, Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method, Forensic Sci. Int. Genet. 22 (2016) 37–43, http://dx.doi.org/10.1016/j.fsigen.2016.01.014.
[127] N.E.C. Weiler, G. de Vries, T. Sijen, Development of a control region-based mtDNA SNaPshot™ selection tool, integrated into a mini amplicon sequencing method, Sci. Justice 56 (2016) 96–103, http://dx.doi.org/10.1016/j.scijus.2015.11.003.
[128] W. Parson, G. Huber, L. Moreno, M.B. Madel, M.D. Brandhagen, S. Nagl, C. Xavier, M. Eduardoff, T.C. Callaghan, J.A. Irwin, Massively parallel sequencing of complete mitochondrial genomes from hair shaft samples, Forensic Sci. Int. Genet. 15 (2015) 8–15, http://dx.doi.org/10.1016/j.fsigen.2014.11.009.
[129] M.D. Coble, R.S. Just, J.E. O’Callaghan, I.H. Letmanyi, C.T. Peterson, J.A. Irwin, T.J. Parsons, Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians, Int. J. Legal Med. 118 (2004) 137–146, http://dx.doi.org/10.1007/s00414-004-0427-6.
[130] R.S. Just, M.D. Leney, S.M. Barritt, C.W. Los, B.C. Smith, T.D. Holland, T.J. Parsons, The use of mitochondrial DNA single nucleotide polymorphisms to assist in the resolution of three challenging forensic cases, J. Forensic Sci. 54 (2009) 887–891, http://dx.doi.org/10.1111/j.1556-4029.2009.01069.x.
[131] M. Bodner, A. Iuvaro, C. Strobl, S. Nagl, G. Huber, S. Pelotti, D. Pettener, D. Luiselli, W. Parson, Helena, the hidden beauty: resolving the most common West Eurasian mtDNA control region haplotype by massively parallel sequencing an Italian population sample, Forensic Sci. Int. Genet. 15 (2015) 21–26, http://dx.doi.org/10.1016/j.fsigen.2014.09.012.
[132] R.S. Just, J.A. Irwin, W. Parson, Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing, Forensic Sci. Int. Genet. 18 (2015) 131–139, http://dx.doi.org/10.1016/j.fsigen.2015.05.003.
[133] K. Skonieczna, B. Malyarchuk, A. Jawień, A. Marszałek, Z. Banaszkiewicz, P. Jarmocik, M. Borcz, P. Bała, T. Grzybowski, Heteroplasmic substitutions in the entire mitochondrial genomes of human colon cells detected by ultra-deep 454 sequencing, Forensic Sci. Int. Genet. 15 (2015) 16–20, http://dx.doi.org/10.1016/j.fsigen.2014.10.021.
[134] J.L. King, A. Sajantila, B. Budowle, MitoSAVE: mitochondrial sequence analysis of variants in Excel, Forensic Sci. Int. Genet. 12 (2014) 122–125, http://dx.doi.org/10.1016/j.fsigen.2014.05.013.
[135] C. Calabrese, D. Simone, M.A. Diroma, M. Santorsola, C. Gutta, G. Gasparre, E. Picardi, G. Pesole, M. Attimonelli, MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing, Bioinformatics 30 (2014) 3115–3117, http://dx.doi.org/10.1093/bioinformatics/btu483.
[136] Y. Zhou, F. Guo, J. Yu, F. Liu, J. Zhao, H. Shen, B. Zhao, F. Jia, Z. Sun, H. Song, X. Jiang, Strategies for complete mitochondrial genome sequencing on Ion Torrent PGM™ platform in forensic sciences, Forensic Sci. Int. Genet. 22 (2016) 11–21, http://dx.doi.org/10.1016/j.fsigen.2016.01.004.
[137] M.A. Peck, M.D. Brandhagen, C. Marshall, T.M. Diegoli, J.A. Irwin, K. Sturk-Andreaggi, Concordance and reproducibility of a next generation mtGenome sequencing method for high-quality samples using the Illumina MiSeq, Forensic Sci. Int. Genet. 24 (2016) 103–111, http://dx.doi.org/10.1016/j.fsigen.2016.06.003.
[138] M.M. Holland, E.D. Pack, J.A. McElhoe, Evaluation of GeneMarker® HTS for improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment, Forensic Sci. Int. Genet. 28 (2017) 90–98, http://dx.doi.org/10.1016/j.fsigen.2017.01.016.
[139] H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics 25 (2009) 1754–1760, http://dx.doi.org/10.1093/bioinformatics/btp324.
[140] J.A. McElhoe, M.M. Holland, K.D. Makova, M.S.-W. Su, I.M. Paul, C.H. Baker, S.A. Faith, B. Young, Development and assessment of an optimized next-generation DNA sequencing approach for the mtgenome using the Illumina MiSeq, Forensic Sci. Int. Genet. 13 (2014) 20–29, http://dx.doi.org/10.1016/j.fsigen.2014.05.007.
[141] W. Parson, C. Strobl, G. Huber, B. Zimmermann, S.M. Gomes, L. Souto, L. Fendt, R. Delport, R. Langit, S. Wootton, R. Lagacé, J. Irwin, Evaluation of next generation mtGenome sequencing using the Ion Torrent personal genome machine (PGM), Forensic Sci. Int. Genet. 7 (2013) 632–639, http://dx.doi.org/10.1016/j.fsigen.2013.09.007.
[142] M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation, Hum. Mutat. 30 (2009) 386–394, http://dx.doi.org/10.1002/humu.20921.
[143] A. Kloss-Brandstätter, D. Pacher, S. Schönherr, H. Weissensteiner, R. Binna, G. Specht, F. Kronenberg, HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups, Hum. Mutat. 32 (2011) 25–32, http://dx.doi.org/10.1002/humu.21382.
[144] H. Weissensteiner, D. Pacher, A. Kloss-Brandstätter, L. Forer, G. Specht, H.-J. Bandelt, F. Kronenberg, A. Salas, S. Schönherr, HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing, Nucleic Acids


Res. 44 (2016) W58–63, http://dx.doi.org/10.1093/nar/gkw233.
[145] L. Fan, Y.G. Yao, MitoTool: a web server for the analysis and retrieval of human mitochondrial DNA sequence variations, Mitochondrion 11 (2011) 351–356, http://dx.doi.org/10.1016/j.mito.2010.09.013.
[146] F. Rubino, R. Piredda, F.M. Calabrese, D. Simone, M. Lang, C. Calabrese, V. Petruzzella, M. Tommaseo-Ponzetta, G. Gasparre, M. Attimonelli, HmtDB: a genomic resource for mitochondrion-based human variability studies, Nucleic Acids Res. 40 (2012) 1150–1159, http://dx.doi.org/10.1093/nar/gkr1086.
[147] H.Y. Lee, I. Song, E. Ha, S.-B. Cho, W.I. Yang, K.-J. Shin, mtDNAmanager: a web-based tool for the management and quality analysis of mitochondrial DNA control-region sequences, BMC Bioinf. 9 (2008) 483, http://dx.doi.org/10.1186/1471-2105-9-483.
[148] A.W. Röck, A. Dür, M. Van Oven, W. Parson, Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA), Forensic Sci. Int. Genet. 7 (2013) 601–609, http://dx.doi.org/10.1016/j.fsigen.2013.07.005.
[149] A. Röck, J. Irwin, A. Dür, T. Parsons, W. Parson, SAM: string-based sequence search algorithm for mitochondrial DNA database queries, Forensic Sci. Int. Genet. 5 (2011) 126–132, http://dx.doi.org/10.1016/j.fsigen.2010.10.006.
