Professional Documents
Culture Documents
SEE COMMENTARY
K. C. Allen Chana,b,1, Peiyong Jianga,b,1, Kun Suna,b,1, Yvonne K. Y. Chengc, Yu K. Tonga,b, Suk Hang Chenga,b,
Ada I. C. Wonga,b, Irena Hudecovaa,b, Tak Y. Leungc, Rossa W. K. Chiua,b,2, and Yuk Ming Dennis Loa,b,2
a
Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; bDepartment of Chemical Pathology,
The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong; and cDepartment of Obstetrics and Gynaecology, The
Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong
Contributed by Yuk Ming Dennis Lo, October 4, 2016 (sent for review August 24, 2016; reviewed by Mary E. Norton and Erik A. Sistermans)
Plasma DNA obtained from a pregnant woman was sequenced to a the PPV to 0.438%, although the sensitivity was reduced to 38.6%.
depth of 270× haploid genome coverage. Comparing the maternal These data, thus, indicate the enormous challenge of detecting fetal
plasma DNA sequencing data with the parental genomic DNA data de novo mutations on a genome-wide scale using NIPT. In par-
and using a series of bioinformatics filters, fetal de novo mutations ticular, dramatic improvement in the PPV would be needed for
were detected at a sensitivity of 85% and a positive predictive value such an approach to be clinically practical.
of 74%. These results represent a 169-fold improvement in the pos- A second area that needs improvement concerns the detection
itive predictive value over previous attempts. Improvements in the of sequences that the fetus has inherited from its mother. Previous
interpretation of the sequence information of every base position efforts in elucidating the maternal inheritance of the fetus on a
in the genome allowed us to interrogate the maternal inheritance genome-wide scale have generally used a haplotype-based strat-
of the fetus for 618,271 of 656,676 (94.2%) heterozygous SNPs within egy, which has been referred to as the relative haplotype dosage
the maternal genome. The fetal genotype at each of these sites was
(RHDO) approach (8–10). Hence, for a pregnant woman who has
deduced individually, unlike previously, where the inheritance was
two haplotypes in a particular chromosomal region, she would
determined for a collection of sites within a haplotype. These results
represent a 90-fold enhancement in the resolution in determining the
pass one of these onto her fetus. Because her plasma contains a
fetus’s maternal inheritance. Selected genomic locations were more
likely to be found at the ends of plasma DNA molecules. We found Significance
that a subset of such preferred ends exhibited selectivity for fetal- or
maternal-derived DNA in maternal plasma. The ratio of the number of We explored the limit of noninvasive prenatal testing by per-
maternal plasma DNA molecules with fetal preferred ends to those forming genome-wide sequencing of maternal plasma DNA at
with maternal preferred ends showed a correlation with the fetal DNA 195× and 270× haploid genome coverages. Combined with the
fraction. Finally, this second generation approach for noninvasive fetal use of a series of bioinformatics filters, fetal de novo mutations
whole-genome analysis was validated in a pregnancy diagnosed with could be detected with a positive predictive value that was two
cardiofaciocutaneous syndrome with maternal plasma DNA sequenced orders of magnitude higher than previously reported. A de novo
MEDICAL SCIENCES
to 195× coverage. The causative de novo BRAF mutation was success- BRAF mutation was noninvasively detected in a case with car-
fully detected through the maternal plasma DNA analysis. diofaciocutaneous syndrome. The maternal inheritance of the
fetus could be ascertained on a genome-wide level without the
|
noninvasive prenatal testing massively parallel sequencing | use of maternal haplotypes, hence greatly increasing the reso-
DNA fragmentation patterns lution of such analysis. Finally, we showed that certain genomic
locations were overrepresented at the ends of plasma DNA
fragments with fetal or maternal selectivity.
T he discovery of cell-free fetal DNA in maternal plasma has
enabled the development of noninvasive prenatal testing
(NIPT) (1). Over the last few years, NIPT has been implemented
Author contributions: K.C.A.C., R.W.K.C., and Y.M.D.L. designed research; K.C.A.C., P.J., K.S.,
Y.K.Y.C., Y.K.T., S.H.C., A.I.C.W., I.H., and T.Y.L. performed research; K.C.A.C., P.J., K.S.,
globally for the noninvasive prenatal investigation of fetal chro- R.W.K.C., and Y.M.D.L. analyzed data; Y.K.Y.C. and T.Y.L. recruited subjects and ana-
mosomal aneuploidies (2–5). With higher depth of sequencing and lyzed clinical data; and K.C.A.C., P.J., R.W.K.C., and Y.M.D.L. wrote the paper.
improved bioinformatics analyses, NIPT has now been extended to Reviewers: M.E.N., University of California, San Francisco; and E.A.S., VU University
the detection of a variety of subchromosomal aberrations (6, 7). Medical Center.
Further expanding the applications of NIPT, we showed in 2010 Conflict of interest statement: R.W.K.C. and Y.M.D.L. received research support from
that it was possible to deduce the fetal genome by deep sequencing Sequenom, Inc. R.W.K.C. and Y.M.D.L. were consultants to Sequenom, Inc. K.C.A.C., R.W.K.C.,
and Y.M.D.L. hold equities in Sequenom, Inc. K.C.A.C., R.W.K.C., and Y.M.D.L. are founders
of maternal plasma (8). Work by Fan et al. (9) and Kitzman et al. of Xcelom and Cirina. K.C.A.C., P.J., and R.W.K.C. are consultants to Xcelom. P.J. is a
(10) confirmed these results. In these previous efforts, the depths consultant to Cirina. K.C.A.C., P.J., R.W.K.C., and Y.M.D.L. have filed patent applications
of maternal plasma DNA sequencing ranged from 52.7× to 78× (PCT/CN2016/073753 and PCT/CN2016/091531) based on the data generated from this
haploid human genome coverages (8–10). work, which have been licensed to Cirina.
There are a number of limitations in these previous studies. For Freely available online through the PNAS open access option.
example, Kitzman et al. (10) explored the possibility of detecting Data deposition: The sequence data for the subjects studied in this work who had con-
sented to data archiving have been deposited in the European Genome-Phenome Archive
fetal de novo mutations on a genome-wide level from the maternal (EGA), https://www.ebi.ac.uk/ega/, hosted by the European Bioinformatics Institute (EBI;
plasma DNA sequencing data. In one variation of bioinformatics accession no. EGAS00001001882).
analysis, they found 2.5 × 107 candidate fetal de novo mutation See Commentary on page 14173.
sites in the plasma DNA sequencing data. Only 39 of these were 1
K.C.A.C., P.J., and K.S. contributed equally to this work.
true fetal de novo mutations. Because the studied fetus had a total 2
To whom correspondence may be addressed. Email: rossachiu@cuhk.edu.hk or loym@
of 44 de novo mutations, the positive predictive value (PPV) cuhk.edu.hk.
was 0.000156%, and the sensitivity was 88.6%. With additional
Downloaded by guest on July 8, 2021
Maternal Inheritance
Principle Haplotype-based by RHDO* SNP-based by GRAD*
Resoluon 300k − over 1M Single nucleode
1. Nucleosomal paern Plasma DNA preferred end sites
2. Size difference between maternal- Genomic coordinates
and fetal-derived fragments
Fetal-
derived
DNA
166 fragments
Frequency (%)
Fig. 1. Summary of the key differences between the first and second generation approaches for noninvasive fetal whole-genome analysis. aThe per-
Downloaded by guest on July 8, 2021
formance characteristics quoted for first generation fetal whole-genome analysis were from refs. 8–10. *RHDO represents RHDO analysis, and GRAD
represents GRAD analysis.
SEE COMMENTARY
carrying the fetal-specific allele and the allele shared by the fetus number of times that a particular nucleotide was actually se-
and the mother at a particular SNP (Fig. S1). Such size information quenced and the sequencing error rate, so that the theoretical
would be useful for determining the likelihood of any variant allele probability of a putative de novo mutation being a sequencing
detected in maternal plasma being a fetal de novo mutation. The error would be less than 1 in 3 × 109. Using this dynamic cutoff
median difference between fragments carrying the fetal-specific algorithm, 60,111 candidate de novo mutations were identified,
alleles and the shared alleles was 22 bp (Fig. S2A). At 90% of these and these candidate de novo mutations included 54 of 56 (i.e.,
SNP loci, the difference was larger than 10 bp (Fig. S2A). 96%) de novo mutations present in the cord blood (Fig. 2).
Then, we realigned the sequenced reads covering the 60,111
Detection of Fetal de Novo Mutations from Maternal Plasma. Fifty-six candidate de novo mutation sites to the reference human genome
single-nucleotide variants were found to be present in the umbilical using the BOWTIE2 software (19). The initial alignment software,
cord blood cells and absent in the parents. These 56 single- Short Oligonucleotide Alignment Program 2 (SOAP2) (20), and
nucleotide variants were thus deemed de novo mutations possessed the realignment software, BOWTIE2, were based on two different
by the fetus. Deep genome-wide sequencing is needed for the matching algorithms, namely the Burrows–Wheeler transform al-
screening of fetal de novo mutations from maternal plasma. At this gorithm and Smith–Waterman-like algorithm, respectively (19, 20).
level of sequencing, a vast number of sequencing errors would be A variant would be removed from being considered as a candidate
generated and hinder the sensitive and specific detection of true de novo mutation if the two alignment algorithms do not map the
fetal de novo mutations. To tackle the challenge, we designed a variant to the same genomic location. This realignment procedure
bioinformatics protocol that used a series of steps to filter out the significantly reduced the number of false-positive results caused by
false-positive mutations (Fig. 2). The bioinformatics protocol made alignment errors. After this realignment process, the number of
use of steps that identified high-quality base sequences in combi- candidate de novo mutations was reduced to 148 and contained
nation with steps that took into account the known biological 52 (93%) of the de novo mutations detected in the cord blood
characteristics of cell-free fetal DNA. (Fig. 2). This result is equivalent to a PPV of 35%.
First, we identified genomic locations where the maternal To further reduce the number of false positives, we applied a
plasma DNA sequencing showed at least one sequence variant, filter based on the fractional concentration of the variants. Because
whereas the parental DNA analysis showed that both parents the fetal DNA fraction in the sample was 31.3%, 95% of the var-
shared the same sequence. Second, we applied a dynamic cutoff iants would be expected to be present at a level of ≥10% of the
algorithm to determine if the variant detected in the maternal sequenced reads covering a particular nucleotide based on the
Poisson distribution. Hence, we filtered out the candidate muta-
tions that were present at a fraction of <10%. Eighty-five candidate
de novo mutations were present at ≥10% of the sequenced reads
Dynamic cutoff covering the nucleotide and included 51 (91%) of the mutations
Variant frequency higher than sequencing error rate detected in the cord blood (Fig. 2). This result represents a PPV
60,111 of 60%.
Next, we included a filter based on the size of the plasma DNA
MEDICAL SCIENCES
Detection rate: 54/56 (96%); PPV: 0.089%
molecules carrying the candidate de novo mutations. As illustrated
in Fig. S2A, the difference between the mean fragment size for
fragments carrying the fetal-specific alleles and the shared alleles
was larger than 10 bp for 90% of loci at which the fetus was
Realignment heterozygous and the mother was homozygous. Thus, we applied a
Realign all reads covering putative mutation sites size filter to the candidate de novo mutations, requiring the mean
148 size of the plasma DNA fragments carrying the mutants to be at
Detection rate: 52/56(93%); PPV: 35% least 10 bp shorter than that of the fragments carrying the parental
alleles. Sixty-five candidates passed this size-filtering criterion that
included 48 (85%) of the mutations detected in the cord blood
(Fig. 2). This result represents a PPV of 74%.
Fractional concentration
Variant frequency compatible with fetal DNA fraction Paternal Inheritance Analysis. For paternal inheritance analysis, we
85 focused on SNPs that were heterozygous in the father and homo-
Detection rate: 51/56 (91%); PPV: 60% zygous in the mother. In principle, the presence or absence of the
paternal-specific allele in the maternal plasma would indicate the
inheritance of paternal-specific allele or the shared parental allele
by the fetus, respectively. To enhance the accuracy of deducing the
paternal inheritance, we used a binomial mixture model, which
Plasma DNA fragment size took into account the fetal DNA concentration, the sequencing
Mean size of fragments carrying the variant 10 bp shorter than error rate, and the number of times that the paternal-specific allele
the mean size of fragments carrying the parental alleles was observed in the maternal plasma (details are in Materials and
65 Methods) (21). For 667,586 SNPs for which the mother was ho-
Detection rate: 48/56 (85%); PPV: 74% mozygous and the father was heterozygous, the overall accuracy
was 99.99% for the deduction of the paternal inheritance.
Fig. 2. Detection of the fetal de novo mutations through the analysis of the
sequencing data of maternal plasma DNA. The number in bold in each step Genome-Wide Relative Allele Dosage Analysis. Previous efforts in
prenatally elucidating the maternal inheritance of the fetus on a
Downloaded by guest on July 8, 2021
of fragments carrying the fetal-specific allele and the allele shared maternal plasma DNA preferred end positions from one third
SEE COMMENTARY
Shared allele
Fetal-specific allele
8
4
Fraction of
fragments ending
8
on the nucleotide
8
position (%)
4
4
-300 -200 -100 0 100 200 300 -300 -200 -100 0 100 200 300
Relative genomic coordinates (bases) Relative genomic coordinates (bases)
Fig. 3. Left shows an illustrative example of the nonrandom fragmentation patterns of plasma DNA carrying a fetal-specific allele and an allele shared by the
mother and the fetus. In Upper, each horizontal line represents one sequenced DNA fragment. The ends of the DNA fragments represent the ending position of
the sequenced read. The fragments are sorted according to the coordinate of the left outermost nucleotide (genomic coordinate with the lower numerical value).
In Lower, the percentage of fragments ending on a particular position is shown. Upper Right shows the span of all of the sequenced DNA fragments aligned to
the same SNP obtained from the sonicated blood cell DNA of the same pregnant woman. Lower Right shows the percentage of fragments ending on a particular
position. The x axes show the relative genomic coordinates. The SNP of interest is located in the middle (marked by a dotted line).
trimester case, we explored if such ends would be detectable in fetus. Sequencing libraries were prepared from these maternal
samples of other pregnancies and whether the relative abundance plasma samples using the KAPA Library Preparation Kits (Kapa
of plasma DNA ending at these sets of nucleotide positions would Biosystems), which included a PCR amplification step to enrich the
reflect the fetal DNA fraction. We sequenced the plasma DNA of adaptor-ligated molecules. The median number of mapped reads
26 first trimester (10–13 wk) pregnancies, each involving a male per sample was 16 million (range = 12–22 million). The proportion
A Informative SNP that mother was homozygous B Informative SNP that mother was heterozygous
and fetus was heterozygous and fetus was homozygous
MEDICAL SCIENCES
Sequencing reads carrying Sequencing reads carrying
8
shared allele
8
Significant ending
position specific for
Significant ending position specific for
fragments carrying Significant ending
position common for fragments carrying
a genomic coordinate (%)
shared alleles
a genomic coordinate (%)
0
4
carrying a shared allele and a maternal-specific allele are shown in red and blue, respectively. The x axes of the graphs show the relative genomic coordinates.
The position of the SNP of interest is marked by a dotted line.
DNA molecules with preferred fetal (Set A) and maternal (Set X) ends, and
tially be generalized to other pregnant cases. fetal DNA fraction.
Maternal- Maternal-
preferred preferred In this work, we performed ultradeep genome-wide sequencing of
SEE COMMENTARY
the plasma DNA obtained from two pregnant women to 195× and
270× haploid genome coverages. We believe that these are the
deepest genome-wide sequencings yet reported for plasma DNA
ΔS obtained from any one pregnancy. Previous efforts in the genome-
wide sequencing of plasma DNA obtained from pregnant women
had attempted sequencing depths from 52.7× to 78× haploid ge-
nome coverages (8–10). The depth of sequencing achieved in our
Size (bp) Size (bp) analyzed case has allowed us to push forward the limit and expand
C D the applications of NIPT.
First, we investigated the possibility of searching for fetal de
novo mutations on a genome-wide scale using the maternal
plasma DNA sequencing data. It has been reported that there are
∆S (%)
∆S (%)
∼74 de novo point mutations per person and that the mutation
rate increases with paternal age (23). It is now increasingly rec-
ognized that de novo mutations play an important role in human
genetic diseases and that diseases associated with de novo muta-
tions are not rare collectively (23). Thus, it is clinically relevant to
be able to screen for de novo mutations prenatally. Efforts in the
noninvasive prenatal detection of fetal de novo mutations are
Size (bp) Size (bp) limited to diseases associated with structural abnormalities de-
tectable on ultrasound and where those diseases are known to be
Fig. 7. (A) Plasma DNA size distributions for fragments ending on fetal-pre-
associated with hotspot de novo mutations (24). For example,
ferred ending positions (Set A; blue) and fragments ending on maternal-pre-
ferred ending positions (Set X; red). A shorter size distribution was observed
for fragments ending on Set A positions compared with those ending on Set X
positions. (B) Cumulative plot for the size distributions for the two sets of
fragments. The difference in the cumulative frequencies of the two sets of A B
Maternal- Maternal-
end coordinates are shifted by 0 to −5 bp with regard to the genomic coor- preferred preferred
dinates of the preferred end.
MEDICAL SCIENCES
distribution of plasma DNA fragments carrying the alleles shared ΔS
by the fetus and the mother is dependent on the fractional fetal
DNA concentration in plasma, the size difference required for
filtering de novo mutation would need to be determined on a case
by case basis. In this case, at least 8-bp size difference between the Size (bp) Size (bp)
variant and the parental alleles was used as the criterion for C D
qualifying the variant as a candidate de novo mutation; 75 can-
didate de novo mutations were identified in the plasma, of which
47 were confirmed to be present in the placenta. Because a total of
ΔS (%)
analysis had a detection rate of 81% and a PPV of 62% (Fig. S4).
The de novo BRAF mutation was among 47 variants detected by
the plasma DNA analysis. Using GRAD analysis, the maternal
inheritance of the fetus was deduced in 528,008 (68% of 775,456
SNPs where the mother was heterozygous) SNPs with an accuracy
of 96.8%. Regarding preferred end sites, plasma DNA fragments
carrying the most prevalent 0.5% of plasma DNA ends repre-
Size (bp) Size (bp)
sented 2.4% of all DNA fragments. Strikingly, 59% of the most
prevalent 0.5% plasma DNA end sites overlapped with those of Fig. 8. (A) Plasma DNA size distributions in a pooled plasma DNA sample from
the third trimester case. Based on the analysis of informative 26 first trimester pregnant women for fragments ending on fetal-preferred
SNPs, 10,401 fetal-preferred (Set A) (Fig. S5A) and 6,562 ma- ending positions (Set A; blue) and fragments ending on maternal-preferred
ternal-preferred (Set X) (Fig. S5B) end sites were identified. The ending positions (Set X; red). A shorter size distribution was observed for
preferred end sites for fragments carrying alleles shared between fragments ending on Set A positions compared with those ending on Set X
the mother and the fetus were also identified (Set B, Set C, Set Y, positions. (B) Cumulative plot for the size distributions for the two sets of
fragments. The difference in the cumulative frequencies of the two sets of
and Set Z) (Fig. S5). The genomic coordinates of each set of
fragments (ΔS) against fragment size is shown. (C) A plot of ΔS against size
preferred positions of this case are shown in Dataset S2. In 26 first when the end coordinates are shifted by 0 to +5 bp with regard to the genomic
trimester maternal plasma samples, a positive correlation could be coordinates of the preferred end. (D) A plot of ΔS against size when the end
observed between the relative abundance (i.e., the F/M ratio) of coordinates are shifted by 0 to −5 bp with regard to the genomic coordinates of
Downloaded by guest on July 8, 2021
the plasma DNA molecules with fetal-preferred (Set A) and the preferred end.
when conventional PCR-amplified DNA libraries were sequenced starting sequences within regions 73 bp upstream and downstream
SEE COMMENTARY
molecules spanning a particular genomic window minus those genomic DNA samples, including the mother’s buffy coat DNA, the father’s
molecules with an end point within the window. Conclusions buffy coat DNA, and the cord blood buffy coat DNA, 1 μg sonicated DNA was
drawn from those studies are that plasma DNA cleavages or used for library construction. The library concentrations ranged from 34 to
breakages cluster around locations in the genome in relation to 58 nM in a 20-μL library. For the first trimester, plasma DNA sequencing libraries
positions of nucleosome arrays and open chromatin domains. were constructed using a KAPA Library Preparation Kit (Kapa Biosystems).
However, as shown in Figs. 7 and 8, the preferred plasma DNA
Sequencing of DNA Libraries. The libraries for maternal plasma DNA and ge-
ending sites identified by this study are exquisitely sensitive to the
nomic DNA were sequenced using the HiSeq2500 and the HiSeq1500 Se-
genomic location. We have, therefore, shown that there is another quencing Systems (Illumina), respectively, using a paired end sequencing
layer of complexity associated with the nonrandom nature of protocol. Seventy-five nucleotides were sequenced from each end.
plasma DNA fragmentation that is beyond factors just related to
genome architecture or structural domains. Additional studies are Alignment of Sequencing Data. The paired end sequencing data were analyzed
needed to understand the relationship between the various factors by means of the SOAP2 in the paired end mode (20). The paired end reads were
and mechanisms resulting in the highly orchestrated and location- aligned to the nonrepeat-masked reference human genome (hg19). Up to two
specific patterns of plasma DNA fragmentation. We also believe nucleotide mismatches were allowed for the alignment of each end. The ge-
that a catalog of such tissue-specific end sites would provide an nomic coordinates of these potential alignments for the two ends were then
analyzed to determine whether any combination would allow the two ends to
alternative approach for elucidating the origin of plasma DNA in
be aligned to the same chromosome with the correct orientation spanning an
addition to methods based on methylation analysis (29) and nu- insert size ≤600 bp and mapping to a single location in the reference human
cleosome foot printing (16). genome. On average, 86% of the reads were aligned to the reference human
In summary, we have developed a second generation approach genome (hg19). For the detection of fetal de novo mutations from maternal
that produces noninvasive fetal genomes at high resolution using plasma, an additional step of realignment using the BOWTIE2 software was
maternal plasma DNA sequencing. We believe that this work has performed (19). All sequenced reads covering the candidate mutation sites
significantly pushed forward the limit of NIPT by showing the and exhibiting the variant alleles were realigned using BOWTIE2.
feasibility of detecting fetal de novo mutations on a genome-wide
level from maternal plasma. We have also substantially increased Binomial Mixture Model Analysis for Paternal Inheritance. To deduce the pa-
ternal inheritance of the fetus using maternal plasma DNA, we focused on SNPs
the resolution in which one could determine the maternal in-
at which the mother was homozygous (genotype AA) and the father was
heritance of the fetus using a nonhaplotype-based approach. heterozygous (genotype AB). At these SNPs, the genotype of the fetus would be
Currently, the potential clinical implementation of this approach is either AA or AB. Hence, we used a two-component binomial mixture model to
limited by the costs involved in sequencing maternal plasma DNA fit the observed allelic counts at each SNP locus (21). For each SNP locus i, the
to the depth reported here (195× and above; e.g., over US $40,000 counts for the A and B alleles are denoted as ai and bi, respectively, and Ni is
in our center). However, with the continual reduction in the costs the total number of reads covering the locus. Binomial mixture model was
of massively parallel sequencing, it is envisioned that this cost then used to determine if the fetus would be homozygous (AA) or hetero-
barrier would eventually be removed. Finally, our ΔS demonstra- zygous (AB) for each locus i based on the observed allelic counts ai and bi, the
sequencing error rate, and the fetal DNA fraction (21).
tion of preferred end sites in plasma DNA that exhibit specificity of
MEDICAL SCIENCES
their origin (e.g., the placenta of the fetus) has opened up nu-
GRAD Analysis for Maternal Inheritance. For SNPs in which the mother was
merous avenues of investigation. In the context of NIPT, it would heterozygous, we tested whether the two alleles were present in maternal
allow a relatively simple approach for estimating the fetal DNA plasma at the same or different concentrations using the sequential probability
fraction. Additional work would need to determine the relative ratio test (SPRT). An odds ratio of 20 was used for the calculation of the
merit between this approach and others previously reported [e.g., threshold for accepting the null or the alternative hypothesis. The null hy-
using plasma DNA size (28) and nucleosomal profiles (17)]. Ac- pothesis for each SPRT analysis was the absence of dosage imbalance between
tually, these methods could potentially be used in combination for the read counts for the two maternal alleles. The alternative hypothesis was the
NIPT. It would also be of great interest to explore the presence of overrepresentation of one allele. The calculation of the upper and lower
boundaries of the SPRT curves was as previously described (8).
such recurrent end positions in clinical and pathological scenarios
outside of the pregnancy context (e.g., cancer).
Dynamic Cutoff Analysis for de Novo Mutation. Dynamic cutoff filtering criteria
were developed to distinguish de novo mutations of the fetus from sequencing
Materials and Methods
errors. The sequencing error rate was assumed to be 0.3% (30). The probability
Clinical Samples. The study was approved by the Joint Chinese University of that the same variant would be observed in a number of sequenced reads
Hong Kong and Hospital Authority New Territories East Cluster Clinical Re- because of sequencing errors (Perr) would be calculated as
search Ethics Committee. For each of the two cases whose plasma DNA samples
e ðbi −1Þ
were sequenced to high depths, 20 mL maternal venous peripheral blood was Ni
collected; 10 mL paternal venous peripheral blood was also collected. For the Perr = ×e× ,
bi 3
third trimester pregnant case, 3 mL cord blood was collected after delivery. For
Ni
the case with cardiofaciocutaneous syndrome, the placenta was collected where represents a mathematical combination function Ni !=bi !ðNi − bi Þ!, e
bi
after therapeutic abortion. Twenty-six first trimester pregnant women were represents the sequencing error rate, Ni represents the total number of se-
recruited for fetal DNA fraction analysis. quenced reads covering the locus i, and bi represents the number of sequenced
reads carrying the variant. The dynamic cutoff values were determined based
Sample Processing. Blood samples were centrifuged at 1,600 × g for 10 min at on a theoretical false-positive rate of <1 in 3 × 109. In other words, the the-
4 °C. The plasma portion was harvested and recentrifuged at 16,000 × g for oretical number of false-positive sites caused by sequencing errors would be
10 min at 4 °C to remove the blood cells. The blood cell portion was recentri- less than one for a haploid genome.
fuged at 2,500 × g, and any residual plasma was removed. DNA from the blood
cells and that from maternal plasma were extracted with the blood and body Plasma DNA End Position Analysis. To screen for genomic coordinates that were
fluid protocol of the QIAamp DNA Blood Mini Kit and the QIAamp DSP DNA preferred ending positions for maternally and fetally derived DNA fragments,
Blood Mini Kit (Qiagen), respectively. DNA from the placenta was extracted fetal-specific informative SNP sites (i.e., at which the mother was homozygous
Downloaded by guest on July 8, 2021
with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s and the fetus was heterozygous) and maternal-specific informative SNP sites
tissue protocol. (i.e., at which the fetus was homozygous and the mother was heterozygous)
ability function:
where %(size) is the proportion of fragments having the particular size.
Therefore, the overall probability of finding two fragments with identical
P value = Poisson Nactual , Npredict ,
ends on both sides (Pboth) can be calculated as
where Poisson() is the Poisson probability function, Nactual is the actual number
X
∞
of reads ending at the particular nucleotide, and Npredict is the total number of n
Pboth = PðnÞ × × Psize ,
2
reads divided by 166. n=2
For the SNPs where the mother was heterozygous (AB) and the fetus was
ACKNOWLEDGMENTS. This work was supported by the Hong Kong Research
homozygous, the reads carrying the maternal-specific allele (B allele) and the Grants Council Theme-Based Research Scheme T12-403/15. Y.M.D.L. was
shared alleles were analyzed. supported by an endowed chair from the Li Ka Shing Foundation.
1. Lo YM, et al. (1997) Presence of fetal DNA in maternal plasma and serum. Lancet 17. Straver R, Oudejans CB, Sistermans EA, Reinders MJ (2016) Calculating the fetal
350(9076):485–487. fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles.
2. Chiu RW, et al. (2008) Noninvasive prenatal diagnosis of fetal chromosomal aneu- Prenat Diagn 36(7):614–621.
ploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc 18. Chandrananda D, Thorne NP, Bahlo M (2015) High-resolution characterization of
Natl Acad Sci USA 105(51):20458–20463. sequence signatures due to non-random cleavage of cell-free DNA. BMC Med
3. Chiu RW, et al. (2011) Non-invasive prenatal assessment of trisomy 21 by multiplexed Genomics 8:29.
maternal plasma DNA sequencing: Large scale validity study. BMJ 342:c7401. 19. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat
4. Bianchi DW, et al.; CARE Study Group (2014) DNA sequencing versus standard pre- Methods 9(4):357–359.
natal aneuploidy screening. N Engl J Med 370(9):799–808. 20. Li R, et al. (2009) SOAP2: An improved ultrafast tool for short read alignment.
5. Norton ME, et al. (2015) Cell-free DNA analysis for noninvasive examination of tri- Bioinformatics 25(15):1966–1967.
somy. N Engl J Med 372(17):1589–1597. 21. Jiang P, et al. (2012) FetalQuant: Deducing fractional fetal DNA concentration from
6. Srinivasan A, Bianchi DW, Huang H, Sehnert AJ, Rava RP (2013) Noninvasive detection massively parallel sequencing of DNA in maternal plasma. Bioinformatics 28(22):
of fetal subchromosome abnormalities via deep sequencing of maternal plasma. Am J 2883–2890.
Hum Genet 92(2):167–176. 22. Lun FM, et al. (2008) Noninvasive prenatal diagnosis of monogenic diseases by digital
7. Yu SC, et al. (2013) Noninvasive prenatal molecular karyotyping from maternal size selection and relative mutation dosage on DNA in maternal plasma. Proc Natl
plasma. PLoS One 8(4):e60968. Acad Sci USA 105(50):19920–19925.
8. Lo YM, et al. (2010) Maternal plasma DNA sequencing reveals the genome-wide 23. Veltman JA, Brunner HG (2012) De novo mutations in human genetic disease. Nat Rev
genetic and mutational profile of the fetus. Sci Transl Med 2(61):61ra91. Genet 13(8):565–575.
9. Fan HC, et al. (2012) Non-invasive prenatal measurement of the fetal genome. Nature 24. Chitty LS, et al. (2015) Non-invasive prenatal diagnosis of achondroplasia and tha-
487(7407):320–324. natophoric dysplasia: Next-generation sequencing allows for a safer, more accurate,
10. Kitzman JO, et al. (2012) Noninvasive whole-genome sequencing of a human fetus. and comprehensive approach. Prenat Diagn 35(7):656–662.
Sci Transl Med 4(137):137ra76. 25. Vajo Z, Francomano CA, Wilkin DJ (2000) The molecular and genetic basis of fibro-
11. Fan HC, Wang J, Potanina A, Quake SR (2011) Whole-genome molecular haplotyping blast growth factor receptor 3 disorders: The achondroplasia family of skeletal dys-
of single cells. Nat Biotechnol 29(1):51–57. plasias, Muenke craniosynostosis, and Crouzon syndrome with acanthosis nigricans.
12. Kitzman JO, et al. (2011) Haplotype-resolved genome sequencing of a Gujarati Indian Endocr Rev 21(1):23–39.
individual. Nat Biotechnol 29(1):59–63. 26. Saito H, Sekizawa A, Morimoto T, Suzuki M, Yanaihara T (2000) Prenatal DNA di-
13. Lam KW, et al. (2012) Noninvasive prenatal diagnosis of monogenic diseases by targeted agnosis of a single-gene disorder from maternal plasma. Lancet 356(9236):1170.
massively parallel sequencing of maternal plasma: Application to β-thalassemia. Clin 27. Chen S, et al. (2013) Haplotype-assisted accurate non-invasive fetal whole genome
Chem 58(10):1467–1475. recovery through maternal plasma sequencing. Genome Med 5(2):18.
14. New MI, et al. (2014) Noninvasive prenatal diagnosis of congenital adrenal hyper- 28. Yu SC, et al. (2014) Size-based molecular diagnostics using plasma DNA for non-
plasia using cell-free fetal DNA in maternal plasma. J Clin Endocrinol Metab 99(6): invasive prenatal testing. Proc Natl Acad Sci USA 111(23):8583–8588.
E1022–E1030. 29. Sun K, et al. (2015) Plasma DNA tissue mapping by genome-wide methylation se-
15. Zeevi DA, et al. (2015) Proof-of-principle rapid noninvasive prenatal diagnosis of quencing for noninvasive prenatal, cancer, and transplantation assessments. Proc Natl
autosomal recessive founder mutations. J Clin Invest 125(10):3757–3765. Acad Sci USA 112(40):E5503–E5512.
16. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J (2016) Cell-free DNA Comprises 30. Ross MG, et al. (2013) Characterizing and measuring bias in sequence data.
an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164(1-2):57–68. Genome Biol 14(5):R51.
Downloaded by guest on July 8, 2021