Zijie Shen 1,2#, Yan Xiao 3#, Lu Kang 1,2#, Wentai Ma 1,2#, Leisheng Shi 1,2, Li Zhang1,
Zhang6, Hongru Li7, Yu Xu5, Mingwei Chen8, Zhancheng Gao5, Jianwei Wang 3, Lili
© The Author(s) 2020. Published by Oxford University Press for the Infectious Diseases Society of
America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
9. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of
Sciences, Kunming, 650223, China;
# Author Z.S., Y.X., L.K., and W.M. contributed equally to this manuscript.
Summary
An elevated level of viral diversity was found in some SARS-CoV-2 infected patients,
indicating the risk of rapid evolution of the virus. Although no evidence for the
transmission of intra-host variants was found, the risk should not be overlooked.
Corresponding author:
Mingkun Li
E-mail: limk@big.ac.cn
Lili Ren
E-mail: renliliipb@163.com
Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union
Medical College
Building 7, Di Sheng Bei Jie Street 1, Yizhuang District, Beijing 100730, China;
Tel/Fax: 86-10-67837321
Abstract:
individuals and spread to over 20 countries. It is still unclear how fast the virus evolved
Results The median number of intra-host variants was 1-4 in SARS-CoV-2 infected
variants on genes was similar to those observed in the population data (110 sequences).
current polymorphism data. Although current evidence did not support the transmission
The microbiota in SARS-CoV-2 infected patients was similar to those in CAP, either
dominated by the pathogens or with elevated levels of oral and upper respiratory
commensal bacteria.
Conclusion SARS-CoV-2 evolves in vivo after infection, which may affect its virulence,
infectivity, and transmissibility. Although how the intra-host variant spreads in the
evolution in the population and associated clinical changes.
Negative controls.
Introduction
virus had spread to more than 20 countries, resulting in over 75,000 cases and more
than 2,300 deaths (as of Feb 22, 2020) [1, 2]. The basic reproduction number was
public health. Recent studies have identified bats as the possible origin of SARS-CoV-2,
and the virus likely uses the same cell surface receptor as SARS-CoV [4], namely
ACE2. These studies have advanced our understanding of SARS-CoV-2. However, our
The virus is under strong immunologic pressure in humans, and may thus
accumulate mutations to outmaneuver the immune system [5]. These mutations could
imperative to investigate the pattern and frequency of the mutations that have occurred. Aside from
the pathogen, microbiota in the lung is associated with disease susceptibility and
severity [7]. Alterations of lung microbiota could potentially modify immune response
against the viral and secondary bacterial infection [8, 9]. Thus, understanding the
microbiota, which comprises bacteria that could cause secondary infection or exert
effects on the mucosal immune system, might help to predict the outcome and reduce
complications.
lavage fluid (BALF) samples from 8 patients with coronavirus disease 2019 (COVID-19,
the disease caused by SARS-CoV-2). We found that the number of intra-host
variants ranged from 0 to 51 with a median number of 4, suggesting a high
Results
Data summary
water, NC). For comparison, the metatranscriptome sequencing data with similar
determined by at least 100 viral reads and 10-fold higher than those in the NC), 20
healthy controls without any known pulmonary diseases (Healthy), and two extra NCs
(two saline solutions passing through the bronchoscope) were used in this study.
Table 1.
After quality control, a median number of 55,571 microbial reads were generated
for each sample. nCoV had a higher proportion of microbial reads than CAP
and Healthy (nCoV: median proportion of 7%, CAP: 0.8%, Healthy: 0.1%, p < 0.001,
Figure 1A), and 49% of the microbial reads could be mapped to SARS-CoV-2, which
did not differ from the viral proportion in CAP (Figure 1B). Only SARS-CoV-2 was
Betacoronavirus. Moreover, besides the detection of HCoV-OC43 in one Healthy and
proved the authenticity of the data and methods used in our analysis.
fold in nCoV1, with more than 80% of the genome covered by at least 50-fold in five
samples (Figure 2A, Supplementary Table 2). In total, 84 intra-host variants were
identified with minor allele frequency (MAF) greater than 5%, and 25 variants had
MAF greater than 20% (Supplementary Table 3, Figure 2B; nCoV5 was excluded
from the analysis due to large gaps in its genome coverage). Notably, the number of
variants was not associated with the sequencing depth (Supplementary Figure 1). The
overall Ka/Ks ratio was significantly smaller than 1 for both the intra-host
variants and the polymorphisms observed in the population data, suggesting that purifying
selection acts on both types of mutations (Table 1). The number of variants observed
in each gene was proportional to gene length (cor = 0.950, p = 8E-06 for the intra-host
variants; cor = 0.957, p = 4E-06 for the polymorphisms). Although only a small fraction
of the variants was observed in multiple patients (2 out of 84, Figure 2C), some
positions may be more prone to mutation, or the variants may be transmitting in the population,
such as position 10779, where the mutant allele A was observed in all seven patients,
The number of intra-host variants per individual showed a large variation (0 to 51,
median 4 for variants with MAF ≥ 5%; 0 to 19, median 1 for variants with MAF ≥ 20%),
which could not be explained by the batch effect, coverage variance, or contamination
(Supplementary Figure 1; nCoV1-4 were in one batch, nCoV5-8 were in another batch;
most mutations were not observed in the population data). We also noted that the
number of variants was not related to the number of days after symptom onset or the age of
extremely high level of variants in nCoV6 (51 variants). A larger sample size is
needed to investigate how frequent such outliers are, and whether they are associated
with the level of host immune response or the viral replication rate. We also noted
similar outliers for other viruses [11]. Of note, the origin of variants could be either
Among the eight COVID-19 patients, nCoV4 and nCoV7 were from the same
household, with dates of symptom onset differing by five days; thus transmission from
nCoV4 to nCoV7 is strongly suspected, especially considering that only nCoV4 had been
to the Huanan seafood market in Wuhan, which is the starting point of the outbreak and
suspected to be the source. First, the consensus sequence of the virus was the same for
the two samples, and none of the four intra-host variants passing the selection criteria in nCoV4
was detected in nCoV7 (Table 2). We further expanded the investigation to all
variants with MAF ≥ 2% and supported by at least 3 reads. By doing so, we detected
seven variants (out of 25) shared between the two samples. However, the MAFs in both
nCoV4 and nCoV7 were similar to those in other samples, suggesting that these
positions were either error-prone or mutation-prone; hence they cannot support the
transmission of these variants.
Meanwhile, among all 84 intra-host variants, only three of them were found to be
polymorphic in the population data (position 7866 G/T; 27493 C/T; 28253 C/T). This
small overlap also suggests that intra-host variants were rarely transmitted
composition was observed among the nCoV, CAP, and Healthy groups (R2 = 0.07, p =
0.001; Figure 3A). However, the clustering of some samples with NC indicated a barren
microbiota in some samples. After removing the problematic samples and ambiguous
components, we still found that nCoV and CAP were both different from the healthy
controls (nCoV vs. Healthy: R2 = 0.45, p = 0.001; CAP vs. Healthy: R2 = 0.10, p =
classified into three different types (Figure 3B). In particular, the microbiota in cluster
clusters were more diverse. By further inspecting the species belonging to each cluster
(Supplementary Table 4-5), we found that bacteria in Type III were mainly commensal
species frequently observed in the oral and respiratory tract, whereas bacteria in Type
Therefore, the microbiota was either pathogen-enriched (Type I) or commensal-
enriched (Type III) or undetermined due to low microbial load (Type II).
The microbiota in six nCoV samples were pathogen-enriched, and the other two
were commensal-enriched (Figure 3B). Moreover, two nCoV samples (2, 6) with an
enriched microbiota. The overwhelming proportion of the virus may be associated with a
higher replication rate, and could also potentially stimulate an intense immune
response against the virus, in which case an excess number of intra-host
mutations would be expected. However, as only eight nCoV patients were included in
this analysis, and the absolute microbial load was unknown, more data are needed for
further investigation.
Discussion
RNA viruses have a high mutation rate due to the lack of proofreading activity of
polymerases. Consequently, RNA viruses are prone to evolve resistance to drugs and
escape from immune surveillance. The mutation rate of SARS-CoV-2 is still unclear.
However, considering that the median number of pairwise sequence differences was 4
(Interquartile Range: 3-6) for 110 sequences collected between Dec 24, 2019 and Feb
9, 2020, the mutation rate should be of the same order of magnitude as that of SARS-CoV
(0.80-2.38×10⁻³ nucleotide substitutions per site per year) [14]. The high mutation rate
also results in a high level of intra-host variants in RNA viruses [11, 15]. The median
number of intra-host variants in COVID-19 patients was 4 for variants with frequency ≥ 5%,
and this incidence was not significantly different from that reported in a study on
Ebola (655 variants with frequency ≥ 5% in 134 samples) (p > 0.05) [11], suggesting that
CoV[16, 17], and we noted that all three key motifs in the gene were identical between
polymorphism nor intra-host variant was detected in these motifs, suggesting that the
gene is highly conserved, and thereby it could be a potential target for antiviral therapy.
Although we did not find any mutation hotspot genes in either polymorphism or intra-
host variants, the observation of shared intra-host variants among different individuals
implied the possibility of adaptive evolution of the virus in patients, which could
potentially affect the antigenicity, virulence, and infectivity of the virus [6].
It is worth noting that the SARS-CoV-2 genome in patients could be highly diverse,
which was also observed in other viruses [11]. The high diversity could potentially
increase the fitness of the viral population, making it hard to eliminate [15]. Further
studies are needed to explore how this may influence the immune response towards the
virus and whether there is a selection acting on different strains in the human body or
whether these intra-host variants occurred before the transmission or after the
may be involved in the transmission, which could also result in the loss of diversity
emphasized the possibility of rapid evolution of this virus.
Recent studies have shown that the microbiota in the lung contributed to the
Meanwhile, the lung microbiota could also be regulated by invading viruses [9, 19].
pneumonia than that in healthy controls (Figure 3B), we did not identify any specific
microbiota pattern shared among COVID-19 patients, nor among CAP patients. A
possible reason for this could be the use of antibiotics in pneumonia patients. However,
this was not true for all pneumonia samples, as a substantial proportion of bacteria were
observed in some samples, including two COVID-19 patients. It is well known that a
bacterial infection often results in a significant increase in morbidity [20]. Thus, the
elevated level of bacteria in the BALF of some COVID-19 patients might increase the
risk of secondary infection. In the clinical data, the secondary infection rate for COVID-19
ranged from 1% to 10% [2, 21]. However, the quantitative relationship between
Overall, our study has revealed the evolution of SARS-CoV-2 in the patient, a
common feature shared by most RNA viruses. How these variants influence the fitness
of viruses and genetic diversity in the population awaits further investigation. Currently,
only limited sequences are shared in public databases (Supplementary Table 6); hence
there is an urgent need to accumulate more sequences to trace the evolution of the viral
genome and associate the changes with clinical symptoms and outcomes.
Methods.
Eight COVID-19 pneumonia samples were collected from hospitals in Wuhan from
samples were collected from Beijing Peking University People's Hospital, The
Shenzhen Third People's Hospital, Fujian Provincial Hospital, and The First Affiliated
Hospital of Xi'an Jiaotong University between 2014 and 2018. CAP was diagnosed
following the guidelines of the Infectious Diseases Society of America and the
American Thoracic Society [22]. Pneumonia patients with chronic pulmonary diseases
were excluded. Meanwhile, BALF samples from 20 healthy volunteers were collected and used
in Supplementary Table 1.
For each patient, BALF samples were collected using a bronchoscope as part of
normal clinical management. The volume of BALF samples ranged between 5 ml and
30 ml, most of which was used for bacterial culture, and the remainder was aliquoted
Metatranscriptome sequencing
A 200 µl aliquot of each SARS-CoV-2 infected whole-BALF sample was used to extract
RNA using the Direct-zol RNA Miniprep kit (Zymo Research, Irvine, CA, USA) and Trizol
LS (Thermo Fisher Scientific, Carlsbad, CA, USA) in a biosafety level 3 laboratory, and the
remaining samples were processed following the same protocol in a biosafety level 2 laboratory. The
RNA was then reverse transcribed and amplified using an Ovation Trio RNA-Seq
library preparation kit (NuGEN, CA, USA) and sequenced on an Illumina HiSeq
Data availability
data have also been submitted to NCBI Sequence Read Archive (SRA) database under
Quality control processes included adapter trimming, low-quality read removal, short-read
removal by fastp (-l 70, -x, --cut_tail, --cut_tail_mean_quality 20, version:
0.20.0) [24], low-complexity read removal by Komplexity (-F, -k 8, -t 0.2, version: Nov
The resultant reads were mapped against NCBI nt database (version: Jul 1 2019)
done by MEGAN using the lowest common ancestor algorithm (-ms 100, -supp 0, -me 0.01,
-top 10, -mrc 60, version: 6.11.0) [30]. After performing an overall PCoA and
PERMANOVA test, samples and microorganisms were filtered for further analyses with the
following criteria. Samples with fewer than 5,000 microbial reads were discarded.
raw data and filtered data; 3) supported by at least 100 reads; 4) abundance higher than
generated by samtools (version 1.8)[33], and intra-host variants were called using
VarScan (version: 2.3.9) [34] and in-house scripts. All variants had to satisfy the
Minor allele frequency ≥ 2% on each strand; 4) Minor allele count ≥ 5 on each strand;
5) The minor allele was supported by the inner part of the read (excluding 10 bp on
each end); 6) Both alleles could be identified in at least 3 reads that were specifically assigned
to the genus Betacoronavirus.
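The strand-based criteria above can be sketched as a simple filter. This is a minimal illustration and not the in-house script: the StrandCounts layout and the function names are hypothetical, only the per-strand frequency (≥ 2%) and per-strand count (≥ 5) criteria are encoded, and the overall MAF ≥ 5% cut-off is taken from the Results. The criteria that need read-level information (position within the read, taxonomic assignment of the read) are omitted.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Read counts at one genome position, split by strand (hypothetical layout)."""
    major_fwd: int
    major_rev: int
    minor_fwd: int
    minor_rev: int


def allele_frequency(minor: int, major: int) -> float:
    """Minor allele frequency among the reads carrying either allele."""
    depth = minor + major
    return minor / depth if depth else 0.0


def passes_strand_filters(c: StrandCounts,
                          min_strand_maf: float = 0.02,
                          min_strand_count: int = 5,
                          min_overall_maf: float = 0.05) -> bool:
    """Apply the per-strand criteria, then the overall MAF cut-off."""
    # Minor allele count >= 5 on each strand.
    if c.minor_fwd < min_strand_count or c.minor_rev < min_strand_count:
        return False
    # Minor allele frequency >= 2% on each strand.
    if allele_frequency(c.minor_fwd, c.major_fwd) < min_strand_maf:
        return False
    if allele_frequency(c.minor_rev, c.major_rev) < min_strand_maf:
        return False
    # Overall MAF >= 5%, the reporting threshold used in the Results.
    overall = allele_frequency(c.minor_fwd + c.minor_rev,
                               c.major_fwd + c.major_rev)
    return overall >= min_overall_maf
```

Requiring support on both strands is what removes strand-biased calls: a site with 20 minor-allele reads on the forward strand and none on the reverse strand fails regardless of its overall frequency.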
Statistical analysis.
Pearson’s chi-square test or Fisher’s exact test was used for categorical variables, and
the Mann-Whitney U test or Kruskal-Wallis rank sum test was used for continuous
variables that do not follow a normal distribution. Microbiota were compared
using the PERMANOVA test.
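For readers unfamiliar with the Permanova test, the sketch below shows how R2 and p values of the kind reported for Figure 3A can be obtained from a distance matrix by permuting group labels. The Bray-Curtis distance, the number of permutations, and the toy abundance profiles are assumptions made for illustration; the distance measure and software used in the study are not stated in this section.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pseudo_f_and_r2(dist, labels):
    """Pseudo-F statistic and R2 from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f_stat, ss_among / ss_total


def permanova(profiles, labels, n_perm=999, seed=0):
    """Return (R2, permutation p-value) for the grouping given by labels."""
    n = len(profiles)
    dist = [[bray_curtis(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    f_obs, r2 = pseudo_f_and_r2(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f_and_r2(dist, shuffled)[0] >= f_obs:
            hits += 1
    # The observed labelling counts as one permutation.
    return r2, (hits + 1) / (n_perm + 1)


# Toy relative abundances (three taxa per sample); not data from the study.
profiles = [
    [90, 5, 5], [85, 10, 5], [80, 10, 10], [88, 7, 5], [92, 3, 5],
    [5, 10, 85], [10, 10, 80], [5, 5, 90], [8, 12, 80], [3, 7, 90],
]
labels = ["A"] * 5 + ["B"] * 5
r2, p_value = permanova(profiles, labels)
```

R2 is the share of the total variation in pairwise distances that is explained by the grouping, which is why a significant p value can accompany a small R2 such as 0.07 when group sizes are large enough.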
The study was approved by the Institutional Review Board of Beijing Peking
Provincial Hospital, and The First Affiliated Hospital of Xi'an Jiaotong University. The
data collection for the COVID-19 patients was deemed by the National Health
Commission of the People’s Republic of China to be part of a public health
outbreak investigation. Written informed consent was obtained from other pneumonia
Acknowledgments
We thank Dr. Xue Yongbiao and colleagues from National Genomics Data Center for
helpful discussion and computational resource support. We thank Dr. Huang Yanyi
(Peking University, Beijing, China) and Wang Jianbin (Tsinghua University, Beijing,
China) for providing the sequencing platform. We also thank Dr. Huang Chaolin for
assistance in sample collection. We gratefully acknowledge the Authors, the Originating and
Submitting Laboratories for their sequence and metadata shared through GISAID, on
which some of our analysis is based; a full list of all submitters is given in Table
S6.
Funding
This work was supported by grants from Innovation Fund for Medical Sciences [2016-
I2M-1-014], the National Major Science & Technology Project for Control and
Foundation of China [31670169, 31871263]; and the Open Project of Key Laboratory
References
1. Zhu N, Zhang D, Wang W, et al. A Novel Coronavirus from Patients with Pneumonia in
2. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel
(2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of
4. Zhou P, Yang X, Wang X, et al. A pneumonia outbreak associated with a new
7. O'Dwyer DN, Dickson RP, Moore BB. The Lung Microbiome, Immunity, and the
2640.
9. Huffnagle GB, Dickson RP, Lukacs NW. The respiratory tract microbiome and lung
10. Subissi L, Imbert I, Ferron F, et al. SARS-CoV ORF1b-encoded nonstructural proteins
12-16: replicative enzymes as antiviral targets. Antiviral Res 2014; 101: 122-30.
11. Ni M, Chen C, Qian J, et al. Intra-host dynamics of Ebola virus during 2014. Nat
Association with Bacterial Biomass and Host Inflammatory Status. mSystems 2018;
3(5).
13. Segal LN, Clemente JC, Tsay JC, et al. Enrichment of the lung microbiome with oral
taxa is associated with lung inflammation of a Th17 phenotype. Nat Microbiol 2016; 1:
16031.
14. Zhao Z, Li H, Wu X, et al. Moderate mutation rate in the SARS coronavirus genome
15. Domingo E, Sheldon J, Perales C. Viral quasispecies evolution. Microbiol Mol Biol Rev
16. Minskaia E, Hertzig T, Gorbalenya AE, et al. Discovery of an RNA virus 3'->5'
exoribonuclease that is critically involved in coronavirus RNA synthesis. Proc Natl Acad
17. Smith EC, Blanc H, Surdel MC, Vignuzzi M, Denison MR. Coronaviruses lacking
18. Sigal D, Reid JNS, Wahl LM. Effects of Transmission Bottlenecks on the Diversity of
19. Tsang TK, Lee KH, Foxman B, et al. Association between the respiratory microbiome
20. Hendaus MA, Jomha FA, Alhammadi AH. Virus-induced secondary bacterial infection:
2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. The Lancet
2020.
22. Mandell LA, Wunderink RG, Anzueto A, et al. Infectious Diseases Society of
23. National Genomics Data Center M, Partners. Database Resources of the National
Genomics Data Center in 2020. Nucleic Acids Res 2020; 48(D1): D24-D33.
25. Clarke EL, Taylor LJ, Zhao C, et al. Sunbeam: an extensible pipeline for analyzing
26. ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/bmtagger.
27. Wang J, Wang W, Li R, et al. The diploid genome sequence of an Asian individual.
28. Kopylova E, Noe L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal
29. Camacho C, Coulouris G, Avagyan V, et al. BLAST+: architecture and applications.
30. Huson DH, Beier S, Flade I, et al. MEGAN Community Edition - Interactive Exploration
and Analysis of Large-Scale Microbiome Sequencing Data. PLoS Comput Biol 2016;
31. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.
32. http://broadinstitute.github.io/picard.
34. Koboldt DC, Zhang Q, Larson DE, et al. VarScan 2: somatic mutation and copy number
alteration discovery in cancer by exome sequencing. Genome Res 2012; 22(3): 568-
76.
36. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data - from vision
37. Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KaKs_Calculator: calculating Ka and
38. Salter SJ, Cox MJ, Turek EM, et al. Reagent and laboratory contamination can critically
Table 1. The number of intra-host variants and polymorphisms in the genome of SARS-CoV-2
Gene length  Intra-host variants (NS S Ka/Ks1)  Polymorphisms (NS S Ka/Ks)  P-value2
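For reference, the Ka/Ks columns of Table 1 follow the usual definition, given here in simplified form. NS and S are taken to be the counts of nonsynonymous and synonymous variants, which is an assumption because the table footnotes are not reproduced here, and the substitution-model correction applied by the software is left out.

```latex
K_a = \frac{\text{nonsynonymous substitutions}}{\text{nonsynonymous sites}}, \qquad
K_s = \frac{\text{synonymous substitutions}}{\text{synonymous sites}}, \qquad
\omega = \frac{K_a}{K_s}
\begin{cases}
  < 1       & \text{purifying selection} \\
  \approx 1 & \text{neutral evolution} \\
  > 1       & \text{positive selection}
\end{cases}
```

Because a coding sequence contains more nonsynonymous than synonymous sites, the raw NS and S counts are each normalised by the number of sites of that type before the ratio is taken.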
Table 2. The allele frequency changes in transmission from nCoV4 to nCoV7
POS1 Ref_nCoV42 Alt_nCoV43 FRE Ref_nCoV7 Alt_nCoV7 FRE P-value4
376 119 177 0.598 9 0 0.000 0.0003
769 777 17 0.021 16 0 0.000 1
2037 1496 33 0.022 8 0 0.000 1
3290 2249 112 0.047 17 0 0.000 1
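As a check on Table 2, the FRE and P-value columns can be recomputed from the read counts. The sketch below assumes that the P-value column is a two-sided Fisher's exact test on the 2×2 table of reference and alternative read counts in nCoV4 versus nCoV7; the Statistical analysis section lists Fisher's exact test for categorical variables, but the table footnote is not reproduced here. The function names are illustrative and are not taken from the study's scripts.

```python
from math import comb


def alt_frequency(ref: int, alt: int) -> float:
    """Fraction of reads carrying the alternative allele."""
    depth = ref + alt
    return alt / depth if depth else 0.0


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weight of every table with the same margins.
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    # Sum the tables that are no more likely than the observed one.
    tail = sum(w for w in weights.values() if w <= observed)
    return tail / comb(row1 + row2, col1)


# Read counts copied from Table 2:
# (position, ref_nCoV4, alt_nCoV4, ref_nCoV7, alt_nCoV7)
TABLE2 = [
    (376, 119, 177, 9, 0),
    (769, 777, 17, 16, 0),
    (2037, 1496, 33, 8, 0),
    (3290, 2249, 112, 17, 0),
]

results = {
    pos: (alt_frequency(r4, a4), fisher_exact_two_sided(r4, a4, r7, a7))
    for pos, r4, a4, r7, a7 in TABLE2
}
```

The weights are kept as exact integers so that the comparison with the observed table does not depend on floating-point rounding. With fewer than 20 reads covering each position in nCoV7, only a variant at high frequency in nCoV4, such as the one at position 376, can yield a significant difference.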
Figure legends
Figure 1. Overview of the sequencing data. (A) The proportion of microbial reads in
different groups; (B) The proportion of viral reads in patients infected with different
viruses.
Downloaded from https://academic.oup.com/cid/article-abstract/doi/10.1093/cid/ciaa203/5780800 by guest on 16 April 2020
Figure 1
Figure 2
Figure 3