
16S Microbial analysis with Nanopore data

Deborah Shirleen - 19010037

Introduction
Maintaining healthy soils is a priority for protecting ecological balance and, in turn,
our own well-being. Because changes in microbial populations generally lead to changes in
soil physical and chemical properties, monitoring soil microbial communities can help
forecast how soils will evolve, allowing measures to be developed that reduce ecosystem
damage. Soil pH, the carbon-to-nitrogen ratio (C:N), and the concentrations of Olsen P
(a measure of plant-available phosphorus), aluminum, and copper are among the
environmental variables most strongly linked to changes in the composition of bacterial
communities. Many bacteria also form complex symbiotic interactions with plants. The
rhizosphere is the portion of the soil that is directly influenced by root secretions
and their associated bacteria. Bacillus, Pseudomonas, and Burkholderia, for example, are
found in association with plant roots, protecting them from pathogenic microbes.
In this study, four soil samples were taken: (i, ii) bulk soil, surface and deep, and
(iii, iv) rhizosphere, surface and deep. DNA was first extracted using the Zymo Research
Kit, followed by PCR amplification of the 16S rRNA genes. Sequences from the MinION
sequencer (Oxford Nanopore Technologies) are used to achieve two goals: 1) assess the
health of the soil samples; 2) investigate how microbial populations are influenced by
their interactions with plant roots. The analysis follows the tutorial available at
https://training.galaxyproject.org/training-material/topics/metagenomics/tutorials/nanopore-16S-metagenomics/tutorial.html

Methods, Results & Discussion

Import Data
Upload the data into the Galaxy interface by copy-pasting the links to these FASTQ files
into the collection tab:
https://zenodo.org/record/4274812/files/bulk_bottom.fastq.gz
https://zenodo.org/record/4274812/files/bulk_top.fastq.gz
https://zenodo.org/record/4274812/files/rhizosphere_bottom.fastq.gz
https://zenodo.org/record/4274812/files/rhizosphere_top.fastq.gz

Assess datasets quality


This step aims to ensure that the downstream analysis is meaningful.
Quality control using FastQC and MultiQC
FastQC reports various quality metrics, such as the range of quality values across
all bases at each position. MultiQC summarizes the FastQC reports of all samples in a
single report.
1. FastQC Tool with the following parameters:
○ “Dataset collection”: soil collection

2. Rename the outputs as FastQC unprocessed: Raw and FastQC unprocessed: Web
3. MultiQC Tool with the following parameters:
○ In “Results”:
■ “Which tool was used generate logs?”: FastQC
■ “Dataset collection” : FastQC unprocessed: Raw

4. Click on the galaxy-eye (eye) icon and inspect the generated HTML file
Result :
https://usegalaxy.org/datasets/bbd44e69cb8906b532ba3eda638464cb/display?to_ext=html
Result Figure 1. Sequence length distribution. The main peak (1699 bp) corresponds approximately to the length
of the 16S rRNA gene. There is also a secondary peak (299 bp), which may be due to truncated amplification
products or to non-specific hybridization of the PCR primers.
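
The length distribution summarized in Figure 1 can be reproduced outside FastQC. The following is a minimal sketch, assuming plain four-line FASTQ records; the function names and the toy records are hypothetical and are not taken from the soil samples.

```python
from collections import Counter


def read_lengths(fastq_text):
    """Return the length of every read in FASTQ text (4 lines per record)."""
    lines = fastq_text.strip().splitlines()
    # The second line of each 4-line record holds the bases.
    return [len(lines[i]) for i in range(1, len(lines), 4)]


def length_histogram(lengths, bin_width=100):
    """Count reads per length bin; keys are the lower edge of each bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


# Toy records (hypothetical), only to show the shape of the result.
toy = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n"
lengths = read_lengths(toy)
hist = length_histogram(lengths, bin_width=5)
```

Applied to real reads with a bin width of about 100 bp, the bins holding the most reads would correspond to the peaks described in the caption.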

Result Figure 2. Per base sequence quality. Up to 3000 bp, the quality of our sequencing data is relatively low
(a Phred score of around 12), because Nanopore base-called reads have high error rates. However,
Nanopore sequencing generates very long reads (in theory limited only by the mechanisms of extraction of the
genetic material), enabling the sequencing of the complete 16S rRNA gene, which makes it possible to identify
bacterial taxa at higher resolution.
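
To put the Phred score of about 12 in context: a Phred score Q corresponds to a per-base error probability P = 10^(-Q/10). The short sketch below applies that standard definition and decodes a FASTQ quality string, assuming the common Phred+33 encoding; the function names are illustrative.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score into a per-base error probability."""
    return 10 ** (-q / 10)


def decode_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


# A Phred score of 12 means roughly a 6% chance that a base call is wrong.
p_error = phred_to_error_probability(12)
```

At this error rate, about one base in sixteen is expected to be miscalled, which is why the later filtering step uses a modest quality threshold rather than the stricter values common for short-read data.
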
Result Figure 3. Per sequence GC content. A bimodal distribution is observed. One possible explanation is the
presence of adapters in the sequences; another is some kind of contamination, such as chimeras.
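
The quantity plotted in Figure 3 is the fraction of G and C bases in each read. A minimal sketch of that calculation follows; the function name is illustrative, and ambiguous bases such as N are simply counted as non-GC here, which is an assumption of this sketch.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a read (0.0 for an empty read)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


# Toy reads (hypothetical): a balanced read and a GC-rich read.
balanced = gc_fraction("ATGC")
gc_rich = gc_fraction("GGCC")
```

Plotting a histogram of this value over all reads in a sample would show whether the distribution has one mode or two.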

Improve the Dataset Quality

Adapter and chimera removal with porechop


Nanopore sequencing technology requires adapters to be attached to both ends of the
genetic material to assist strand capture and the loading of a processive enzyme at the 5' end,
improving the effectiveness of the sequencing process. Because adapter sequences can interfere
with the alignment of reads to the 16S rRNA gene reference database, they should be removed;
we use the porechop tool for this (Wick 2017). Chimeric sequences are regarded as contaminants
and should also be removed, since they can artificially inflate microbial diversity. Porechop
can help to get rid of them as well.
1. Porechop Tool: with the following parameters:
○ “Input FASTA/FASTQ”: soil collection (Data collection)
○ “Output format for the reads”: fastq

2. Rename the output as soil collection trimmed

Filter sequences with fastp


Reads between 1000 bp and 2000 bp long are selected to increase the specificity of the
analysis: they include both conserved and hypervariable regions of the 16S rRNA gene and are
therefore more informative from a taxonomic point of view. In addition, sequences are filtered
on a minimum average read quality score of 9, following the recommendations of Nygaard et al.
2020. This stage is carried out with fastp (Chen et al. 2018), an open-source tool designed to
process FASTQ files.
1. fastp Tool: with the following parameters:
○ “Single-end or paired reads”: Single-end
■ “Dataset collection”: soil collection trimmed
■ In “Adapter Trimming Options”:
■ “Disable adapter trimming”: Yes
○ In “Filter Options”:
■ In “Quality filtering options”:
■ “Qualified quality phred”: 9
■ In “Length filtering options”:
■ “Length required”: 1000
■ “Maximum length”: 2000

○ In “Read Modification Options”:
■ “PolyG tail trimming”: Disable polyG tail trimming

2. Rename the output as soil collection processed
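
The effect of the length and quality settings above can be sketched in a few lines. This is an illustration of the filtering logic, not fastp's implementation. It assumes that “Qualified quality phred” acts as a per-base threshold and that a read is discarded when more than 40% of its bases fall below it; the 40% limit is an assumption based on fastp's documented default and is not stated in this report. The function name and toy reads are hypothetical.

```python
def passes_fastp_like_filter(seq, qual, min_len=1000, max_len=2000,
                             qualified_phred=9, unqualified_limit=0.40,
                             offset=33):
    """Return True if a read passes the length window and quality rule."""
    # Length filter: keep reads between "Length required" and "Maximum length".
    if not (min_len <= len(seq) <= max_len):
        return False
    scores = [ord(char) - offset for char in qual]
    if not scores:
        return False
    # Quality filter (assumed rule): too many bases below the threshold fails.
    low_quality = sum(1 for q in scores if q < qualified_phred)
    return low_quality / len(scores) <= unqualified_limit


# Toy reads (hypothetical): "5" encodes Q20 and "#" encodes Q2 in Phred+33.
kept = passes_fastp_like_filter("A" * 1200, "5" * 1200)
too_short = passes_fastp_like_filter("A" * 500, "5" * 500)
low_quality_read = passes_fastp_like_filter("A" * 1200, "#" * 1200)
```

Only the first toy read is kept: the second falls outside the 1000 to 2000 bp window, and the third has every base below the quality threshold.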

Re-evaluate datasets quality


This step confirms whether the anomalies detected earlier have been corrected.
1. FastQC Tool: with the following parameters:
○ “Dataset collection”: soil collection processed (Data collection)

2. Rename the outputs as FastQC processed: Raw and FastQC processed: Web
3. MultiQC Tool: with the following parameters:
○ In “Results”:
■ “Which tool was used generate logs?”: FastQC
■ “Dataset collection”: FastQC processed: Raw

4. Click on the galaxy-eye (eye) icon and inspect the generated HTML file
Result :
https://usegalaxy.org/datasets/bbd44e69cb8906b5be652c1232c50dfe/display?to_ext=html

Result Figure 4. Per sequence GC content in processed samples. After processing, the GC content shows a
unimodal distribution, which indicates that the anomalies in the sequences have been successfully
eliminated.

Assign taxonomic classifications


Taxonomic classification tools are based on microbial genome databases to identify the origin of
each sequence.

Taxonomic classification with Kraken2


Kraken2 Tool: with the following parameters:
● “Single or paired reads”: Single
● “Dataset collection”: soil collection processed
● “Print scientific names instead of just taxids”: Yes
● “Confidence”: 0.1
● In “Create Report”:
○ “Print a report with aggregrate counts/clade to file”: Yes
○ “Format report output like Kraken 1’s kraken-mpa-report”: Yes
● “Select a Kraken2 database”: Silva (Created: 2020-06-24T164526Z, kmer-len=35,
minimizer-len=31, minimizer-spaces=6)
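
The “Confidence” value of 0.1 can be read as follows. This is a simplified sketch, assuming the scoring rule described in the Kraken 2 manual: the score of a label is the fraction of a read's k-mers that map inside the clade rooted at that label, and a label scoring below the threshold is moved up to its parent until the threshold is met or the read is left unclassified. The toy taxonomy, the k-mer hits, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Pseudomonas": "Gammaproteobacteria",
    "Bacteroidetes": "Bacteria",
}


def in_clade(taxon, clade_root):
    """True if `taxon` is `clade_root` or one of its descendants."""
    while taxon is not None:
        if taxon == clade_root:
            return True
        taxon = PARENT[taxon]
    return False


def apply_confidence(label, kmer_hits, total_kmers, threshold):
    """Move `label` up the taxonomy until its clade score meets the threshold."""
    hits = Counter(kmer_hits)
    while label is not None:
        clade_hits = sum(n for taxon, n in hits.items() if in_clade(taxon, label))
        if clade_hits / total_kmers >= threshold:
            return label
        label = PARENT[label]
    return "unclassified"


# Toy read (hypothetical): 20 k-mers, of which only 4 hit the database.
toy_hits = ["Pseudomonas", "Gammaproteobacteria", "Gammaproteobacteria",
            "Bacteroidetes"]
relaxed = apply_confidence("Pseudomonas", toy_hits, 20, threshold=0.1)
strict = apply_confidence("Pseudomonas", toy_hits, 20, threshold=0.5)
```

In this toy case the genus-level label is supported by only 1 of 20 k-mers (0.05), so with a threshold of 0.1 the read is reported at the class level (3 of 20, 0.15), and with a threshold of 0.5 it is left unclassified.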

Analyze taxonomic assignment


Before visualizing the data with the Krona pie chart tool, we need to adjust the format of the
data output from Kraken2.
1. Reverse Tool: with the following parameters:
○ “Dataset collection”: Report: Kraken2 on collection
2. Replace Text Tool: with the following parameters:
○ “Dataset collection”: Reverse on collection
○ In “Replacement”:
■ “Insert Replacement”
■ “Find pattern”: \|
■ “Replace with”: \t

3. Remove beginning Tool: with the following parameters:


○ “Dataset collection”: Replace Text on collection
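
The three reformatting steps above can be sketched as one function. This is an illustration under stated assumptions, not the Galaxy tools themselves: it assumes the mpa-style report has two tab-separated columns (lineage, then count), that the Reverse step swaps the column order so the count comes first (the layout Krona's tabular input expects), and that Remove beginning drops a single line. The function name and the toy report are hypothetical.

```python
def kraken_mpa_to_krona(report_text):
    """Convert mpa-style report lines into Krona's tab-separated text format."""
    rows = []
    for line in report_text.strip().splitlines():
        lineage, count = line.split("\t")
        # Reverse (assumed): place the count before the lineage.
        # Replace Text: turn the "|" rank separators into tabs.
        rows.append("\t".join([count] + lineage.split("|")))
    # Remove beginning (assumed): drop the first line of the table.
    return "\n".join(rows[1:])


# Toy report (hypothetical counts, not results from this study).
toy_report = (
    "d__Bacteria\t100\n"
    "d__Bacteria|p__Proteobacteria\t60\n"
    "d__Bacteria|p__Bacteroidetes\t40\n"
)
krona_table = kraken_mpa_to_krona(toy_report)
```

Each output line then starts with a count followed by one taxonomic rank per column.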

Visualize the taxonomic classification with Krona


Krona pie chart Tool: with the following parameters:
○ “What is the type of your input data”: Tabular
○ “Dataset collection”: Remove beginning on collection
○ “Provide a name for the basal rank”: Bacteria
Result Figure 5. Krona pie chart of bulk_top.fastq.gz

Result Figure 6. Krona pie chart of bulk_bottom.fastq.gz


Result Figure 7. Krona pie chart of rhizosphere_top.fastq.gz

Result Figure 8. Krona pie chart of rhizosphere_bottom.fastq.gz


The abundance of Bacteroidetes and Gammaproteobacteria, together with the low
presence of Alphaproteobacteria (including members of the order Rhizobiales), indicates that the
soil is significantly exposed to phosphorus. As a major component of agrochemicals, this
mineral is strongly linked to anthropogenic activities. Significant differences in the structure
of the bacterial communities can also be observed between the bulk soil and rhizosphere
samples. The increase in the phylum Planctomycetes, which is normally abundant in the
rhizosphere, is particularly noteworthy.

Conclusion
MinION Nanopore sequencing data were used to investigate the health of soil samples and the
structure of bacterial populations in this study. According to the findings, the soil shows some
erosion as a result of its exposure to agrochemicals. We were also able to investigate how
the composition of microbial communities changes when plants are present.

References
Fierer, N., and R. B. Jackson, 2006 The diversity and biogeography of soil bacterial communities.
Proceedings of the National Academy of Sciences 103: 626–631. 10.1073/pnas.0507535103
Ondov, B. D., N. H. Bergman, and A. M. Phillippy, 2011 Interactive metagenomic visualization in a Web
browser. BMC Bioinformatics 12: 10.1186/1471-2105-12-385
Quast, C., E. Pruesse, P. Yilmaz, J. Gerken, T. Schweer et al., 2012 The SILVA ribosomal RNA gene
database project: improved data processing and web-based tools. Nucleic Acids Research 41:
D590–D596. 10.1093/nar/gks1219
Wood, D. E., and S. L. Salzberg, 2014 Kraken: ultrafast metagenomic sequence classification using exact
alignments. Genome Biology 15: R46. 10.1186/gb-2014-15-3-r46
Hermans, S. M., H. L. Buckley, B. S. Case, F. Curran-Cournane, M. Taylor et al., 2016 Bacteria as
Emerging Indicators of Soil Condition (F. E. Loeffler, Ed.). Applied and Environmental Microbiology
83: 10.1128/aem.02826-16
Jain, M., H. E. Olsen, B. Paten, and M. Akeson, 2016 The Oxford Nanopore MinION: delivery of nanopore
sequencing to the genomics community. Genome Biology 17: 10.1186/s13059-016-1103-0
Brown, B. L., M. Watson, S. S. Minot, M. C. Rivera, and R. B. Franklin, 2017 MinION™ nanopore
sequencing of environmental metagenomes: a synthetic approach. GigaScience 6:
10.1093/gigascience/gix007
Wick, R., 2017 Porechop. GitHub. https://github.com/rrwick/Porechop
Chen, S., Y. Zhou, Y. Chen, and J. Gu, 2018 fastp: an ultra-fast all-in-one FASTQ preprocessor.
10.1101/274100
Wood, D. E., J. Lu, and B. Langmead, 2019 Improved metagenomic analysis with Kraken 2. Genome
Biology 20: 10.1186/s13059-019-1891-0
Nygaard, A. B., H. S. Tunsjø, R. Meisal, and C. Charnock, 2020 A preliminary study on the potential of
Nanopore MinION and Illumina MiSeq 16S rRNA gene sequencing to characterize building-dust
microbiomes. Scientific Reports 10: 3209. 10.1038/s41598-020-59771-0
