Professional Documents
Culture Documents
1
2
Bioinformatics, YYYY, 0–0
3
doi: 10.1093/bioinformatics/xxxxx
4
Advance Access Publication Date: DD Month YYYY
5
Manuscript Category
6
7
8
1
2
linear regression, non-negative least squares, Bayesian inference, and sim-
3 ulated annealing, respectively. Here, we have developed a standard format
4 1 Introduction
for inputs and outputs for easy interoperability and effective comparison,
5 Mutational signature analysis provides an operative framework to respectively. With our previous experience in visualization of genomic
6 understand the somatic evolution of cancer from normal tissue (Robinson data (Lan et al., 2014), we have implemented standard visualizations for
7 et al 2020; Alexandrov et al, 2020; Brunner et al., 2019; Yoshida et al., the results of all mutational signature packages to ensure easy analysis.
8 2020; Moore et al.,2020). From the earliest phases of neoplastic changes,
1
2
3
4
5
6
7 TCGA-AB-2804
Sigflow
COSMIC V3 SBS COSMIC Legacy SBS
1.0
9
deconstructSigs 0.4
sigfit
0.2
mutationalPatterns_strict
deconstructSigs
mutationalPatterns
10
Sigflow
sigfit
mutationalPatterns_strict
deconstructSigs
mutationalPatterns
Sigflow
sigfit
11
12
13
14
15
16
17
18
19
20
21 Fig. 1. Workflow and results for MetaMutationalSigs. A) The workflow for mutational signature analysis starts with pre-processing steps (shown in blue boxes) required by the user to
conduct variant calling, filtering, and annotation on a BAM file of a sequenced genome or exome. Our tool, MetaMutationalSigs, conducts the steps shown in green boxes and analyzes the
22
signatures found in a VCF. B) RMSE between the reconstructed signal (from the reference signatures) and the overall signature, plotted as dots for each of the 188 patient samples in the
23 Acute Myeloid Leukemia (TCGA, GDAC Firehose Legacy, Study ID: laml_tcga) dataset, for each tool (lower values are better) and for signature type (V2 and V3 SBS and IDs; no tool
24 predicted DBS for the samples used). While RMSE does not change for deconstructSigs and Sigfit, the RMSE significantly drops for mutationalPatterns and Sigflow with COSMIC v3. C)
25 Heatmap of cosine similarity between the predicted contributions of COSMIC v3 SBS vs. COSMIC legacy SBS signatures by different tools for the same TCGA patient sample. With the
26 legacy signatures, tools are generally less in agreement in their resulting signature contributions, while with COSMIC v3 signatures, the standard use tools are all in agreement with each
other. Sigflow had the lowest RMSE and was selected for analysis in D. D) Heatmaps of Sigflow analysis of COSMIC v3 SBS vs. COSMIC legacy SBS mutational signature contributions
27
using whole-exome sequence data from three TCGA patients with acute myeloid leukemia. Here, each row is a patient sample. Left. COSMIC v3 SBS refitting provides different dominant
28 signature contributions, TCGA-AB-2804: unknown etiology, TCGA-AB-2805 and TCGA-AB-2806: unknown chemotherapy and different DNA mismatch repair signatures, SBS20 and
29 26, respectively. COSMIC Legacy SBS refitting provides signature 3 (failure of double-strand break-repair by homologous recombination) as the dominant signature for all samples. The
30 COSMIC v3 SBS refitting reveals multiple mutational processes may be playing a role in the overall signature contribution than is found with the COSMIC Legacy SBS refitting.
31
32 Our package outputs several data files in comma separated values Atlas portal (Weinstein, 2013). RMSE is a performance metric com-
33 (CSV) format ready for further analysis and visualization using external monly used in signal processing (Rosen, 2007). In Fig 1c, we plot
34 packages along with visualizations of the signature contributions as de- heatmaps of distances between methods for predicting signature contri-
35 scribed in Table 1. In Fig. 1a, we illustrate the workflow of the analysis butions for SBS V2 (Legacy) and V3. In Fig 1d, we plot the heatmap
36 (including pre-processing steps in blue and our package’s steps in of SBS contributions to overall signature reconstruction for three of the
37 green). In Fig 1b, we compare packages using the root mean squared LAML patient samples.
38 error (RMSE) between the reconstructed and actual signals for 188 my-
39 eloid leukemia (LAML) patients obtained from The Cancer Genome
40
41 Table 1. Summary of result files. Signature Version corresponds to COSMIC- Legacy or V3. Tool_name corresponds mutationalPatterns, sigfit, sigflow,
42 and deconstructSigs.
43
44 File Name Format Description
45
46 Heatmap_contributions_all_sigs_[sig- svg Contributions from all signatures of the [signature_version] to the overall signature.
47 nature_version].svg
48 Heatmap_[signature_version].svg svg Heatmap of cosine similarity between the predicted contributions by different tools for [signa-
49 ture_version].
[signature_version]_bar_charts.html html Bar charts of signature contributions per sample and per tool for [signature_version].
50
rmse_box_plot.svg svg Box plot of RMSE between the reconstructed signal (from the reference signatures) and the over-
51 all signature
52 [tool_name]\[signature_version]_sam- csv Data about the difference between reconstructed and signal for each signature of [signature_ver-
53 ple_error.csv sion] for each [tool_name] for each sample. This is used to create rmse_box_plot.svg
54 [tool_name]\[signature_version]_con- csv Data about the contribution of each signature of [signature_version] for each [tool_name] for each sample. This
55 tribution.csv is used to create the Heatmap_contributions_all_sigs_[singature_version].svg, Heatmap_[signature_ver-
56 sion].svg and [signature_version]_bar_charts.html.
57
58
59
60
Bioinformatics Page 4 of 4
1
2 Chung, J., Maruvka, Y. E., Sudhaman, S., Kelly, J., Haradhvala, N. J., Bianchi, V.,
3 … Tabori, U. (2020). DNA polymerase and mismatch repair exert distinct mi-
4 crosatellite instability signatures in normal and malignant human cells. Cancer
Discovery, CD-20-0790. https://doi.org/10.1158/2159-8290.cd-20-0790
5 3 Discussion Forbes, S. A., Beare, D., Boutselakis, H., Bamford, S., Bindal, N., Tate, J., … Camp-
6 The quick brown fox jumps over the lazy dog. The massive increase bell, P. J. (2016). COSMIC: somatic cancer genetics at high-resolution. Nucleic
7 in the number of software packages has made managing dependencies Acids Research, 45(D1), D777–D783. https://doi.org/10.1093/nar/gkw1121
Gori, K., & Baez-Ortega, A. (2018). sigfit: flexible Bayesian inference of mutational
8