You are on page 1of 4

Page 1 of 4 Bioinformatics

1
2
Bioinformatics, YYYY, 0–0
3
doi: 10.1093/bioinformatics/xxxxx
4
Advance Access Publication Date: DD Month YYYY
5
Manuscript Category
6
7
8

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac091/6528318 by guest on 16 February 2022


9 Application Note
10
11 MetaMutationalSigs: Comparison of mutational
12
13
signature refitting results made easy
14 Sanjeevani Arora1,2*, Gail Rosen3* and Palash Pandey1,3
15
16
1Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, PA, USA; 2Depart-
17 ment of Radiation Oncology, Fox Chase Cancer Center, Philadelphia, PA, USA; 3Ecological and Evo-
18 lutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engi-
19 neering, College of Engineering, Drexel University, Philadelphia, PA, USA
20 *Correspondence:
21 Gail Rosen, PhD
22 Professor
Ecological and Evolutionary Signal-processing and Informatics Laboratory
23 Department of Electrical and Computer Engineering
24 College of Engineering
3141 Chestnut St.
25 Drexel University
26 Philadelphia, PA, 19104
27 USA
Tel. 215-895-0400
28 Email: glr26@drexel.edu
29
OR
30
31 Sanjeevani Arora, PhD
32 Assistant Professor
Cancer Prevention and Control Program
33 Fox Chase Cancer Center
34 333 Cottman Avenue
Philadelphia, PA 19111
35 Tel. 215-214-3956
36 Email: Sanjeevani.Arora@fccc.edu
37 Associate Editor: XXXXXXX
38
Received on XXXXX; revised on XXXXX; accepted on XXXXX
39
40
41 Abstract
42 Motivation: The analysis of mutational signatures is becoming increasingly common in cancer genet-
ics, with emerging implications in cancer evolution, classification, treatment decision and prognosis.
43
Recently, several packages have been developed for mutational signature analysis, with each using
44
different methodology and yielding significantly different results. Because of the nontrivial differences
45 in tools’ refitting results, researchers may desire to survey and compare the available tools, in order to
46 objectively evaluate the results for their specific research question, such as which mutational signatures
47 are prevalent in different cancer types.
48 Results:
49 Due to the need for effective comparison of refitting mutational signatures, we introduce a user-friendly
50 software that can aggregate and visually present results from different refitting packages.
51 Availability: MetaMutationalSigs is implemented using R and python and is available for installation
52 using Docker and available at: https://github.com/EESI/MetaMutationalSigs
53 Contact: Gail Rosen (glr26@drexel.edu)
Supplementary information: More information about the package including test data and results are avail-
54
able at https://github.com/EESI/MetaMutationalSigs
55
56
57
58 © The Author(s) 2022. Published by Oxford University Press.
59 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
60 (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any
medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Bioinformatics Page 2 of 4

1
2
linear regression, non-negative least squares, Bayesian inference, and sim-
3 ulated annealing, respectively. Here, we have developed a standard format
4 1 Introduction
for inputs and outputs for easy interoperability and effective comparison,
5 Mutational signature analysis provides an operative framework to respectively. With our previous experience in visualization of genomic
6 understand the somatic evolution of cancer from normal tissue (Robinson data (Lan et al., 2014), we have implemented standard visualizations for
7 et al 2020; Alexandrov et al, 2020; Brunner et al., 2019; Yoshida et al., the results of all mutational signature packages to ensure easy analysis.
8 2020; Moore et al.,2020). From the earliest phases of neoplastic changes,

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac091/6528318 by guest on 16 February 2022


MetaMutationalSigs software is easy to install and use through Docker.
cells may acquire several types of mutations in the form of single nucleo-
9
tide variants, insertions and deletions, copy number changes and chromo-
10
somal aberrations. These mutations are caused by multiple mutational 2 Approach
11 processes operative in cancer leaving behind specific footprints in the The two major methods typically used for mutational signature anal-
12 DNA that can by captured by mutational signature analysis (Alexandrov ysis are signature refitting and de-novo signature extraction. Signature re-
13 et al.,2013; Alexandrov et al, 2020). It is becoming increasingly evident fitting methods try to reconstruct the observed mutational pattern in the
14 that these mutational signatures are not only important for understanding sample (the frequencies of 96 types of mutations) using linear combina-
15 cancer evolution but also may have therapeutic implications, thus this a tions of known signatures (COSMIC Legacy SBS and COSMIC V3 SBS,
16 very active and important area of research (Iqbal et al., 2020; Campbell et ID, DBS, etc.), these methods work quite well on small sample sizes (such
17 al., 2017; Chung et al., 2020; Alexandrov et al., 2020). as single samples) and are widely used with small datasets (Omichessan
18 The basic idea behind mutational signatures is that mutational pro- et al., 2019). Signature extraction methods infer signatures from a given
19 cesses create specific patterns of mutations. Thus, it follows that if one can dataset, and then compare the extracted signatures with known reference
identify these patterns in a given sample then they can essentially detect
20 signatures. Each extracted signature is assigned to a known signature if
the corresponding mutational processes. The possible mutations are their cosine similarity exceeds a set threshold, otherwise signatures with
21
grouped into 6 single mutation types based on the base where the mutation similarity less than the threshold are ignored (Alexandrov et al.,2013).
22 was observed. These 6 single mutation types are C>A, C>G, C>T, T>A, There are a few important caveats to signature extraction as recently dis-
23 T>C, and T>G. Now, these 6 types of single mutations are further divided cussed in (Omichessan et al., 2019): 1) a novel signature can be very sim-
24 based on their context, e.g., one base preceding and one base following ilar to several reference signatures and the assignment is not always per-
25 the single mutation type, resulting in 4^2*6 = 96 mutation types. Alexan- fect and 2) the threshold for assignment plays a crucial role but is not
26 drov et al. first developed and applied this idea to cancer data acquired widely agreed upon and using a different threshold can change the assign-
27 from many datasets and identified the first iteration of 30 unique single ment (Omichessan et al., 2019).
28 base substitution (SBS) mutational signatures, which are common patterns We chose signature refitting as our primary task because refitting
29 of occurrences of the 96 mutation types and were compiled into the Cata- techniques use COSMIC signatures that are well-established, are able to
30 logue Of Somatic Mutations In Cancer (COSMIC) (Alexandrov et analyze signatures in smaller sets of samples than de novo techniques, and
al.,2013). COSMIC came to be used as the de facto reference for signature
31 are computationally less intense than de novo techniques. We imple-
refitting, we refer to these V2 signatures as COSMIC legacy SBS signa- mented high performing packages as identified in (Omichessan et al.,
32
tures (Campbell et al., 2020; Forbes et al., 2017). The initial study was 2019) that were implemented in R using a common input matrix generated
33 then expanded to the analysis of data from the Pan-Cancer Analysis of using SigProfilerMatrixGenerator (Bergstrom et al., 2019). While other
34 Whole Genomes (PCAWG) project (Campbell et al., 2020), resulting in techniques exist, including convenient web-based tools, such as Mutalisk
35 two additional variant classes, doublet base substitutions (DBS) signatures (Lee et al., 2018), that refits using a maximum likelihood estimation of the
36 and insertions/deletion (ID) signatures. With these added to the SBS, the signature contributions), and Signal (Degasperi et al., 2020), which uses
37 3 mutational signatures are in the COSMIC V3 catalog, which are in (Ale- quadratic programming or simulated annealing, there may be a desire to
38 xandrov et al., 2018). run the mutational signature analysis on local machines, that do not require
39 The mutational signature analysis workflow involves multiple steps uploading data to a third party. To address this problem, we implement a
40 that require different amounts of time and processing power. Briefly, pre- software package to compare 4 mutational signature analyzers -- Decon-
41 processing steps to take Binary Alignment Format (BAM) files, align structSigs (Rosenthal et al., 2016), MutationalPatterns (Blokzijil et. al.
them to a reference genome, and use a variant caller step to output Variant
42 2018), Sigfit (Gori et. al., 2018), Sigminer (Wang et al., 2020), which
Calling Format (VCF) files, are required from the user. These steps are build up on other tools such as (Mayakonda et al., 2018; Huang et al.,
43
usually very resource-intensive and thus do not allow for much experi- 2018). DeconstructSigs is the most cited method and uses a multiple lin-
44 mentation on personal computers; the downstream steps of variant filter- ear regression model, with coefficients constrained to positive values, to
45 ing and annotation are much faster. The final step, the mutational signature find contributions of mutational signatures to an overall signature. Muta-
46 analysis, is the least resource-intensive and, therefore, is easier for users tionalPatterns is also popular and uses non-negative least-squares (NNLS)
47 to compare multiple methods on their desktop. Therefore, to facilitate optimization to estimate the mutational signature weighting. Sigfit uses
48 comprehensive mutational signature refitting analyses, we developed the Bayesian inference to perform fitting. Sigminer uses a simulated anneal-
49 package, MetaMutationalSigs, to analyze the mutational signatures in the ing (SA) method to mutational signature fitting. Between multiple linear
50 VCF files. We developed a wrapper for 4 typically used refitting packages regression, optimization via NNLS and SA, and Bayesian estimation, us-
51 (Rosenthal et al., 2016; Blokzijil et. al. 2018; Gori et. al., 2018; Wang et ers can survey how a variety of techniques can estimate COSMIC signa-
52 al., 2020), that have diverse underlying methodologies, including multiple tures in sample(s).
53
54
55
56
57
58
59
60
Page 3 of 4 Bioinformatics

1
2
3
4
5
6
7 TCGA-AB-2804
Sigflow
COSMIC V3 SBS COSMIC Legacy SBS
1.0

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac091/6528318 by guest on 16 February 2022


mutationalPatterns 0.8
mutationalPatterns_strict 0.6

9
deconstructSigs 0.4
sigfit
0.2

mutationalPatterns_strict

deconstructSigs
mutationalPatterns
10

Sigflow

sigfit
mutationalPatterns_strict

deconstructSigs
mutationalPatterns
Sigflow

sigfit
11
12
13
14
15
16
17
18
19
20
21 Fig. 1. Workflow and results for MetaMutationalSigs. A) The workflow for mutational signature analysis starts with pre-processing steps (shown in blue boxes) required by the user to
conduct variant calling, filtering, and annotation on a BAM file of a sequenced genome or exome. Our tool, MetaMutationalSigs, conducts the steps shown in green boxes and analyzes the
22
signatures found in a VCF. B) RMSE between the reconstructed signal (from the reference signatures) and the overall signature, plotted as dots for each of the 188 patient samples in the
23 Acute Myeloid Leukemia (TCGA, GDAC Firehose Legacy, Study ID: laml_tcga) dataset, for each tool (lower values are better) and for signature type (V2 and V3 SBS and IDs; no tool
24 predicted DBS for the samples used). While RMSE does not change for deconstructSigs and Sigfit, the RMSE significantly drops for mutationalPatterns and Sigflow with COSMIC v3. C)
25 Heatmap of cosine similarity between the predicted contributions of COSMIC v3 SBS vs. COSMIC legacy SBS signatures by different tools for the same TCGA patient sample. With the
26 legacy signatures, tools are generally less in agreement in their resulting signature contributions, while with COSMIC v3 signatures, the standard use tools are all in agreement with each
other. Sigflow had the lowest RMSE and was selected for analysis in D. D) Heatmaps of Sigflow analysis of COSMIC v3 SBS vs. COSMIC legacy SBS mutational signature contributions
27
using whole-exome sequence data from three TCGA patients with acute myeloid leukemia. Here, each row is a patient sample. Left. COSMIC v3 SBS refitting provides different dominant
28 signature contributions, TCGA-AB-2804: unknown etiology, TCGA-AB-2805 and TCGA-AB-2806: unknown chemotherapy and different DNA mismatch repair signatures, SBS20 and
29 26, respectively. COSMIC Legacy SBS refitting provides signature 3 (failure of double-strand break-repair by homologous recombination) as the dominant signature for all samples. The
30 COSMIC v3 SBS refitting reveals multiple mutational processes may be playing a role in the overall signature contribution than is found with the COSMIC Legacy SBS refitting.
31
32 Our package outputs several data files in comma separated values Atlas portal (Weinstein, 2013). RMSE is a performance metric com-
33 (CSV) format ready for further analysis and visualization using external monly used in signal processing (Rosen, 2007). In Fig 1c, we plot
34 packages along with visualizations of the signature contributions as de- heatmaps of distances between methods for predicting signature contri-
35 scribed in Table 1. In Fig. 1a, we illustrate the workflow of the analysis butions for SBS V2 (Legacy) and V3. In Fig 1d, we plot the heatmap
36 (including pre-processing steps in blue and our package’s steps in of SBS contributions to overall signature reconstruction for three of the
37 green). In Fig 1b, we compare packages using the root mean squared LAML patient samples.
38 error (RMSE) between the reconstructed and actual signals for 188 my-
39 eloid leukemia (LAML) patients obtained from The Cancer Genome
40
41 Table 1. Summary of result files. Signature Version corresponds to COSMIC- Legacy or V3. Tool_name corresponds mutationalPatterns, sigfit, sigflow,
42 and deconstructSigs.
43
44 File Name Format Description
45
46 Heatmap_contributions_all_sigs_[sig- svg Contributions from all signatures of the [signature_version] to the overall signature.
47 nature_version].svg
48 Heatmap_[signature_version].svg svg Heatmap of cosine similarity between the predicted contributions by different tools for [signa-
49 ture_version].
[signature_version]_bar_charts.html html Bar charts of signature contributions per sample and per tool for [signature_version].
50
rmse_box_plot.svg svg Box plot of RMSE between the reconstructed signal (from the reference signatures) and the over-
51 all signature
52 [tool_name]\[signature_version]_sam- csv Data about the difference between reconstructed and signal for each signature of [signature_ver-
53 ple_error.csv sion] for each [tool_name] for each sample. This is used to create rmse_box_plot.svg
54 [tool_name]\[signature_version]_con- csv Data about the contribution of each signature of [signature_version] for each [tool_name] for each sample. This
55 tribution.csv is used to create the Heatmap_contributions_all_sigs_[singature_version].svg, Heatmap_[signature_ver-
56 sion].svg and [signature_version]_bar_charts.html.
57
58
59
60
Bioinformatics Page 4 of 4

1
2 Chung, J., Maruvka, Y. E., Sudhaman, S., Kelly, J., Haradhvala, N. J., Bianchi, V.,
3 … Tabori, U. (2020). DNA polymerase and mismatch repair exert distinct mi-
4 crosatellite instability signatures in normal and malignant human cells. Cancer
Discovery, CD-20-0790. https://doi.org/10.1158/2159-8290.cd-20-0790
5 3 Discussion Forbes, S. A., Beare, D., Boutselakis, H., Bamford, S., Bindal, N., Tate, J., … Camp-
6 The quick brown fox jumps over the lazy dog. The massive increase bell, P. J. (2016). COSMIC: somatic cancer genetics at high-resolution. Nucleic
7 in the number of software packages has made managing dependencies Acids Research, 45(D1), D777–D783. https://doi.org/10.1093/nar/gkw1121
Gori, K., & Baez-Ortega, A. (2018). sigfit: flexible Bayesian inference of mutational
8

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac091/6528318 by guest on 16 February 2022


quite burdensome, coupled with incompatible data formats for signature
signatures. Cold Spring Harbor Laboratory. https://doi.org/10.1101/372896
9 matrices can make mutational signature refitting results difficult and hard Huang, X., Wojtowicz, D., & Przytycka, T. M. (2017). Detecting presence of muta-
10 to compare. Our package, MetaMutationalSigs, provides a simplified ap- tional signatures in cancer with confidence. Bioinformatics, 34(2), 330–337.
proach for performing the setup related tasks so that more focus can be https://doi.org/10.1093/bioinformatics/btx604
11 Iqbal, W., Demidova, E. V., Serrao, S., ValizadehAslani, T., Rosen, G., & Arora, S.
placed on the analysis. Investigators should keep in mind that refitting ap-
12 (2021). RRM2B Is Frequently Amplified Across Multiple Tumor Types: Impli-
proaches need a priori knowledge about the samples and each package for
13 cations for DNA Repair, Cellular Survival, and Cancer Therapy. Frontiers in Ge-
effective interpretation (Maura et al., 2019), and the results should not be netics, 12. https://doi.org/10.3389/fgene.2021.628758
14 used as-is without an assessment of the cell biology and genomics. Lan, Y., Morrison, J. C., Hershberg, R., & Rosen, G. L. (2013). POGO-DB—a data-
15 Future work for this project would focus on expanding the tool to base of pairwise-comparisons of genomes and conserved orthologous genes. Nu-
16 work with more packages and keep the reference signatures updated as
cleic Acids Research, 42(D1), D625–D632. https://doi.org/10.1093/nar/gkt1094
Lee, J., Lee, A.J., Lee, J.-K., Park, J., Kwon, Y., Park, S., Chun, H., Ju, Y.S., and
17 new versions are released. Due to the open-source nature of the project, Hong, D. (2018). Mutalisk: a web-based somatic MUTation AnaLyIS toolKit
18 we also welcome additional feature requests using the project link on for genomic, transcriptional and epigenomic signatures. Nucleic Acids Re-
19 GitHub https://github.com/EESI/MetaMutationalSigs search, 46(W1), W102-W108. https://doi.org/10.1093/nar/gky406
Lee-Six, H., Olafsson, S., Ellis, P., Osborne, R. J., Sanders, M. A., Moore, L., …
20 Stratton, M. R. (2019). The landscape of somatic mutation in normal colorectal
21 Funding epithelial cells. Nature, 574(7779), 532–537. https://doi.org/10.1038/s41586-
22 019-1672-7
P.P. and G.R. were supported by NSF awards #1936791 and #1919691, and P.P. Maura, F., Degasperi, A., Nadeu, F., Leongamornlert, D., Davies, H., Moore, L., …
23 Bolli, N. (2019). A practical guide for mutational signature analysis in hemato-
was also supported by Fox Chase Cancer Center Risk Assessment Program
24 Funds. S.A. was supported by the DOD W81XWH-18-1-0148 Award and a CEP
logical malignancies. Nature Communications, 10(1).
25 Award from the Yale Head and Neck Cancer NIH SPORE.
https://doi.org/10.1038/s41467-019-11037-8
Mayakonda, A., Lin, D.-C., Assenov, Y., Plass, C., & Koeffler, H. P. (2018).
26 Maftools: efficient and comprehensive analysis of somatic variants in cancer.
27 Conflict of Interest: S.A. performs collaborative research (with no funding) with Genome Research, 28(11), 1747–1756. https://doi.org/10.1101/gr.239244.118
28 Caris Life Sciences, Foundation Medicine, Inc., Ambry Genetics and Invitae Cor- Moore, L., Leongamornlert, D., Coorens, T. H. H., Sanders, M. A., Ellis, P., Dentro,
S. C., … Stratton, M. R. (2020). The mutational landscape of normal human
29 poration. S.A. has several patents and/or pending patents related to cancer di-
endometrial epithelium. Nature, 580(7805), 640–646.
30 agnostics/treatment. All other authors declare no competing interests. https://doi.org/10.1038/s41586-020-2214-z
31 Omichessan, H., Severi, G., & Perduca, V. (2019). Computational tools to detect
signatures of mutational processes in DNA from tumours: A review and empiri-
32 References cal comparison of performance. PLOS ONE, 14(9), e0221235.
33 Alexandrov, L. B., Kim, J., Haradhvala, N. J., Huang, M. N., … Tian Ng, A. W.
https://doi.org/10.1371/journal.pone.0221235
34 (2020). The repertoire of mutational signatures in human cancer. Nature,
Robinson, P. S., Coorens, T. H. H., Palles, C., Mitchell, E., Abascal, F., Olafsson, S.,
… Stratton, M. R. (2020). Elevated somatic mutation burdens in normal human
35 578(7793), 94–101. https://doi.org/10.1038/s41586-020-1943-3
cells due to defective DNA polymerases. Cold Spring Harbor Laboratory.
Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J., & Stratton, M. R.
36 (2013). Deciphering Signatures of Mutational Processes Operative in Human
https://doi.org/10.1101/2020.06.23.167668
37 Cancer. Cell Reports, 3(1), 246–259.
Rosen, G. (2007). Comparison of Autoregressive Measures for DNA Sequence Sim-
ilarity. 2007 IEEE International Workshop on Genomic Signal Processing and
38 https://doi.org/10.1016/j.celrep.2012.12.008
Statistics. Presented at the 2007 IEEE International Workshop on Genomic Sig-
Alexandrov, L. B., Kim, J., Haradhvala, N. J., Huang, M. N., … Tian Ng, A. W.
39 (2020). The repertoire of mutational signatures in human cancer. Nature,
nal Processing and Statistics. https://doi.org/10.1109/gensips.2007.4365814
40 578(7793), 94–101. https://doi.org/10.1038/s41586-020-1943-3
Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S., & Swanton, C. (2016).
deconstructSigs: delineating mutational processes in single tumors distinguishes
41 Bergstrom, E. N., Huang, M. N., Mahto, U., Barnes, M., Stratton, M. R., Rozen, S.
DNA repair deficiencies and patterns of carcinoma evolution. Genome Biology,
G., & Alexandrov, L. B. (2019). SigProfilerMatrixGenerator: a tool for visualiz-
42 ing and exploring patterns of small mutational events. BMC Genomics, 20(1).
17(1). https://doi.org/10.1186/s13059-016-0893-4
43 https://doi.org/10.1186/s12864-019-6041-2
Wang, S., Li, H., Song, M., He, Z., Wu, T., Wang, X., … Liu, X.-S. (2020). Copy
number signature analyses in prostate cancer reveal distinct etiologies and clini-
44 Blokzijl, F., Janssen, R., van Boxtel, R., & Cuppen, E. (2018). MutationalPatterns:
cal outcomes. Cold Spring Harbor Laboratory.
comprehensive genome-wide analysis of mutational processes. Genome Medi-
45 cine, 10(1). https://doi.org/10.1186/s13073-018-0539-0
https://doi.org/10.1101/2020.04.27.20082404
46 Brunner, S. F., Roberts, N. D., Wylie, L. A., Moore, L., Aitken, S. J., Davies, S. E.,
Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A.,
… Stuart, J. M. (2013). The Cancer Genome Atlas Pan-Cancer analysis project.
47 … Campbell, P. J. (2019). Somatic mutations and clonal dynamics in healthy
Nature Genetics, 45(10), 1113–1120. https://doi.org/10.1038/ng.2764
and cirrhotic human liver. Nature, 574(7779), 538–542.
48 https://doi.org/10.1038/s41586-019-1670-9
Yoshida, K., Gowers, K. H. C., Lee-Six, H., Chandrasekharan, D. P., Coorens, T.,
49 Campbell, B. B., Light, N., Fabrizio, D., Zatzman, M., Fuligni, F., de Borja, R., …
Maughan, E. F., … Campbell, P. J. (2020). Tobacco smoking and somatic mu-
tations in human bronchial epithelium. Nature, 578(7794), 266–
50 Shlien, A. (2017). Comprehensive Analysis of Hypermutation in Human Cancer.
272.https://doi.org/10.1038/s41586-020-1961-1
Cell, 171(5), 1042-1056.e10. https://doi.org/10.1016/j.cell.2017.09.048
51 Campbell, P. J. et al. (2020). Pan-cancer analysis of whole genomes. Nature,
52 578(7793), 82–93. https://doi.org/10.1038/s41586-020-1969-6
53 Degasperi, A., Amarante, T. D., Czarnecki, J., Shooter, S., … Nik-Zainal, S. (2020).
A practical framework and online tool for mutational signature analyses show
54 intertissue variation and driver dependencies. Nature Cancer, 1(2), 249–263.
55 https://doi.org/10.1038/s43018-020-0027-5
56
57
58
59
60

You might also like