Transcriptome-wide mapping of N6-methyladenosine by m6A-seq based on immunocapturing and massively parallel sequencing
Dan Dominissini1–3, Sharon Moshitch-Moshkovitz1,3, Mali Salmon-Divon1, Ninette Amariglio1 &
Gideon Rechavi1,2
1Cancer Research Center, Chaim Sheba Medical Center, Tel Hashomer, Israel. 2Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel. 3These authors contributed
equally to this work. Correspondence should be addressed to G.R. (gidi.rechavi@sheba.health.gov.il).
Introduction
Over 100 modifications are known to decorate all four canonical nucleotides of RNA, creating a complexity that suits the versatile nature of this molecule, now known to exceed its classic role as a carrier of genetic information1,2. Some of these modifications are of regulatory importance, similar to dynamically regulated DNA and protein modifications3,4.
Methylation of the N6 position of adenosine (m6A) is a widespread and enigmatic post-transcriptional RNA modification5; the devastating phenotypic consequences of its obliteration have been documented in a growing number of organisms6. Especially illuminating, because they provide a physiological context, are its proven role in gametogenesis of Saccharomyces cerevisiae7 and the recent discovery that the fat mass and obesity-associated (FTO) gene, a central regulator of metabolism and an obesity risk gene, is an m6A-demethylase8.
Although m6A is the most prevalent internal modification in eukaryotic mRNA5, until recently its study lagged far behind that of other common RNA and DNA modifications, such as adenosine-to-inosine RNA editing9 and 5-methylcytosine10. Because m6A has no effect on Watson-Crick base pairing or on known chemical derivatization reactions, it could not be identified by reverse transcription–based methods11. These limitations prevented the development of robust and efficient procedures for its global mapping. Existing methods for localizing m6A within a sequence context require that a specific transcript first be isolated (by either pull-down or nuclease protection, using a complementary probe) and then fragmented to its constituent nucleotides and analyzed by any of a number of physicochemical techniques (thin-layer chromatography (TLC), high-performance liquid chromatography (HPLC), mass spectrometry, scintillation and so on)12. These methods are laborious, are of low throughput and require several iterations to pinpoint a single site; most importantly, they are hypothesis driven, narrowing the search to a specific transcript or nucleotide.
In the field of genetics, the localization-function relationship is as important in fueling discovery as the more general structure-function relationship. The major breakthroughs, such as those accomplished for 5-methylcytosine and, recently, for 5-hydroxymethylcytosine, are attributed to the ability to map the global landscapes of these modifications and then to superimpose them on other regulatory layers.
By harnessing the advantages of two established and powerful technologies, immunocapturing and massively parallel sequencing, we developed a new, relatively simple method for the transcriptome-wide localization of m6A at high resolution13. In summary, we used a highly m6A-specific antibody to immunoprecipitate methylated RNA fragments out of a randomly fragmented transcriptome. We then subjected these fragments to massively parallel sequencing and identified positions of signal enrichment relative to input control.
We used m6A-seq to profile the human and mouse transcriptomes13. This study revealed thousands of methylated sites, characterized by a typical consensus, in over 7,000 gene transcripts. Strikingly, m6A sites tend to cluster around stop codons and within long internal exons. This nonrandom distribution is highly conserved between humans and mice, suggesting a fundamental role for this modification. The global overview offered by m6A-seq helped reveal these unifying principles and will hopefully provide the framework for detailed functional analyses to come. Our results have since been confirmed by another group14, which used an almost identical approach. Their experimental protocol …
… align it back to the genome, in effect limiting the resolution of the method. Although resolution is admittedly not at the single-nucleotide level, relatively high resolution is achieved when the data are combined with consensus information. Because the m6A-seq approach also relies on enrichment of methylated RNA fragments and their physical separation from the rest of the fragments, stoichiometry information is largely lost, making the method insensitive to the proportion of methylated transcripts or sites.
m6A-seq allows a transcriptome-wide, hypothesis-independent identification of thousands of methylation sites, representing its greatest advantage over previous methods. The latter involved laborious biochemical procedures applied to individual purified …
… accessible materials, equipment and software packages.
[Figure: flowchart of the m6A-seq procedure. Recoverable labels: sample, Steps 1–5; fragmented RNA (~100 nt); random primed cDNA library generation, adaptor ligation and Illumina sequencing, Steps 26–28; motif discovery (MEME), Steps 39–42; central enrichment (CentriMo), Steps 43–47; peak annotation (PeakAnalyzer), Steps 48 and 49; other downstream analyses (GO, comparison to gene expression and so on).]
Experimental design
… suitable, as long as purified RNA does not contain EDTA or salts that interfere with downstream fragmentation.
(ii) RNA integrity is influential. As RNA is chemically …
… is an integral part of the protocol. We typically strive for a size distribution centered on ~100 nt. In our experience, even small changes in incubation time and temperature, or the presence of residual EDTA and/or salts, will affect fragmentation efficiency. Importantly, RNA concentration is a major determinant of efficiency. We thus strongly recommend calibrating this step before fragmenting your entire sample (Fig. 3). Small volumes (20 µl), batches of no more than five tubes, thin-walled tubes and a thermocycler (for accurate temperature setting) will help ensure reproducible results. Any change in fragment size or fragmentation method should be compatible with ensuing steps, particularly library preparation.
Immunoprecipitation. Before proceeding to IP, save a few micrograms of fragmented RNA to serve as input control in RNA-seq. The input library is required for determining signal enrichment in the immunoprecipitated sample.
IP of m6A-containing RNA fragments is an important step that may be susceptible to RNA degradation owing to possible RNase contamination carried over with the antibody or protein A beads. It is extremely important to add RNase inhibitors: we typically use both RNasin Plus …
… purpose. However, currently there is not enough data to assess the relative efficiencies of the two approaches.
Quality control. You may want to validate the success of the protocol up to this point before proceeding to preparation and sequencing of the libraries. Because of the extremely high success rates of the protocol, we do not routinely implement this step; therefore, procedural details for quality control (QC) are not included in the protocol. A straightforward way would be to use reverse transcription–quantitative PCR (RT-qPCR) to assess depletion of methylated transcript fragments from the supernatant that remains after IP (see PROCEDURE Step 18) relative to input control. An unmethylated transcript, for which no or only minimal depletion is observed, is an adequate negative control. The works by us13 and by the Jaffrey laboratory14 provide a host of human and mouse methylated transcripts to choose from. Be sure to choose a couple of transcripts, as a specific one may not be methylated in your tissue or under the conditions of your experiment.
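As a rough illustration of the RT-qPCR check described under Quality control, the sketch below computes how strongly a transcript is depleted from the post-IP supernatant relative to input, normalized to an unmethylated control, using the comparative Ct (2^−ΔΔCt) method. This is an assumed analysis, not part of the published protocol: it presumes ~100% amplification efficiency and equal RNA loading, and the function name and Ct values are hypothetical.

```python
def depletion_fold(ct_sup_target, ct_input_target, ct_sup_control, ct_input_control):
    """Hypothetical helper: fold depletion of a target transcript from the
    post-IP supernatant relative to input, normalized to an unmethylated control.

    Assumes ~100% PCR efficiency (one Ct unit = twofold) and equal RNA loading.
    """
    # Delta Ct per transcript: supernatant minus input (higher = less material left).
    d_target = ct_sup_target - ct_input_target
    d_control = ct_sup_control - ct_input_control
    # Subtracting the control removes losses shared by all transcripts.
    dd_ct = d_target - d_control
    return 2.0 ** dd_ct


# Hypothetical Ct values: target shifts by 3 cycles, control by 0.5 cycles.
print(depletion_fold(25.0, 22.0, 20.5, 20.0))  # about 5.66-fold depleted
```

A value near 1 means the target behaves like the unmethylated control; a methylated transcript would be expected to give a value clearly above 1.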
MATERIALS
REAGENTS
• Cultured cells or tissues (any cell line or tissue can be used as a source for RNA) ! CAUTION Adhere to all relevant institutional ethics guidelines.
• TBE buffer (10×; BioLab, cat. no. 201423)
• PerfectPure RNA cultured cell kit (5 PRIME, cat. no. 2302340)
• ZnCl2 (Sigma-Aldrich, cat. no. 96468)
• Ultrapure water (Biological Industries, cat. no. 01-866-1B)
• Tris-HCl (pH 7.0, 1 M; Sigma-Aldrich, cat. no. T2413)
• Tris-HCl (pH 7.4, 1 M; Sigma-Aldrich, cat. no. T2663)
• GenElute mRNA miniprep kit (Sigma-Aldrich, cat. no. MRN70)
• Sodium acetate (pH 5.2, 3 M; Sigma-Aldrich, cat. no. S7899)
• EDTA (pH 8.0, 0.5 M; Sigma-Aldrich, cat. no. 03690)
• Glycogen (5 mg ml−1; Life Technologies, cat. no. AM9510)
• Ribonucleoside vanadyl complexes (RVC; 200 mM; Sigma-Aldrich, cat. no. R3380)
• Agarose (Sigma-Aldrich, cat. no. A9539)
• RNasin Plus RNase inhibitor (Promega, cat. no. N2611)
• NaCl (5 M; Sigma-Aldrich, cat. no. S6546)
• Igepal CA-630 (Sigma-Aldrich, cat. no. I8896)
• Affinity purified anti-m6A rabbit polyclonal antibody (Synaptic Systems, cat. no. 202 003)
• N6-Methyladenosine, 5′-monophosphate sodium salt (Sigma-Aldrich, cat. no. M2780)
• Ethanol (Sigma-Aldrich, cat. no. E7023) ! CAUTION Ethanol is highly flammable; keep flammable liquids away from all sources of ignition.
• Ethidium bromide (Sigma-Aldrich, cat. no. E1510) ! CAUTION Ethidium bromide is a highly toxic carcinogen; wear gloves and lab coats when handling it.
• Loading dye (Fermentas, cat. no. R0631)
• Immobilized recombinant protein A (Repligen, cat. no. IPA300)
• BSA (20 mg ml−1; Sigma-Aldrich, cat. no. B8667)
• SuperScript II reverse transcriptase (Life Technologies, cat. no. 18064-014)
• β-Mercaptoethanol (β-ME; Sigma-Aldrich, cat. no. M7522) ! CAUTION β-ME is highly toxic. Wear protective clothing, including gloves, eye and face masks, when handling it.
• RNaseKiller solution (5 PRIME, cat. no. 2900630)
• mRNA-seq sample preparation kit (Illumina, cat. no. 1004814)
• TruSeq RNA sample preparation kit (Illumina, cat. no. 15013136)
• TruSeq SBS kit v5-GA, 36-cycles (Illumina, cat. no. 15013676)
• TruSeq SR cluster kit v2-Bot-GA (Illumina, cat. no. 15019749)
• GeneRuler low-range DNA ladder (Fermentas, cat. no. SM1193)
• Agilent DNA 1000 kit (Agilent, cat. no. 5067-1504)
• QIAquick gel extraction kit (Qiagen, cat. no. 28704)
• QIAquick PCR purification kit (Qiagen, cat. no. 28104)
• MinElute PCR purification kit (Qiagen, cat. no. 28004)
• Quant-iT RNA assay kit (100 assays; Life Technologies, cat. no. Q32852)
• Agilent RNA 6000 Pico kit (Agilent, cat. no. 5067-1513)
• Test data set: m6A-seq of human hepatocarcinoma cell line (HepG2) can be obtained from Gene Expression Omnibus (GEO) at accession code GSE37003
• Gene annotation from Ensembl in GTF format (ftp://ftp.ensembl.org/pub/release-67/gtf/homo_sapiens/Homo_sapiens.GRCh37.67.gtf.gz)
• Human reference genome sequence, build 37 (hg19), downloaded from the University of California at Santa Cruz (UCSC) (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/) or from Ensembl (ftp://ftp.ensembl.org/pub/release-67/fasta/homo_sapiens/dna/). Concatenate the chromosomal fasta files into a single multifasta file called ‘hg19.fa’
• SRA toolkit (version 2.1.10) for SRA to .fastq conversion (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software)
• FastQC tool (version 0.10.1) for quality checks of sequenced reads (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
• Bowtie26 (version 0.12.7) for read mapping (http://bowtie-bio.sourceforge.net/index.shtml)
EQUIPMENT
• Gel imager
• Magnetic separation rack
• Thermocycler machine (96-well plate; Applied Biosystems or equivalent)
• Gel electrophoresis system
• Weighing scale
• Weighing boats
• NanoDrop spectrophotometer (NanoDrop Technologies, ND-1000, or equivalent)
• Agilent 2100 Bioanalyzer or equivalent
• Head-over-tail rotator
• 64-bit computer running Linux or Mac OS X; 4 GB of RAM
• Cell scraper
• Homogenizer
REAGENT SETUP
… of m6A, 7 µl of RNasin Plus and 203 µl of water (use molecular biology–grade, RNase-free water). Freshly prepare the buffer. Final concentrations are 1× IP buffer and 6.7 mM m6A.
m6A-specific antibody stock solution, 0.5 mg ml−1 Reconstitute 50 µg of lyophilized affinity-purified m6A-specific antibody in 100 µl of molecular biology–grade, RNase-free water. Store the stock solution in aliquots of 25 µl at −20 °C and use them within 12 months.
EQUIPMENT SETUP
Bioinformatics Most of the commands given in the protocol can be run at the UNIX shell prompt and are meant to be run from the example working directory. We assume that the location of all software tools is defined in your PATH. Commands meant to be executed from the UNIX shell are prefixed with a ‘$’ character.
PROCEDURE
RNA isolation ● TIMING 3 h
1| Lyse the cells by adding PerfectPure RNA cultured cell kit lysis solution supplemented with 143 mM β-ME directly onto
the cells. Collect the lysed cells with a cell scraper and vortex until the sample is homogeneous. Tissues should be thoroughly
homogenized in lysis solution with the aid of a homogenizer. Next, pass the lysate through an 18–21-gauge syringe needle
several times.
CRITICAL STEP Ensure that you have enough starting material, depending on your choice of RNA type for analysis
(see Experimental design).
CRITICAL STEP Passing the lysate through a needle will improve RNA yields and ensure shearing of genomic DNA.
Note that the antibody will also recognize m6A in the context of DNA if it is present in your organism.
PAUSE POINT Lysates can be stored at −80 °C until further processing for up to 6 months.
2| Isolate total RNA according to the manufacturer’s instructions, including on-column DNase treatment.
CRITICAL STEP Elute RNA in molecular biology–grade water, as elution buffers may interfere with subsequent RNA
fragmentation. Elution volume should be as low as possible. Do not omit DNase treatment, as contaminating DNA will
interfere with downstream analysis. DNase treatment is all the more essential in organisms with m6A in their DNA.
CRITICAL STEP Note that this RNA isolation method is not suitable for small RNAs (<150 nt), as they are lost during
the process.
PAUSE POINT Isolated RNA can be stored at −80 °C for up to 1 year until further use.
4| Validate RNA integrity by agarose gel electrophoresis or analysis on an Agilent 2100 Bioanalyzer.
CRITICAL STEP RNA degradation products are short to begin with; the metal ion–induced fragmentation later in the protocol can shorten them further, so that they escape processing and analysis.
CRITICAL STEP We advise using an RNA-dedicated electrophoresis apparatus.
? TROUBLESHOOTING
5| If desired, enrich for polyadenylated RNA by at least one round of oligo-dT selection using the GenElute mRNA
miniprep kit. Depletion of rRNA using the RiboMinus transcriptome isolation kit is an alternative. Note that this step
is optional; m6A-seq works well when performed on total RNA (see Experimental design).
CRITICAL STEP Elution volumes should be large here (polyadenylated RNA from 500 µg of total RNA is eluted in 100 µl) in
order to ensure maximum yields, and they will typically result in low RNA concentration. Either concentrate RNA by ethanol
precipitation or recalibrate fragmentation (Step 6).
Fragmentation ● TIMING 3 h
6| Adjust the RNA concentration to ~1 µg µl−1 with RNase-free water. Set up the following fragmentation reaction in a thin-walled 200-µl PCR tube. Vortex and spin down the tube.
Total volume: 20 µl
CRITICAL STEP Adherence to the specified amounts and volumes is highly recommended, as scaling may affect
fragmentation efficiency and the resulting size distribution. Work quickly at this stage and immediately proceed to Step 7.
We advise working in batches of five tubes (300 µg of RNA require ~17 tubes). Substituting metal ion–induced fragmentation
with physical fragmentation by sonication is not advised, as it yields fragments >200 nt and might not be entirely random.
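The tube count quoted in this CRITICAL STEP (~17 tubes for 300 µg) can be reproduced with a short calculation. Because the Step 6 reaction table is not reproduced here, the sketch assumes 18 µl of RNA at ~1 µg µl−1 per 20-µl reaction, a value inferred from that tube count rather than taken from the protocol; fragmentation_plan is a hypothetical helper.

```python
import math


def fragmentation_plan(total_rna_ug, rna_ul_per_tube=18.0, conc_ug_per_ul=1.0, batch_size=5):
    """Hypothetical helper: return (tubes, batches) needed to fragment total_rna_ug.

    Assumption: 18 µl of RNA per 20-µl reaction (inferred, not from the reaction table).
    """
    rna_ug_per_tube = rna_ul_per_tube * conc_ug_per_ul  # 18 µg per tube by default
    tubes = math.ceil(total_rna_ug / rna_ug_per_tube)
    batches = math.ceil(tubes / batch_size)  # work in batches of no more than five tubes
    return tubes, batches


print(fragmentation_plan(300))  # (17, 4)
```

Under these assumptions, 300 µg of RNA needs 17 tubes, processed in four rounds of Steps 6 and 7 (Step 8).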
7| Incubate the tubes at 94 °C for 5 min in a preheated thermal cycler block with the heated lid closed. Remove the tubes
from the block and immediately add 2 µl of 0.5 M EDTA. Vortex and spin down the tubes and place them on ice.
CRITICAL STEP Time and temperature settings should be closely followed. Work quickly at this stage.
8| Repeat Steps 6 and 7 for each batch of five tubes until all of the RNA is fragmented.
9| Collect the contents of all tubes and add one-tenth volume of 3 M sodium acetate (pH 5.2), glycogen (100 µg ml−1 final) and
2.5 volumes of 100% ethanol. Mix the contents and incubate at −80 °C overnight.
CRITICAL STEP Do not use nucleic acids as carriers for precipitation, as they will interfere with downstream IP and sequencing.
PAUSE POINT RNA is stable in the precipitation mixture when stored at −80 °C for a long time period.
10| Centrifuge the tubes at 15,000g for 25 min at 4 °C. Discard the supernatant, taking care not to disrupt the pellet, which
is easily visible because of the presence of glycogen. Wash the pellet with 1 ml of 75% (vol/vol) ethanol and centrifuge
again at 15,000g for 15 min at 4 °C.
12| Validate RNA postfragmentation size distribution by measuring RNA concentration with a NanoDrop spectrophotometer
and running 0.5 µg of RNA on 1.5% (wt/vol) agarose gel for ~30 min. The outlined fragmentation procedure should produce
a distribution of RNA fragment sizes centered on ~100 nt (Fig. 3). Alternatively, fragmented RNA can be run according to
the manufacturer’s instructions on an Agilent 2100 Bioanalyzer with an Agilent RNA 6000 Pico kit.
CRITICAL STEP Validate RNA size distribution only after it has been ethanol-precipitated, as the presence of salts may
affect gel migration of the fragments. We advise using an RNA-dedicated electrophoresis apparatus.
? TROUBLESHOOTING
PAUSE POINT RNA can be stored at −80 °C at this stage until further use for up to 1 year.
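To judge whether a fragment size distribution is centered on ~100 nt, one option is to compute a signal-weighted median from an exported gel-lane or electropherogram profile. The sketch assumes you already have paired size (nt) and signal values; the export format and the function name weighted_median_size are assumptions, not features of the Bioanalyzer software.

```python
def weighted_median_size(sizes_nt, signal):
    """Hypothetical helper: size at which cumulative signal first reaches half the total."""
    pairs = sorted(zip(sizes_nt, signal))  # order points by fragment size
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("signal must contain positive values")
    running = 0.0
    for size, w in pairs:
        running += w
        if running >= total / 2:
            return size
    return pairs[-1][0]


# Hypothetical three-point profile, most signal at 100 nt.
print(weighted_median_size([50, 100, 150], [1, 2, 1]))  # 100
```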
Immunoprecipitation ● TIMING 5 h
13| Save a portion of untreated fragmented RNA to serve as input control in RNA-seq. To be on the safe side, we recommend
saving a few micrograms, although much less will also suffice.
PAUSE POINT RNA can be stored at −80 °C for up to 1 year before use in Step 26.
14| Adjust the volume of the remaining RNA to 755 µl with RNase-free water. Prepare the reaction mixture tabulated below
in a 1.7-ml low-binding microcentrifuge tube. Vortex and spin down the tube. We recommend setting up a parallel reaction
that includes the same amount of fragmented RNA but without the antibody. It will be treated in the same manner as the
IP sample throughout Steps 15–27 and will serve as a bead-only control to assess background levels and efficiency of RNA
elution at Steps 25 and 27.
Component | Amount (µl) | Final
Fragmented RNA (from Step 12) | 755 | Varies (>5 µg of mRNA, >300 µg of total RNA)
IP buffer, 5× | 200 | 1×
CRITICAL STEP We recommend using low-binding RNase/DNase-free microcentrifuge tubes from this step onward.
16| While the samples are incubating, wash 200 µl of recombinant protein A bead slurry twice in 1 ml of 1× IP buffer.
Resuspend the beads in 1 ml of 1× IP buffer supplemented with BSA (0.5 mg ml − 1) and incubate on a rotating wheel
for 2 h. Spin down, remove and discard the supernatant and wash twice in 1 ml of 1× IP buffer. Equally divide the beads
between two 1.7-ml microcentrifuge tubes (one for the IP sample and one for the bead-only control).
CRITICAL STEP Remember to supplement the IP buffer with RNasin and RVC. Do not exceed the specified quantity of
beads (already in excess), as it can influence background levels.
17| Transfer the reactions from Step 15 into the bead-containing tubes prepared in Step 16. Incubate the reaction mixtures
for 2 h on a rotating wheel at 4 °C.
18| Spin down the beads and carefully remove and retain the supernatant. Wash the beads with 1 ml of 1× IP buffer
three times.
CRITICAL STEP Remember to supplement the IP buffer with RNasin and RVC. Try to minimize bead loss, as the amount of
precipitated RNA is scarce.
CRITICAL STEP Bear in mind that the desired population of m6A-enriched RNA fragments is not in the supernatant, but
rather is still on the beads. The supernatant is saved for the sake of IP QC (see Experimental design and Step 25 below).
20| Spin down the beads and carefully remove and retain the supernatant (now containing eluted RNA fragments).
CRITICAL STEP Take special care not to aspirate the beads, as it will increase background noise.
21| Add 100 µl of 1× IP buffer to the sedimented beads and gently tap the tube to mix. Spin down the beads and carefully
remove and retain the supernatant.
CRITICAL STEP Remember to supplement IP buffer with RNasin and RVC.
23| Combine all eluates from the same sample (IP or bead-only control), and add one-tenth volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of 100% ethanol. Mix and incubate the sample at −80 °C overnight.
CRITICAL STEP Do not add glycogen (or other carrier) to the precipitation mixture at this stage, as it can precipitate free m6A, possibly interfering with downstream reactions and measurements.
PAUSE POINT RNA is stable in the precipitation mixture when stored at −80 °C for a long period of time.
25| Measure RNA concentration (using 1 µl from Step 24) with the Quant-iT RNA assay kit.
© 2013 Nature America, Inc. All rights reserved.
CRITICAL STEP Assuming the minimal starting RNA amounts specified in the Experimental design section, expect yields
on the order of tens of nanograms, depending on the cell line or tissue of origin. The product of the bead-only control
reaction should be below the detection threshold of the Quant-iT RNA assay kit. Absorbance-based measurements of RNA
concentration are not sensitive enough for this application and should not be used.
CRITICAL STEP Optionally, at this point you might choose to QC your IP before proceeding to library generation and
sequencing (see Experimental design).
? TROUBLESHOOTING
PAUSE POINT RNA can be stored at −80 °C at this stage until further use for up to 1 year.
27| Proceed to second-strand cDNA synthesis, DNA end repair, adapter ligation and PCR amplification using mRNA-seq or
TruSeq sample preparation kits according to the manufacturer’s instructions.
CRITICAL STEP Note that library validation by Agilent Technologies 2100 Bioanalyzer should demonstrate that the
bead-only control sample did not produce a library.
CRITICAL STEP Size-selecting the library by gel excision is advised. The expected size range of the desired band depends
on fragmentation output (Step 12), as well as on adapter and sequencing primer lengths.
? TROUBLESHOOTING
PAUSE POINT Libraries can be stored at −20 °C. According to the manufacturer’s instructions for the TruSeq sample preparation kit, libraries should not be stored for more than a week.
28| Subject libraries to cluster generation and next-generation sequencing on the Illumina GAIIx platform (or similar NGS machine) via the 36-cycle sequencing module, according to the manufacturer’s instructions. Longer reads can also be used.
CRITICAL STEP Assuming that m6A-seq is applied to total RNA and also that ~30 million reads per lane are obtained,
we recommend allocating separate lanes to IP and input samples. Multiplex sequencing by the use of indexed adapters can
be used as long as enough reads are obtained by the sequencing platform in use.
$ fastq-dump "SRR456551.sra"
$ fastq-dump "SRR456552.sra"
$ fastq-dump "SRR456553.sra"
$ fastq-dump "SRR456554.sra"
CRITICAL STEP The outlined analysis scheme that follows is applicable to any data set generated by m6A-seq. Here,
a stepwise analysis is demonstrated on a test data set that can be obtained from GEO accession no. GSE37003.
30| Perform simple QC checks to confirm that the raw reads are of good quality and free of biases that could affect results. Use the FastQC tool, which provides summary graphs and tables.
$ fastqc *.fastq
32| Map reads to a reference genome, using any short-read aligner such as BWA30 or Bowtie26. Here we use the Bowtie aligner, allowing up to five alignments (multi-hits) per read; reads that align to more than five locations are excluded.
$ bowtie -m 5 -a --sam --best --strata bowtie_index/hg19 IP.fastq > IP.sam
$ bowtie -m 5 -a --sam --best --strata bowtie_index/hg19 Input.fastq > Input.sam
If you wish to focus on methylation sites overlapping with splice junctions, use a splicing-aware aligner such as TopHat31. In our experience, a very small fraction of reads (<1%) map to exon-exon junctions, and hence we ignore these in downstream analysis. For reads longer than 50 nt, consider using Bowtie 2 (ref. 32), which searches for multiple alignments and reports the best one.
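The multi-hit criterion behind Bowtie's -m 5 option can be illustrated on SAM records: count the alignments reported for each read and discard reads with more than five. This is a simplified sketch for understanding the filter, not a replacement for Bowtie, which applies -m before reporting. It assumes each alignment of a read is a separate SAM line with the read name in column 1 and the FLAG in column 2, and it drops unmapped records (FLAG bit 4); filter_multihits is a hypothetical helper.

```python
from collections import Counter


def filter_multihits(sam_lines, max_hits=5):
    """Hypothetical helper: keep alignments of reads with at most max_hits alignments."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 4:
            continue  # FLAG bit 4: read is unmapped
        records.append(fields)
    hits = Counter(f[0] for f in records)  # alignments per read name
    return [f for f in records if hits[f[0]] <= max_hits]
```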
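35| Use the output file ‘m6A_peaks.xls’ generated by MACS, which contains information about the called peaks, to select only those peaks with an FDR ≤ 5%, and save them in a separate file. The command keeps data rows only (skipping ‘#’ comment lines and the column-header line, whose second field is not a number) and tests the FDR in column 9.
$ awk '!/^#/ && $2 ~ /^[0-9]+$/ && $9 <= 5' m6A_peaks.xls > m6A_sig_peaks.xls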
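The same selection can be written in Python, which also makes it easy to apply the additional fold-enrichment cutoff (≥4) mentioned under ANTICIPATED RESULTS. The sketch assumes the tab-delimited MACS .xls layout described there, with fold enrichment in column 8 and FDR (in percent) in column 9 (1-based), comment lines starting with ‘#’ and a column-header line; check these against the MACS version you use. significant_peaks is a hypothetical helper.

```python
def significant_peaks(xls_lines, max_fdr_pct=5.0, min_fold=None):
    """Hypothetical helper: yield MACS peak rows with FDR <= max_fdr_pct
    (and fold enrichment >= min_fold, if given)."""
    for line in xls_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank and comment lines
        fields = line.split("\t")
        if len(fields) < 9 or not fields[1].isdigit():
            continue  # column-header line or malformed row
        fold, fdr = float(fields[7]), float(fields[8])  # columns 8 and 9 (1-based)
        if fdr <= max_fdr_pct and (min_fold is None or fold >= min_fold):
            yield fields
```

For example, `with open("m6A_peaks.xls") as fh: sig = list(significant_peaks(fh, min_fold=4))` applies both cutoffs in one pass.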
38| Place the BigWig file in an http, https or ftp location, and then upload it as a custom track to the UCSC genome browser
(see http://genome.ucsc.edu/goldenPath/help/bigWig.html for detailed instructions).
40| Map the peak-summit regions to annotated genes in order to fetch sequences from the sense strand.
$ awk '{print "chr" $0}' Homo_sapiens.GRCh37.67.gtf > genes.gtf
$ intersectBed -wo -a bestPeaks.location -b genes.gtf | awk -v OFS="\t" '{print $1,$2,$3,"*","*",$10}' | uniq > bestPeaks.bed
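CRITICAL STEP The first command adds the ‘chr’ prefix to the Ensembl chromosome names so that they match peaks called on the UCSC hg19 genome. If the chromosome names in your peak file lack the ‘chr’ prefix (e.g., because reads were mapped to the Ensembl genome), skip the first command and use the Ensembl GTF file as genes.gtf.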
41| Use the fastaFromBed utility (BedTools) to fetch sequences taking the strand information into consideration.
$ fastaFromBed -s -fi hg19.fa -bed bestPeaks.bed -fo bestPeaks.fa
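With -s, fastaFromBed reports the reverse complement for intervals on the minus strand, so that every sequence reads 5′→3′ on the sense strand of the gene. The logic can be sketched as follows, assuming BED-style 0-based, half-open coordinates and a genome held in a dictionary of chromosome name to sequence; both function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_stranded(genome, chrom, start, end, strand):
    """Hypothetical helper: genome[chrom][start:end] (0-based, half-open),
    reverse-complemented when the feature is on the '-' strand."""
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


genome = {"chr1": "AAGGACTTT"}  # toy sequence
print(fetch_stranded(genome, "chr1", 2, 6, "+"))  # GGAC
print(fetch_stranded(genome, "chr1", 2, 6, "-"))  # GTCC
```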
42| Run MEME for de novo motif finding. The command below retrieves the top three motifs.
$ meme bestPeaks.fa -dna -nmotifs 3 -maxsize 1000000 -o bestPeaks_meme
Determining peak summit-to-motif distance ● TIMING 10 min
43| Generate a location file containing the coordinates of regions flanking the peak summits (±150 nt).
$ awk '{if($1~/[^#]/) {summit=$2-1+$5; print $1 "\t" (summit-151) "\t" (summit+150)}}' m6A_sig_peaks.xls > m6A_sig_peaks_summit.location
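The awk command in Step 43 turns each MACS peak into a 301-nt window around its summit: it takes the peak start (column 2), subtracts 1 and adds the summit offset (column 5), then extends 151 nt upstream and 150 nt downstream. The same arithmetic in Python, mirroring the awk expression exactly (summit_window is a hypothetical name):

```python
def summit_window(peak_start, summit_offset, upstream=151, downstream=150):
    """Hypothetical helper mirroring the awk step:
    summit = start - 1 + offset; window = (summit - 151, summit + 150)."""
    summit = peak_start - 1 + summit_offset
    return summit - upstream, summit + downstream


print(summit_window(1000, 50))  # (898, 1199)
```

For peaks lying close to a chromosome start, the computed window start can become negative; you may want to clip it at 0 before intersecting.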
44| Intersect with annotated genes in order to fetch the sequences from the sense strand.
$ intersectBed -wo -a m6A_sig_peaks_summit.location -b genes.gtf | awk -v OFS="\t" '{print $1,$2,$3,"*","*",$10}' | uniq > m6A_sig_peaks_summit.bed
45| Fetch sequences from the .fasta file containing the sequence of the human genome.
$ fastaFromBed -s -fi hg19.fa -bed m6A_sig_peaks_summit.bed -fo m6A_sig_peaks_summit.fa
46| Run CentriMo.
$ centrimo --motif 1 --o peaks_motif_centrimo --norc m6A_sig_peaks_summit.fa bestPeaks_meme/meme.txt
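CentriMo asks whether the best motif match in each sequence falls closer to the sequence center (here, the peak summit) than expected by chance. A much simplified, exact-string analogue of that idea is sketched below: for each summit-centered sequence it records the offset of the motif occurrence nearest the center, so central enrichment shows up as offsets clustered near 0. Unlike CentriMo, which scores a position weight matrix and applies a statistical test, this sketch uses an exact string, scans only the given strand and performs no test. The motif "GGACT" in the example is a hypothetical placeholder for the MEME consensus from Step 42, and both function names are hypothetical.

```python
from collections import Counter


def center_offsets(sequences, motif):
    """Hypothetical helper: offset (nt) of the motif occurrence nearest each
    sequence's center, measured from the sequence midpoint to the motif midpoint.
    Sequences without an occurrence are skipped."""
    offsets = []
    for seq in sequences:
        seq = seq.upper()
        center = (len(seq) - 1) / 2
        starts = [i for i in range(len(seq) - len(motif) + 1)
                  if seq[i:i + len(motif)] == motif]
        if not starts:
            continue
        mids = [s + (len(motif) - 1) / 2 for s in starts]
        offsets.append(min((m - center for m in mids), key=abs))
    return offsets


def binned(offsets, bin_nt=30):
    """Count offsets in bins of bin_nt nucleotides, keyed by bin start."""
    return Counter(int(o // bin_nt) * bin_nt for o in offsets)


seqs = ["AAAAGGACTAAAA", "GGACTAAAAAAAA", "AAAAAAAAAAAAA"]
print(center_offsets(seqs, "GGACT"))  # [0.0, -4.0]
```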
[Figure: … representing the top MEME-deduced consensus motif for the 1,000 best-scoring m6A peaks (Steps 39–42). The height of a nucleotide at each position reflects its frequency. (b) CentriMo-generated (Steps 43–47) density curves of the motif in a at positions flanking the peak summit (red) relative to negative control peaks (blue). P values are indicated for each curve. Recoverable axes: (a) Bits; (b) Probability (×10^−3) versus position of best site in sequence (nt), from −120 to 120 nt around the peak summit. Curve P values: 3.3 × 10^−704 and 8.9 × 10^−1.]
… information on chromosome start and end coordinates (without any header lines).
$ awk '{if($1~/[^#]/) print $1 "\t" $2 "\t" $3}'
49| Run the command line utility ‘PeakAnnotator’ of PeakAnalyzer29.
$ java -jar PeakAnnotator.jar -u ndg -p m6A_sig_peaks_PAinput.txt -a genes.gtf -g all -o ./
CRITICAL STEP There are two points worth taking into consideration when performing annotation analysis: first, multiple
transcripts overlapping a given location are all reported. Hence, if you are interested in generating statistics regarding peak
locations within genes, this analysis can be performed on ‘canonical transcripts’ instead of on all isoforms in order to avoid
bias toward multi-isoform genes. Second, PeakAnnotator reports the overlap of the first, central and last nucleotide of a peak
inside a gene. As peak locations reported by MACS are not centered on the summit, you can use the summit location file
output by MACS (‘m6A_summits.bed’) as the input for PeakAnnotator in order to retrieve overlaps with summit regions
(assumed to be in close proximity to the actual methylated nucleotide, Fig. 4). Note that whereas the input for PeakAnalyzer is 1-based, the summit locations reported by MACS are 0-based, so the summit start coordinates must be shifted by +1.
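Because BED intervals are 0-based and half-open, the same interval in 1-based, closed coordinates runs from start + 1 to end; converting the MACS summit file therefore only changes the start column. A minimal sketch, assuming tab-delimited ‘m6A_summits.bed’ lines with chromosome, start and end in the first three columns, and a PeakAnnotator input of chromosome, start and end (as in Step 48); bed_to_one_based is a hypothetical helper.

```python
def bed_to_one_based(bed_lines):
    """Hypothetical helper: convert 0-based, half-open BED intervals to
    1-based, closed 'chrom<TAB>start<TAB>end' lines (start + 1, end unchanged)."""
    out = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC track/browser lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        out.append(f"{chrom}\t{int(start) + 1}\t{end}")
    return out


print(bed_to_one_based(["chr1\t1049\t1050\tMACS_peak_1\t12.3"]))  # ['chr1\t1050\t1050']
```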
? TROUBLESHOOTING
Troubleshooting advice can be found in Table 1.
Step | Problem | Possible reason | Solution
3 | Low RNA yield | RNA was not fully eluted from the column | Re-elute RNA from the column by adding RNase-free water preheated to 70 °C
3 | Low RNA yield | Too little starting material | Repeat RNA isolation with a larger amount of starting material
3 | Low RNA yield | RNA is degraded | Proceed to verify RNA integrity in Step 4
3 | Low RNA yield | None of the above | Repeat RNA purification. Make sure the kit has not expired. Closely follow the protocol, especially after DNase treatment
4 | RNA appears to be degraded on the gel | RNA is contaminated with RNase | Repeat RNA purification. Before starting, clean your work area, pipettes and gloves with RNaseKiller (5 PRIME) or a similar product
4 | Genomic DNA is apparent in the agarose gel wells | Unsuccessful DNase treatment | Repeat DNase treatment
12 | RNA fragments appear to be shorter than expected | Stop solution was not added to the sample immediately after fragmentation | Repeat fragmentation. Make sure you quickly add the stop solution, mix and place the sample on ice
12 | RNA fragments appear to be shorter/longer than expected | Incorrect concentration of fragmentation buffer | Prepare a fresh fragmentation buffer
12 | RNA fragments appear to be shorter/longer than expected | RNA concentration is too low/high | Repeat fragmentation with concentrated RNA or recalibrate
12 | RNA fragments appear to be shorter/longer than expected | Fragmentation time and temperature differ from those stated in the protocol | Repeat fragmentation under the specified conditions. Check the block temperature before repeating fragmentation
12 | RNA fragments seem to be longer than expected | RNA sample was eluted from the column with an elution buffer containing EDTA | Ethanol precipitate your sample to remove any EDTA and repeat RNA fragmentation
25 | RNA is detected in bead-only control (in amounts comparable to IP) | Beads were not washed properly | Repeat immunoprecipitation and elution
25 | RNA is detected in bead-only control (in amounts comparable to IP) | RNA was solvent extracted and not eluted | Repeat immunoprecipitation and elution
25 | RNA is not detected in the IP sample | RNA starting amounts were too small | Repeat with larger starting amounts
25 | RNA is not detected in the IP sample | Cells or tissue of origin have very low levels of methylation | Repeat with larger starting amounts for validation
25 | RNA is not detected in the IP sample | RNA was degraded during elution | Repeat and be sure to supplement IP and elution buffers with RNase inhibitors
● TIMING
Day 1
Steps 1–5, RNA isolation: 3 h
Steps 6–10, fragmentation: 3 h
Day 2
Steps 11 and 12, validation of postfragmentation size distribution: 2 h
Steps 13–18, immunoprecipitation: 5 h
Steps 19–23, elution: 2.5 h
Days 3–7
Steps 24–25, recovery of precipitated RNA: 1 h
ANTICIPATED RESULTS
Pre-processing QC
The majority of FastQC metrics (Step 30) are expected to be comparable between IP and input control samples. Notably,
however, ‘Per sequence GC content’ is consistently higher in IP than in input control; this is apparently a property of
methylated regions. Additional small deviations can be attributed to the presence of adapter sequences, PCR duplicates,
repetitive sequences and rRNA. FastQC reports for two samples, SRR456557.fastq (input) and SRR456551.fastq (IP),
are available at http://sheba-cancer.org.il/Nat_protocols/Input_fastqc/fastqc_report.html and at http://sheba-cancer.org.il/Nat_protocols/IP_fastqc/fastqc_report.html, respectively.
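To compare the GC shift between IP and input outside FastQC, the mean per-read GC fraction can be computed directly from the FASTQ files. A minimal sketch, assuming standard four-line FASTQ records; GC is counted relative to A, C, G and T only (N is ignored), and mean_gc is a hypothetical helper rather than a FastQC feature.

```python
def mean_gc(fastq_lines):
    """Hypothetical helper: mean per-read GC fraction, taken from the sequence
    line (second line) of each four-line FASTQ record."""
    fractions = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue  # only the sequence line of each record
        seq = line.strip().upper()
        acgt = sum(seq.count(b) for b in "ACGT")
        if acgt:
            fractions.append((seq.count("G") + seq.count("C")) / acgt)
    return sum(fractions) / len(fractions) if fractions else float("nan")
```

For example, running `mean_gc(open("SRR456551.fastq"))` and the same call on the input file gives one number per sample to compare.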
Read mapping
It is noteworthy that PCR duplicates can give rise to artifactual enrichments, and are therefore best eliminated. Sample-to-
sample variation in the extent of redundancy is influenced to a large degree by starting RNA amounts: redundancy increases
as starting amounts decrease.
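A rough measure of this redundancy is the fraction of mapped reads that share chromosome, 5′ position and strand with an earlier read, a common criterion for flagging single-end PCR duplicates. The sketch assumes you have already extracted (chromosome, 5′ position, strand) tuples from the alignments; duplicate_fraction is a hypothetical helper. With short single-end reads this criterion also flags some genuine reads from highly expressed transcripts, so treat the result as an upper bound on PCR duplication.

```python
def duplicate_fraction(read_keys):
    """Hypothetical helper: fraction of reads whose (chrom, five_prime_pos, strand)
    key was already seen earlier in the input."""
    seen = set()
    dups = 0
    total = 0
    for key in read_keys:
        total += 1
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups / total if total else 0.0
```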
Peak calling
In our test data set, the specified configuration of MACS (Steps 33–35) identified 35,098 peaks, of which 34,352 had an FDR
of ≤5%. Further filtering for a fold enrichment of ≥4 resulted in 32,105 peaks. For each peak, MACS reports the chromosome name,
start and end positions, length of the peak region, summit location relative to the peak start position, number of reads in
the peak region, P value for the peak region, fold enrichment for this region (compared with the expectation from a Poisson
distribution with local lambda) and an FDR.
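The FDR and fold-enrichment cutoffs applied to the test data set can be reproduced on the MACS output with a short filter. The sketch assumes a tab-separated peak table whose columns follow the order listed above, as in the peaks.xls file written by MACS 1.4 (P value given as −10*log10(P), FDR given in percent); check the header of your own MACS version before relying on this layout. The names Peak, parse_peaks and filter_peaks are hypothetical.

```python
import csv
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    length: int
    summit: int         # summit offset relative to the peak start
    tags: int           # number of reads in the peak region
    neg10log10p: float  # P value reported as -10*log10(P)
    fold: float         # fold enrichment vs. local-lambda Poisson expectation
    fdr_percent: float  # FDR reported as a percentage


def parse_peaks(lines: Iterable[str]) -> List[Peak]:
    """Parse tab-separated peak rows, skipping blanks, comments and the header."""
    peaks: List[Peak] = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "chr":
            continue
        peaks.append(Peak(row[0], int(row[1]), int(row[2]), int(row[3]),
                          int(row[4]), int(row[5]), float(row[6]),
                          float(row[7]), float(row[8])))
    return peaks


def filter_peaks(peaks: Iterable[Peak], max_fdr_percent: float = 5.0,
                 min_fold: float = 4.0) -> List[Peak]:
    """Keep peaks with FDR <= max_fdr_percent and fold enrichment >= min_fold."""
    return [p for p in peaks
            if p.fdr_percent <= max_fdr_percent and p.fold >= min_fold]
```

With its defaults, filter_peaks applies both cutoffs reported for the test data set (FDR ≤5% and fold enrichment ≥4) in one pass; counting the peaks after each cutoff separately would reproduce the stepwise numbers given above.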
Peak visualization
The UCSC genome browser session of the test data set (Steps 36–38) is available at http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=mali&hgS_otherUserSessionName=m6A_protocol. Representative
plots are given in Figure 5.
Analysis of the test data set shows that m6A-enriched regions are generally discrete, and that they typically form sharp
peaks along the transcriptome. There are ~2.1 peaks per gene. Of the genes that harbor more than one peak, some contain
two or more contiguous peaks, suggesting that peak clustering is a feature of the methylome. Acknowledging this feature,
you may choose to further increase resolution by using the PeakSplitter utility of PeakAnalyzer to subdivide peak regions
containing more than one site of signal enrichment (not described in this protocol).
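The peaks-per-gene figure and the notion of contiguous peaks can be made concrete with a short sketch. It assumes each peak has already been assigned to a gene (for example, from PeakAnnotator output) and takes the mean only over genes that harbor at least one peak. The function names and the max_gap parameter (the largest distance in bases at which two neighboring peaks count as contiguous) are hypothetical and not defined in this protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based half-open (start, end)


def group_by_gene(peaks: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Interval]]:
    """Collect peak intervals per gene, sorted by start coordinate."""
    by_gene: Dict[str, List[Interval]] = defaultdict(list)
    for gene, start, end in peaks:
        by_gene[gene].append((start, end))
    for intervals in by_gene.values():
        intervals.sort()
    return dict(by_gene)


def peaks_per_gene(by_gene: Dict[str, List[Interval]]) -> float:
    """Mean number of peaks per gene, counting only genes with at least one peak."""
    if not by_gene:
        return 0.0
    return sum(len(v) for v in by_gene.values()) / len(by_gene)


def genes_with_contiguous_peaks(by_gene: Dict[str, List[Interval]],
                                max_gap: int = 0) -> List[str]:
    """Genes in which two consecutive peaks are at most max_gap bases apart."""
    hits: List[str] = []
    for gene, intervals in by_gene.items():
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start - prev_end <= max_gap:
                hits.append(gene)
                break
    return hits
```

Raising max_gap trades strict adjacency for a looser definition of clustering; the appropriate value depends on fragment size and is left to the user.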
[Figure 5 plot: IP and input control coverage tracks (y axis, No. of reads) over UBE2Q1 (chr1:152,787,675-152,797,744), drawn 5′ to 3′.]
Figure 5 | Representative human gene plots harboring m6A peaks. Normalized coverage of IP and input control is indicated in red and blue, respectively, above gene architecture in a UCSC format. Thick black boxes represent exons; thin black boxes represent 5′ and 3′ untranslated regions (UTRs); thin lines represent introns.
Annotation
The final part of the proposed analysis deals with peak distribution among different genomic features. PeakAnnotator
generates several files: ‘m6A_sig_peaks_PAinput.summary.txt’ reports the overlapping and the nearest downstream gene for
each peak. For peaks that localize within genes, the position of the peak relative to gene features (exons, introns, 3′ and
5′ untranslated regions) is reported in the ‘m6A_sig_peaks_PAinput.overlap.txt’ file.
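The feature categories reported for peaks within genes (exons, introns, 3′ and 5′ UTRs) can be illustrated with a minimal sketch that labels a single position, such as a peak summit, against one transcript model. This is not PeakAnnotator's implementation: the function classify_position and the transcript representation (0-based half-open exon intervals plus CDS bounds, similar to thickStart/thickEnd in UCSC BED12) are assumptions made for illustration.

```python
from typing import List, Tuple


def classify_position(pos: int, exons: List[Tuple[int, int]],
                      cds_start: int, cds_end: int, strand: str) -> str:
    """Label a 0-based genomic position relative to one transcript model.

    exons: 0-based half-open exon intervals; cds_start/cds_end: CDS bounds
    in the same coordinates (equal values mark a non-coding transcript).
    """
    exons = sorted(exons)
    if pos < exons[0][0] or pos >= exons[-1][1]:
        return "outside transcript"
    if not any(start <= pos < end for start, end in exons):
        return "intron"
    if cds_start >= cds_end:
        return "exon (non-coding)"
    if cds_start <= pos < cds_end:
        return "CDS"
    # Exonic but outside the CDS: which UTR it is depends on the strand.
    left_of_cds = pos < cds_start
    if strand == "+":
        return "5' UTR" if left_of_cds else "3' UTR"
    return "3' UTR" if left_of_cds else "5' UTR"
```

Handling strand explicitly matters here, because on the minus strand the 5′ UTR lies at higher genomic coordinates than the CDS.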