
P. Jiang et al. / Developmental Biology 426 (2017) 143–154

quantification on a Synergy plate reader. Samples were then pooled and sequenced with 72 bp reads with the Illumina HiSeq Rapid chemistry.

The raw read data and annotated gene expression measures are available in the Gene Expression Omnibus (accession number GSE78034).

2.3. RNA-seq data analysis

The RNA-seq data consisted of three biological replicate samples from each stage (with the exception of stages 1 and 11, which only had two replicates), for a total of 49 samples. RNA-seq data were comprised of paired-end reads with a length of 101 bases (polyA+ data) and single-end reads with a length of 72 bases (total RNA data). Adapter sequences were trimmed from reads using an in-house script (available at www.axolomics.org/default/files/ffq.ts.pe.ao.pl.tgz). Approximately 0.25% of reads were discarded for having fewer than 28 bases after adapter trimming. No other quality filters were applied. After filtering, the mean number of sequenced read pairs per sample was ~31.5 million. For each stage, the filtered reads from all replicates of that stage were combined and assembled using Trinity (Grabherr et al., 2011) (v2014-04-13p1, with parameters “--glue_factor 0.01 --min_iso_ratio 0.1”). We chose not to build one single assembly using contigs from all stages because of computational memory limitations and concerns about creating “in-silico” transcript isoforms uniting contigs from different developmental stages. After building the stage-specific assemblies, a single combined non-redundant transcriptome assembly was created by clustering all contigs from all stage assemblies using USEARCH (Edgar, 2010) (v7.0.1090, with parameters “-cluster_smallmem -id 0.95 -strand both”) and only selecting the “centroid” contig from each cluster. The resulting assembly had 896,365 contigs.

Quantification of each RNA-seq sample was then performed using RSEM v1.1.6 (Li and Dewey, 2011) using the combined transcriptome assembly as the reference database. RSEM was run with the default options for paired-end data except for the “--no-polyA” option, which was used because the assembled contigs were not guaranteed to include polyA+ tails. By default, RSEM uses the Bowtie aligner (Langmead et al., 2009) to map the reads against the contigs and we had Bowtie v0.12.1 installed for this purpose. RSEM employs an expectation maximization algorithm, so that for reads that match to multiple contigs, RSEM assigns a fraction of each read to each contig based on estimated abundances of contigs based on unique reads (Li and Dewey, 2011).

To enable functional interpretation of the resulting transcriptional profiles for the combined axolotl transcriptome assembly, we used a comparative RNA-seq approach developed previously for analysis of axolotl RNA-seq data (Stewart et al., 2013). Briefly, the contigs of the combined assembly were first mapped to transcripts from another species by running BLAST (NCBI BLAST v2.2.18) with either the RefSeq RNA (via BLASTN), or protein (via BLASTX) sequences for that species as the database and each contig as a query. Contigs were assigned to transcripts by taking the best BLAST hit with E-value < 10⁻⁵. For each sample, the expected fragment counts for each contig (as computed by RSEM), were then converted to comparative transcript counts by summing the fragment counts of contigs mapped to the same transcript. Similarly, gene-level counts were obtained by summing the fragment counts of transcripts that were annotated with the same gene symbol. Thus, if two contigs map to different isoforms of the same human gene or two contigs representing two axolotl in-paralogs map to the same human gene, their counts will be combined into the count for one human gene. Relative abundances, in terms of transcripts per million (TPM), for genes were computed by first normalizing each gene's fragment count by the sum of the “effective lengths” (length less the read length) of the contigs mapped to that gene and then scaling the resulting values such that they summed to one million over all genes. Contigs mapping to rRNA transcripts and their respective counts were removed from the analysis. The list of assembled and annotated contigs is available at www.axolomics.org//sites/default/files/AxoDevelTimecourse_Annotated_Contigs_with_Gene_Symbol.fa.tgz. This comparative approach was run using both human and Xenopus tropicalis as reference species. The NCBI RefSeq sets used for human and X. tropicalis were dated 12/7/2009 and 8/3/2015, respectively. It should be noted that by virtue of matching to human (or frog) annotation there will be axolotl-specific genes that our methodology a priori cannot capture.

We also estimated gene-specific within-condition mean and variance using median-by-ratio normalized expected counts (normalized ECs) (Leng et al., 2013) of samples within each of the 17 stages. For each stage, the global mean-variance relationship is represented by a fitted curve using all estimates in this stage. The curve was fitted using polynomial regression on log(variance) ~ log(mean), with 2 degrees of freedom.

2.4. Differentially expressed genes (DEGs)

We used the EBSeq package (Leng et al., 2013) to assess the probability of a gene being differentially expressed between two stages or two clusters. Given any two conditions, the false discovery rate (FDR) is calculated based on the replicate or triplicate samples. We also calculated the normalized ECs for each condition (using the EBSeq method). We required that DEGs should have FDR < 5% and a > 2-fold change of average normalized ECs between any given two conditions.

In the case of our comparative analysis of Xenopus development, we acquired published data from Owens et al. (2016). We used their calculated gene expression values (Gaussian process lower median of ‘Transcripts per Embryo’/10000) for each stage. See Owens et al. (2016) for details. We only used gene expression values from the same stages as our axolotl data. We then defined DEGs as genes that showed a greater than two-fold change of (‘expression value’ + 1).

To compare HOX genes between axolotl and Xenopus (Owens et al., 2016), we used a log10 transformed “expression measure + 1” method (Chaudhuri et al., 2011; Shalek et al., 2014). The addition of a value of 1 to ‘expression measure’ avoids undefined arithmetic and suppresses reporting of high fold-ratio expression changes for genes with relatively low expression in one or both sample groups.

2.5. Gene ontology analysis

Gene ontology (GO) analysis was performed using the R/allez package (Newton et al., 2007) with each of 52 sets of DEGs (16 stage-to-stage analyses, 6 cluster-to-cluster analyses, 4 long time interval stage-to-stage comparisons, with one set each of up-regulated and down-regulated DEGs for each analysis). Only GO terms with 50-800 associated genes were considered. GO terms with Benjamini-Hochberg adjusted p values less than 0.01 were considered as enriched.

2.6. Hierarchical clustering analysis

We clustered the stages based on correlations of their gene expression vectors. To do so, we calculated the normalized ECs (Leng et al., 2013) for each sample. Then, for each stage we averaged the normalized ECs of each gene across all replicates from that stage. We then calculated, for each pair of stages, the Spearman correlation (Rho) of their normalized, averaged gene expression vectors. We performed hierarchical clustering on these
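As an illustration of the gene-level TPM calculation described in Section 2.3, the following minimal Python sketch sums fragment counts and effective lengths over the contigs mapped to each gene, divides one by the other, and rescales so the values sum to one million. The function name `gene_tpm`, the input tuple layout, and the decision to skip contigs whose length does not exceed the read length are assumptions made for this example; they are not taken from the authors' pipeline.

```python
from collections import defaultdict


def gene_tpm(contigs, read_length):
    """Compute gene-level TPM from per-contig expected counts.

    contigs: iterable of (gene_symbol, contig_length, expected_count).
    read_length: read length subtracted from each contig length to give
                 its "effective length".
    """
    counts = defaultdict(float)
    effective_lengths = defaultdict(float)
    for gene, length, count in contigs:
        effective = length - read_length
        if effective <= 0:
            # Assumption for this sketch: ignore contigs no longer than a read.
            continue
        counts[gene] += count
        effective_lengths[gene] += effective

    # Length-normalized rate per gene.
    rates = {g: counts[g] / effective_lengths[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in rates}
    # Scale so that the values sum to one million over all genes.
    return {g: 1e6 * r / total for g, r in rates.items()}


example = [("A", 1100, 100.0), ("A", 600, 50.0), ("B", 2100, 300.0)]
tpm = gene_tpm(example, read_length=100)
```

In this toy input, the two contigs assigned to gene A are pooled before normalization, mirroring how counts from isoforms or in-paralogs mapping to the same reference gene are combined.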

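The effect of the “+ 1” offset described in Section 2.4 can be seen in a small worked example. The Python sketch below computes a fold change of (value + 1) and a log10(value + 1) transform; the helper names and the choice to treat the fold change symmetrically (larger value over smaller) are assumptions for illustration only, not the authors' code.

```python
import math


def fold_change_plus_one(a, b):
    """Fold change of (value + 1), taken as larger over smaller (always >= 1)."""
    high = max(a, b) + 1.0
    low = min(a, b) + 1.0
    return high / low


def is_deg(a, b, threshold=2.0):
    """Call a gene differentially expressed if the offset fold change exceeds the threshold."""
    return fold_change_plus_one(a, b) > threshold


def log10_plus_one(x):
    """log10 of (expression measure + 1); defined even when x is 0."""
    return math.log10(x + 1.0)


# A gene going from 0 to 0.5 has an undefined raw fold change, but the
# offset version is 1.5 and does not pass a two-fold threshold.
low_expression_call = is_deg(0.0, 0.5)
# A gene going from 0 to 3 gives (3 + 1) / (0 + 1) = 4 and does pass.
higher_expression_call = is_deg(0.0, 3.0)
```

The offset keeps the ratio defined when one value is zero and damps large ratios that arise only because both values are small.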