Comparison Tools for lncRNAs Identification Analysis Among Plant and Humans
Abstract—The main objective of this article is to evaluate the differences between the long non-coding RNAs (lncRNAs) of plants and humans. LncRNAs belong to the class of RNAs that do not encode proteins and are related to several biological functions, such as chromatin modification, post-transcriptional regulation and, mainly, the development of diseases such as cancer. In this work, we verify whether lncRNAs differ between plants and humans, using state-of-the-art approaches to identify lncRNAs. The main reason for the study is that the miRNAs (small ncRNAs) of plants and humans differ in both biological and computational characteristics, whereas for lncRNAs this is still an open question. To answer this question, this paper shows the results of two ncRNA prediction tools that are trained with human data and widely used for lncRNA prediction: CPC2 and CPAT. We also show results from tools used to predict lncRNAs in plants, which are trained with plant data: RNAplonc; PLncPRO, which has two versions, one for monocots and one for dicots; and LGC, which was trained with plants and humans. The results of further tools trained with human data are also shown: PLEK, CPPRED and PredLnc-GFStack. These eight tools were applied to two test sets, one composed of eight plant species (Amborella trichopoda, Brachypodium distachyon, Citrus sinensis, Manihot esculenta, Ricinus communis, Solanum tuberosum, Sorghum bicolor, Zea mays) and the other composed of human lncRNAs.
Index Terms—tool, long RNAs, predict, non-coding, bioinformatics

I. INTRODUCTION
Non-coding RNAs (ncRNAs) constitute a heterogeneous group of RNA molecules, which can be classified in different ways according to their location, length, and biological function [1]–[3]. They belong to the class of RNAs that do not encode proteins, yet they are important players in the regulation of gene transcription, splicing, and translation [4]. In addition, the recent wide application of high-throughput approaches has facilitated the identification of thousands of novel non-coding RNAs in many organisms, such as humans, animals, and plants [3], [5]. Specifically, long non-coding RNAs (lncRNAs) are ncRNAs that contain more than 200 nucleotides (nt) [6]. Research in this field is still in its infancy, especially in the case of plants. Thus far, only a few lncRNAs have been sufficiently described [7], [8]. In particular, research in this area is less advanced in plants than in humans and animals [3], [9].
Thus, the identification of lncRNAs is probably one of the major challenges of RNA research for the next 20 years [10]. To contribute to this task, bioinformatics approaches represent a powerful strategy to identify the best lncRNA candidates for functional characterization. In this sense, several tools have emerged in recent years, for example: CPC2 [11], CPAT [12], RNAplonc [13], PLncPRO [14], LGC [15], PLEK [16], CPPRED [17], PredLnc-GFStack [18], and others [19]–[22]. These prediction tools, however, were designed using training data sets made up of plant, human, or mixed data. This leads us to investigate in this work whether the patterns that characterize lncRNAs in plants and humans are similar. Notably, the microRNAs (i.e., ncRNAs about 22 nt long) of plants and animals differ in biological and computational characteristics [23], [24], but little is known in this respect about lncRNAs.
Given this consideration, the goal of this article is to perform a cross-validation with different tools for lncRNA identification, using plant and human data sets, in order to understand how similar their predictions are. For this, we present nine tools used to identify lncRNAs. These tools were organized into three groups for testing: (1) plant tools; (2) human tools; (3) mixed (plant and human) tools. The tools compared in this article are: (1) three tools trained with plant data: RNAplonc [13], PLncPRO Mono [14] and PLncPRO Dico [14]; (2) four tools trained with human data: CPAT [12], PredLnc-GFStack [18], PLEK [16] and CPPRED [17]; (3) two tools trained with a mix of plant and human data: CPC2 [11] and LGC [15]. Comparing the results, we found that RNAplonc [13] performed better on plant data and CPPRED [17] stood out on human data, whereas CPC2 [11] has the best result in the mixed group, showing intermediate results on both the plant and human data sets.
The rest of the paper is organized as follows. In Section II, we describe the method used: Subsection II-A shows the metrics used to evaluate the performance of the tests, and Subsections II-B and II-C describe the data sets used in the tests and in the comparisons. In Section III, we start with Subsection III-A, presenting [...]
[...] assess the generalization ability of a model from a set of data [36]. This technique is widely used in problems where the purpose of modeling is prediction. We then seek to estimate how accurate each model is in practice, that is, its performance on each data set. We ran the nine tools on the two data sets. To quantify the classification performance under a unified standard, we first defined, in each data set, the lncRNAs as the positive class and the protein-coding transcripts as the negative class; the performance of the tools can then be assessed with four metrics: sensitivity, specificity, accuracy and F1-score.
To better analyze and visualize the results, we created different graphs from the obtained metrics, which are presented and discussed in the next section.

B. Evaluation criteria
The tools are evaluated according to four statistical criteria [37] based on the following estimators: the True Positive (TP) estimator counts the lncRNAs correctly predicted; the True Negative (TN) estimator counts the coding RNAs correctly classified from the negative data set; the False Positive (FP) estimator counts the negative entities that are incorrectly classified as lncRNAs; and the False Negative (FN) estimator counts the true lncRNAs that are incorrectly classified as non-lncRNAs. The four criteria are: sensitivity (SE), specificity (SPC), accuracy (ACC) and F1-score (see Equations 1, 2, 3 and 4).

$$\mathrm{SE} = \frac{TP}{TP + FN} \tag{1}$$

$$\mathrm{SPC} = \frac{TN}{TN + FP} \tag{2}$$

$$\mathrm{ACC} = \frac{TP + TN}{TN + FP + TP + FN} \tag{3}$$

$$\text{F1-Score} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{4}$$

C. Test data sets
We defined two data sets to assess the performance of the tools and to compare plant and human lncRNAs. For the plant data set, we used eight species. For lncRNAs (positive data) we used the GreeNC [38] data set, and for negative data (mRNA) we used Phytozome [39] (see Table I). Species were chosen based on their phylogenetic diversity [...]

TABLE I
Species                    # total lncRNA   # total mRNA   # used
Amborella trichopoda       5698             26846          3823
Brachypodium distachyon    5584             52972          4868
Citrus sinensis            2562             46147          2292
Manihot esculenta          3468             41318          3017
Ricinus communis           4198             31221          4080
Solanum tuberosum          6680             51472          5607
Sorghum bicolor            5305             47205          4541
Zea mays                   18154            63540          12071

For the human data set, we used 35324 lncRNA sequences (positive data set) identified in the FANTOM5 [41] project, and for mRNAs we used 35324 human transcript sequences (negative data set) from the Ensembl [42] data set. The mRNAs were chosen randomly, in the same number as the lncRNAs, to keep the tests balanced.

III. RESULTS
We begin the results section by briefly describing the computers that were used to run the tests, which are detailed in Subsection III-A.
To make the data easier to read, we separated the results into three subsections. The first shows the results of the tools of group one (plant tools) applied to the plant and human data sets (Subsection III-B). The second shows the results of the tools of group two (human tools) applied to the plant and human data sets (Subsection III-C). The third shows the results of the tools of group three (mixed plant and human tools) applied to the plant and human data sets (Subsection III-D).

A. Used computers
Two computers were used to perform the tests: a personal laptop and a server available at the Informatics Laboratory (LABINFO) of the Federal Technological University of Paraná. The laptop is a Samsung with an Intel Core i7-3630QM processor (2.40 GHz), 8 GB of RAM and an NVIDIA GF108M [GeForce GT 630M] video card, running the Debian Linux 10.2 operating system. The server has an Intel Xeon E5-2620 v3 processor (15 MB cache, 2.40 GHz), 32 GB of RAM and an NVIDIA Titan V graphics card, running the Linux Mint 19.2 Tina operating system.
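Before the per-group results, it may help to see how the four criteria of Equations 1 to 4 follow from the TP, TN, FP and FN counts. The following is a minimal sketch, not the evaluation script used in this work: the class name `ConfusionCounts`, the helper `count_outcomes`, the label strings and the example counts are all hypothetical, and it assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts with lncRNA as the positive class and mRNA as the negative class."""
    tp: int  # lncRNAs predicted as lncRNA
    tn: int  # mRNAs predicted as mRNA
    fp: int  # mRNAs predicted as lncRNA
    fn: int  # lncRNAs predicted as mRNA


def count_outcomes(y_true, y_pred, positive="lncRNA"):
    """Tally TP, TN, FP and FN from paired true and predicted labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, tn=tn, fp=fp, fn=fn)


def sensitivity(c):
    """Equation 1: fraction of lncRNAs that are recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Equation 2: fraction of mRNAs that are recognized as coding."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Equation 3: fraction of all transcripts classified correctly."""
    return (c.tp + c.tn) / (c.tn + c.fp + c.tp + c.fn)


def f1_score(c):
    """Equation 4: equals the harmonic mean of precision and recall."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=90, tn=80, fp=20, fn=10)
print(sensitivity(example), specificity(example),
      accuracy(example), f1_score(example))
```

On a balanced test set (as many lncRNAs as mRNAs), Equation 3 reduces to the mean of SE and SPC, which is a quick sanity check when reading the result tables.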
B. 1-Group of plant tools
This first test was applied to the data set of eight plant species listed in Table I. Tests were carried out with RNAplonc [13], a tool developed by our group and trained with plant data, and with PLncPRO [14], which is also trained with plant data and has two models, one for monocotyledonous plants and another for dicotyledonous plants (see Table II).
TABLE II
RESULT OF PLANT DATA SET WITH LNCRNA TOOLS AND PLANT TRAINED TOOLS
Data set                   Tools           SE       SPC     ACC     F1-score
Amborella trichopoda       PLncPRO Mono    86.19    96.29   91.24   90.77
                           PLncPRO Dico    78.11    94.85   86.48   85.24
                           RNAplonc        100.00   77.01   88.50   89.69
Brachypodium distachyon    PLncPRO Mono    76.60    83.01   79.81   79.14
                           PLncPRO Dico    70.52    87.80   79.16   77.19
                           RNAplonc        97.64    86.09   91.87   92.31
Citrus sinensis            PLncPRO Mono    57.33    78.01   67.67   63.94
                           PLncPRO Dico    47.69    94.94   71.31   62.44
                           RNAplonc        99.48    88.35   93.91   94.23
Manihot esculenta          PLncPRO Mono    75.11    76.53   75.82   75.65
                           PLncPRO Dico    66.85    87.60   77.23   74.59
                           RNAplonc        99.90    86.64   93.27   93.69
Ricinus communis           PLncPRO Mono    85.56    84.92   85.24   85.29
                           PLncPRO Dico    73.82    94.21   84.02   82.21
                           RNAplonc        99.98    81.47   90.72   91.51
Solanum tuberosum          PLncPRO Mono    67.72    60.37   64.04   65.32
                           PLncPRO Dico    63.47    81.31   72.39   69.69
                           RNAplonc        99.68    76.07   87.87   89.15
Sorghum bicolor            PLncPRO Mono    80.03    87.91   83.97   83.31
                           PLncPRO Dico    75.17    87.51   81.34   80.11
                           RNAplonc        96.34    86.15   91.25   91.67
Zea mays                   PLncPRO Mono    95.88    86.96   91.42   91.79
                           PLncPRO Dico    83.00    86.74   84.87   84.58
                           RNAplonc        99.36    84.94   91.56   91.54

Fig. 2. Average of the sum of all plant species by metrics.

In the second test, the human data set (Subsection II-C) was evaluated with the tools trained on plants: PLncPRO Mono [14], PLncPRO Dico [14] and RNAplonc [13].
Fig. 3 shows the metrics obtained by each tool. PLncPRO Mono [14] has the best sensitivity (SE) with 88.52%, followed by RNAplonc [13] with 76.85%. For specificity (SPC), PLncPRO Dico [14] leads with 94.15%, followed by RNAplonc [13] with 92.92%. The accuracy of RNAplonc [13] is 84.90%, while that of PLncPRO Dico [14] is 83.01%. For the F1-score, RNAplonc [13] again has the best average with 83.56%, followed by PLncPRO Mono [14] with 80.90%.
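Returning to the plant data set, the per-tool averages summarized in Fig. 2 can be approximated directly from Table II. The sketch below is illustrative only and assumes an unweighted mean over the eight species, which may not be exactly how the figure was produced; the names `ACC_PLANT` and `average_by_tool` are hypothetical. The ACC values are copied from Table II in its species order.

```python
from statistics import mean

# ACC column of Table II (percent), one value per species, in table order:
# Amborella, Brachypodium, Citrus, Manihot, Ricinus, Solanum, Sorghum, Zea.
ACC_PLANT = {
    "PLncPRO Mono": [91.24, 79.81, 67.67, 75.82, 85.24, 64.04, 83.97, 91.42],
    "PLncPRO Dico": [86.48, 79.16, 71.31, 77.23, 84.02, 72.39, 81.34, 84.87],
    "RNAplonc": [88.50, 91.87, 93.91, 93.27, 90.72, 87.87, 91.25, 91.56],
}


def average_by_tool(per_species):
    """Return the unweighted mean over species for each tool, to 2 decimals."""
    return {tool: round(mean(values), 2) for tool, values in per_species.items()}


averages = average_by_tool(ACC_PLANT)
for tool, value in averages.items():
    print(f"{tool}: {value:.2f}")
```

For RNAplonc the unweighted mean comes out at 91.12, which agrees with the plant-data accuracy quoted for RNAplonc in the Discussion. A mean weighted by the number of sequences used per species (Table I) would generally give a different value.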
Fig. 4. Results of the test performed on the human data set with tools trained with human data.
Fig. 5. Average of the metrics of the test performed on the plant data set with tools trained with human data.
Fig. 6. Average of the metrics of the test performed on the plant data set with mixed tools (plant and human).
Fig. 7. Average of the metrics of the test performed on the human data set with mixed tools (plant and human).

IV. DISCUSSION
To better compare the results, we start by analyzing two figures, fig. 8 and fig. 9, which group the three sets of tools (plant, mixed and human) for the two data sets (plant and human), using the metrics of specificity (SPC) and sensitivity (SE).
Two basic concepts are sensitivity (SE) and specificity (SPC). SE is the tool's ability to identify lncRNAs, that is, it corresponds to the probability of correctly classifying an lncRNA. SPC is the tool's ability to identify an mRNA (coding transcript), that is, it corresponds to the probability of correctly classifying a coding RNA.
In these figures (fig. 8 and fig. 9), we can clearly see that the tools in the human group have better results in the human [...]
In the next two figures (fig. 10 and fig. 11), we have the best tool in each group (plant, mixed and human), showing accuracy (ACC) and F1-score, ordered by sensitivity (SE).
ACC indicates overall model performance: among all classifications, how many the model got right. In other words, ACC gives the percentage of correct classifications made by the tool. The F1-score, on the other hand, is the harmonic mean of precision and recall; it is simply a way of looking at one metric instead of two. Because it is a harmonic mean, it lies much closer to the lower of the two values than a simple arithmetic mean would. That is, a low F1-score indicates that either the precision or the recall is low.
In fig. 10, RNAplonc [13] is the tool of the plant group that obtained the best result on the human data set. Fig. 10 also shows CPPRED [17], which represents the human group with the best result on the human data set. For the mixed group, CPC2 [11] obtained the best result on the human data set (see fig. 10).
Comparing fig. 8 and fig. 10 and the test results on the human data set, we can see that the tools trained with human data obtained better results than the tools trained with plant data or the tools of the mixed group. As can be seen in fig. 10, CPPRED [17] obtained an accuracy of 94.10%, while RNAplonc [13] obtained 82.85% on the same data set and CPC2 [11] achieved 86.24%.
In fig. 8, which shows the results on the plant data set, we can see that the tools trained with plants have better results. In fig. 11, the plant-group tool RNAplonc [13] obtained an accuracy of 91.12%, while the human-group tool PLEK [16] obtained 79.31% and the mixed-group tool CPC2 [11] obtained 83.48%.

Fig. 10. Accuracy result by F1-score of the human data set
Fig. 11. Accuracy result by F1-score of the plants data set

These figures make it clear that tools trained with human data perform better on human data, and tools trained with plant data perform better on plant data.

V. CONCLUSIONS AND FUTURE PERSPECTIVES
Knowledge about lncRNAs, their characteristics, functions and behavior within the cell is still scarce, but the test results of this work show that there are differences between plant and human lncRNAs. Tools developed and trained with plant data cannot surpass the results of tools developed and trained with human data when applied to a human data set, and the same is true for human-trained tools when tested on plants.
This result justifies the methodology used and answers our initial question: lncRNAs in plants and humans do differ. It also leads us to consider how to quantify and describe such differences.

ACKNOWLEDGMENT
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research. We thank CAPES for the doctoral scholarship. This project has been supported by a PROBAL Grant (CAPES/DAAD - 88887.144045/2017-00).

REFERENCES
[1] B. B. Amor, S. Wirth, F. Merchan, P. Laporte, Y. d’Aubenton Carafa, J. Hirsch, A. Maizel, A. Mallory, A. Lucas, J. M. Deragon et al., “Novel long non-protein coding RNAs involved in Arabidopsis differentiation and stress responses,” Genome research, vol. 19, no. 1, pp. 57–69, 2009.
[2] J. Liu, C. Jung, J. Xu, H. Wang, S. Deng, L. Bernad, C. Arenas-Huertero, and N.-H. Chua, “Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis,” The Plant Cell, vol. 24, no. 11, pp. 4333–4345, 2012.
[3] Q.-H. Zhu and M.-B. Wang, “Molecular functions of long non-coding RNAs in plants,” Genes, vol. 3, no. 1, pp. 176–190, 2012.
[4] J. J. Quinn and H. Y. Chang, “Unique features of long non-coding RNA biogenesis and function,” Nature Reviews Genetics, vol. 17, no. 1, p. 47, 2016.
[5] M. Q. Hassan, C. E. Tye, G. S. Stein, and J. B. Lian, “Non-coding RNAs: Epigenetic regulators of bone development and homeostasis,” Bone, vol. 81, pp. 746–756, 2015.
[6] S.-Y. Ng, L. Lin, B. S. Soh, and L. W. Stanton, “Long noncoding RNAs in development and disease of the central nervous system,” Trends in Genetics, vol. 29, no. 8, pp. 461–468, 2013.
[7] J. L. Rinn, M. Kertesz, J. K. Wang, S. L. Squazzo, X. Xu, S. A. Brugmann, L. H. Goodnough, J. A. Helms, P. J. Farnham, E. Segal et al., “Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs,” Cell, vol. 129, no. 7, pp. 1311–1323, 2007.
[8] Y. Wang, X. Fan, F. Lin, G. He, W. Terzaghi, D. Zhu, and X. W. Deng, “Arabidopsis noncoding RNA mediates control of photomorphogenesis by red light,” Proceedings of the National Academy of Sciences, vol. 111, no. 28, pp. 10359–10364, 2014.
[9] Y. Bai, X. Dai, A. P. Harrison, and M. Chen, “RNA regulatory networks in animals and plants: a long noncoding RNA perspective,” Briefings in functional genomics, vol. 14, no. 2, pp. 91–101, 2015.
[10] T. R. Cech, “RNA World research-still evolving,” RNA, vol. 21, no. 4, pp. 474–475, Apr 2015.
[11] Y.-J. Kang, D.-C. Yang, L. Kong, M. Hou, Y.-Q. Meng, L. Wei, and G. Gao, “CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features,” Nucleic acids research, vol. 45, no. W1, pp. W12–W16, 2017.
[12] L. Wang, H. J. Park, S. Dasari, S. Wang, J.-P. Kocher, and W. Li, “CPAT: Coding-potential assessment tool using an alignment-free logistic regression model,” Nucleic acids research, vol. 41, no. 6, pp. e74–e74, 2013.
[13] T. d. C. Negri, W. A. L. Alves, P. H. Bugatti, P. T. M. Saito, D. S. Domingues, and A. R. Paschoal, “Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants,” Briefings in bioinformatics, vol. 20, no. 2, pp. 682–689, 2019.
[14] U. Singh, N. Khemka, M. S. Rajkumar, R. Garg, and M. Jain, “PLncPRO for prediction of long non-coding RNAs (lncRNAs) in plants and its application for discovery of abiotic stress-responsive lncRNAs in rice and chickpea,” Nucleic acids research, vol. 45, no. 22, pp. e183–e183, 2017.
[15] G. Wang, H. Yin, B. Li, C. Yu, F. Wang, X. Xu, J. Cao, Y. Bao, L. Wang, A. A. Abbasi et al., “Characterization and identification of long non-coding RNAs based on feature relationship,” Bioinformatics, vol. 35, no. 17, pp. 2949–2956, 2019.
[16] A. Li, J. Zhang, and Z. Zhou, “PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme,” BMC bioinformatics, vol. 15, no. 1, p. 311, 2014.
[17] X. Tong and S. Liu, “CPPred: coding potential prediction based on the global description of RNA sequence,” Nucleic acids research, vol. 47, no. 8, pp. e43–e43, 2019.
[18] S. Liu, X. Zhao, G. Zhang, W. Li, F. Liu, S. Liu, and W. Zhang, “PredLnc-GFStack: A global sequence feature based on a stacked ensemble learning method for predicting lncRNAs from transcripts,” Genes, vol. 10, no. 9, p. 672, 2019.
[19] L. Kong, Y. Zhang, Z.-Q. Ye, X.-Q. Liu, S.-Q. Zhao, L. Wei, and G. Gao, “CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine,” Nucleic acids research, vol. 35, no. suppl 2, pp. W345–W349, 2007.
[20] C. Pian, G. Zhang, Z. Chen et al., “LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature,” PLoS ONE, vol. 11, no. 5, p. e0154567, 2016.
[21] C. M. A. Simopoulos, E. A. Weretilnyk, and G. B. Golding, “Prediction of plant lncRNA by ensemble machine learning classifiers,” BMC Genomics, vol. 19, no. 1, p. 316, May 2018.
[22] L. H. Pyfrom, S.C. and J. Payton, “PLAIDOH: a novel method for functional prediction of long non-coding RNAs identifies cancer-specific LncRNA activities,” BMC Genomics, vol. 20, no. 137, 02 2019.
[23] Y. Moran, M. Agron, D. Praher, and U. Technau, “The evolutionary origin of plant and animal microRNAs,” Nature ecology & evolution, vol. 1, no. 3, pp. 1–8, 2017.
[24] Z. Li, R. Xu, and N. Li, “MicroRNAs from plants to animals, do they define a new messenger for communication?” Nutrition & metabolism, vol. 15, no. 1, p. 68, 2018.
[25] Y. Wang, Y. Li, Q. Wang, Y. Lv, S. Wang, X. Chen, X. Yu, W. Jiang, and X. Li, “Computational identification of human long intergenic non-coding RNAs using a GA–SVM algorithm,” Gene, vol. 533, no. 1, pp. 94–99, 2014.
[26] G. M. Ventola, T. M. Noviello, S. D’Aniello, A. Spagnuolo, M. Ceccarelli, and L. Cerulo, “Identification of long non-coding transcripts with feature selection: a comparative study,” BMC bioinformatics, vol. 18, no. 1, p. 187, 2017.
[27] J. Liu, J. Gough, and B. Rost, “Distinguishing protein-coding from non-coding RNAs through support vector machines,” PLoS Genetics, vol. 2, no. 4, p. e29, Apr 2006.
[28] M. F. Lin, I. Jungreis, and M. Kellis, “PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions,” Bioinformatics, vol. 27, no. 13, pp. i275–282, Jul 2011.
[29] L. Sun, H. Luo, D. Bu et al., “Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts,” Nucleic acids research, vol. 41, no. 17, p. e166, Sep 2013.
[30] X.-N. Fan and S.-W. Zhang, “lncRNA-MFDL: identification of human long non-coding RNAs by fusing multiple features and using deep learning,” Molecular BioSystems, vol. 11, no. 3, pp. 892–897, 2015.
[31] J. Li, W. Ma, P. Zeng, J. Wang, B. Geng, J. Yang, and Q. Cui, “LncTar: a tool for predicting the RNA targets of long noncoding RNAs,” Briefings in Bioinformatics, vol. 16, no. 5, pp. 806–812, 12 2014. [Online]. Available: https://doi.org/10.1093/bib/bbu048
[32] L. Hu, Z. Xu, B. Hu, and Z. J. Lu, “COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features,” Nucleic Acids Research, vol. 45, no. 1, pp. e2–e2, 09 2016. [Online]. Available: https://doi.org/10.1093/nar/gkw798
[33] J. Baek, B. Lee, S. Kwon, and S. Yoon, “LncRNAnet: Long non-coding RNA identification using deep learning,” Bioinformatics, vol. 1, p. 9, 2018.
[34] Z. Ji, R. Song, A. Regev, and K. Struhl, “Many lncRNAs, 5’UTRs, and pseudogenes are translated and some are likely to express functional proteins,” elife, vol. 4, p. e08890, 2015.
[35] J. Ruiz-Orera, X. Messeguer, J. A. Subirana, and M. M. Alba, “Long non-coding RNAs as a source of new peptides,” elife, vol. 3, p. e03523, 2014.
[36] R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in IJCAI, vol. 14, no. 2. Montreal, Canada, 1995, pp. 1137–1145.
[37] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification. John Wiley & Sons, 2012.
[38] A. P. Gallart, A. H. Pulido, I. A. M. De Lagrán, W. Sanseverino, and R. A. Cigliano, “GreeNC: a wiki-based database of plant lncRNAs,” Nucleic Acids Research, vol. 44, no. Database issue, p. D1161, 2016.
[39] D. M. Goodstein, S. Shu, R. Howson, R. Neupane, R. D. Hayes, J. Fazo, T. Mitros, W. Dirks, U. Hellsten, N. Putnam et al., “Phytozome: a comparative platform for green plant genomics,” Nucleic acids research, vol. 40, no. D1, pp. D1178–D1186, 2012.
[40] W. Li and A. Godzik, “CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences,” Bioinformatics, vol. 22, no. 13, pp. 1658–1659, 2006.
[41] C.-C. Hon, J. A. Ramilowski, J. Harshbarger, N. Bertin, O. J. Rackham, J. Gough, E. Denisenko, S. Schmeier, T. M. Poulsen, J. Severin et al., “An atlas of human long non-coding RNAs with accurate 5′ ends,” Nature, vol. 543, no. 7644, pp. 199–204, 2017.
[42] T. Hubbard, D. Barker, E. Birney, G. Cameron, Y. Chen, L. Clark, T. Cox, J. Cuff, V. Curwen, T. Down et al., “The Ensembl genome database project,” Nucleic acids research, vol. 30, no. 1, pp. 38–41, 2002.