tion of different immune reactions is sequence-specific (
). A phylogenetic approach is essential for finding functionally important genomic sequences by detecting their high degree of conservation across different species. Such an approach improves the prediction of gene-regulatory elements in the human genome. This necessitates studying the degree of homology of the RNA recognition motifs (RRMs) among these proteins, which requires evolutionary computation. Given the importance of submitting new sequences for further functional characterization, we present here the newly cloned and expressed cDNA of a previously unknown member of the hnRNP A/B family of proteins (Figure 1). Its RNA-binding properties and tissue-specific gene expression profiles were recently determined (
). Based on the concept of a correlation between sequences and RNA-binding modes, a systematic search for nucleic acid association was performed by evaluating sequence conservation with multiple sequence alignment search tools. This study extends phylogenetic results obtained from previous, larger data sets (
Fig. 1 The general 2×RNA-binding domain (RBD)-glycine structure of hnRNP A3. The space between the two adjacent RBDs is occupied by the inter-RNA recognition motif linker fragment (IRL). Amino acids 1–209 comprise both RBD1 and RBD2. RBD1 alone consists of amino acids 1–112, while RBD2 spans amino acids 112–209. The glycine-rich domain contains amino acids 209–296.
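The domain boundaries given in the caption lend themselves to a per-domain summary of sequence conservation. The following is a minimal Python sketch under stated assumptions, not the procedure used in this study: the function names are hypothetical, the two input sequences are assumed to be already aligned without gaps shifting the numbering (so that alignment columns coincide with the residue numbers of the caption), and the toy sequences are placeholders rather than real hnRNP A3 data.

```python
from typing import Dict, Tuple

# Domain boundaries (1-based, inclusive) as quoted in the Figure 1 caption.
HNRNP_A3_DOMAINS: Dict[str, Tuple[int, int]] = {
    "RBD1": (1, 112),
    "RBD2": (112, 209),
    "Gly-rich": (209, 296),
}


def percent_identity(seq_a: str, seq_b: str, start: int, end: int) -> float:
    """Percent identity over alignment columns start..end (1-based, inclusive).

    Columns in which either sequence carries a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a[start - 1:end], seq_b[start - 1:end]):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_by_domain(
    seq_a: str, seq_b: str, domains: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Percent identity of two aligned sequences, reported per domain."""
    return {
        name: percent_identity(seq_a, seq_b, start, end)
        for name, (start, end) in domains.items()
    }


# Toy placeholder sequences (NOT real data): identical over the first
# 250 positions, different over the last 46.
toy_a = "A" * 296
toy_b = "A" * 250 + "G" * 46
result = identity_by_domain(toy_a, toy_b, HNRNP_A3_DOMAINS)
```

With real data, the two strings would be rows taken from a multiple sequence alignment. If the alignment contains gaps, the residue numbers of the caption would first have to be mapped to alignment columns, which this sketch does not attempt.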
The aim of the present work was to isolate novel cDNA sequences with important functional implications in human pathology. Our efforts have been devoted to the cloning and subsequent tissue-specific gene expression profiling of numerous human RNPs from the hnRNP A/B family of proteins (Figure 1; ref.
). The objective was to search for the molecular basis of autoimmunity by applying comparative analysis to the sequences of diverse autoantigens. In this context, an evolutionary computation approach could give us major clues on how failures of the evolutionarily conserved mRNA transport machinery are linked to the development of human autoimmune disorders. We were mainly interested in cDNAs that might encode the as yet undescribed hnRNP B2. This interest was based on two observations. On the one hand, autoantibodies directed against hnRNP A2 cross-react with hnRNP B1 and hnRNP B2. Since hnRNP B1 is an alternatively spliced variant of hnRNP A2, this suggests that hnRNP B2 might be an alternatively spliced form of hnRNP A2/B1. However, no attempt to clone a cDNA encoding hnRNP B2 has been successful so far. On the other hand, cDNAs closely related to hnRNP A1 and hnRNP A2 have previously been isolated from a human fetal brain library and from a
library, respectively. Their close relationship with hnRNP A2 suggested that one of these cDNAs might actually encode hnRNP B2 (
).
To isolate the target cDNA, human liver and brain cDNA expression libraries were screened by PCR using primers complementary to the 5'- and 3'-untranslated regions of the FBRNP cDNA. The isolated sequence appeared to encode the full-length protein. Interestingly, however, it was not completely homologous to the FBRNP cDNA. Since the newly obtained sequence shared close identity with the
hnRNP A3 cDNA sequence (Entrez; accession number L02956), the protein was termed human hnRNP A3.
Nucleotide sequence comparisons between FBRNP,
hnRNP A3 and our newly determined human hnRNP A3 sequences revealed that extensive sequence conservation exists in the RNA-binding regions. The differences observed here were mainly at the third position of the codon triplet. The majority of sequence variations were seen in the Gly-rich domain, composed of amino acids at positions 211-373. As expected, these variations were more frequent at the nucleotide level than in the translated protein sequences (Figure 2). Only protein sequences are shown for brevity.
Identification of the various nucleic acid-binding domains of diverse hnRNPs was achieved by cloning and sequencing the cDNAs encoding these motifs. In general, all known human hnRNP proteins contain at least one RNA-binding module and at least one auxiliary domain. The RNA-binding motifs include the RNP consensus sequences (CS-RBD), the RNA recognition motif (RRM; ref.
), the RNP-80 motif, the RGG box (
), and the KH domain (
). The RNP domain is the most common feature of these RNPs. This domain occurs in hnRNPs in varying numbers, ranging from 1 (in hnRNP C) to 4 (in poly(A)-binding protein; ref.
). Figure 1 shows the general modular structure of the hnRNP A/B type of RNP particles. Their general structure is composed of two domains: the first 195 residues comprise the so-called UP1 domain, containing two canonical RNA-recognition motifs (RRM 1 and RRM 2), each of which comprises the conserved RNP-2 and RNP-1 submotifs. The Gly-rich C-terminal domain
Geno., Prot. & Bioinfo. Vol. 1 No. 4 November 2003 311