
GENOME WIDE SURVEY OF CERTAIN MAMMALIAN GPCRS AND OLFACTORY RECEPTORS

A THESIS

Submitted by

NAGARATHNAM B

in partial fulfillment for the award of the degree of

DOCTOR OF PHILOSOPHY

FACULTY OF SCIENCE AND HUMANITIES
ANNA UNIVERSITY
CHENNAI 600 025
JUNE 2012



ABSTRACT
In the recent era of G-protein coupled receptor (GPCR) research, computational approaches to sequence analysis play a vital role in identifying related sequences (homologues), conserved features (domains, motifs) and evolutionary relationships (orthologs) for protein families of interest at intra- and inter-genomic levels. Candidate GPCRs and olfactory receptors (ORs; class A type GPCRs) are important for their diverse cellular activities and were considered for a genome-wide survey in selected eukaryotic genomes, which further helps to establish structure-function resemblance. Generally, GPCRs are predicted to have an extracellular N-terminus (N-out topology), an intracellular C-terminus and seven transmembrane helices (TMHs) connected by three intracellular and three extracellular loops, and are therefore termed serpentine-like receptors. Previous cross-genome studies on human-Drosophila GPCRs motivated a cross-genome clustering of human-C. elegans GPCRs (Chapter 2). A profile-based clustering method (RPS-BLAST) was employed to associate more than 1000 C. elegans GPCRs with already grouped human GPCR clusters of eight major types of receptors. The 32 human-C. elegans GPCR clusters generated were analyzed for five types of cluster association, with proposed terminologies such as human GPCR clade [HC], coclusters [CC], neighbor clades [NC], neighbor members [NM] and species-specific members [SS], observed in the tree topology, which help to connect functional relevance at intra- and inter-genomic levels. Interestingly, the CC type was significant and exhibited evolutionary integrity at the inter-genomic level. Also, the 27 orthologs identified illustrate the effectiveness of cross-genome clustering techniques in connecting related GPCRs even at


remote homology. Overall, 84% of the GPCR sequences across the genomes were successfully associated by RPS-BLAST at significant E-value thresholds (ranging from 0.001 to 1) (work published). Cross-genome clustering of human and C. elegans GPCRs motivated a phylogenetic analysis exclusively on serpentine receptors (SRs) (Chapter 3). As nearly 20 protein families of SRs from C. elegans are related to chemosensation, a phylogenetic analysis of 683 serpentine receptors was carried out to identify related sequences/clusters that represent family-specific or receptor-specific sequence features and, ultimately, to connect them at the superfamily level. Interestingly, the only receptor annotated for olfaction in C. elegans (odr-10), which senses di-acetyl compounds, was observed along with 43 SRs in the phylogeny. All the homologues associated with odr-10 are from the Str superfamily, and str-112 in particular was found to be the most closely related sequence homologue of odr-10 in the phylogenetic analysis. As a case study, odr-10 was modelled to understand secondary structural details. A str family-specific “QLF” motif was identified in ICL3, TM6 of odr-10, and 92 other SR family-specific motifs were also identified using the TM-MOTIF package. The identified sequence features can be used further to train SVM models and to predict putative receptors from other nematode species.

Attempts have been made to design a user-friendly alignment viewer, TM-MOTIF (work published), to detect and display conserved motifs on the predicted membrane topology in a set of aligned transmembrane proteins (Chapter 4). The tool is very effective in identifying not only the conserved motifs (default 60%) but also the amino acid substitutions (AAS) with their respective physico-chemical properties (using an in-house program, namely MotifS) at each position of the alignment. In short, TM-MOTIF lets users submit their sequences of interest (multiple FASTA and MSA) to visualize the seven predicted helices of TM proteins in a VIBGYOR colouring scheme. Users can also align a sequence of interest with any one of the given reference sequences (known structure) to get a pairwise alignment, and this display is highly helpful as a pre-requisite for homology modelling. Users can also perform a BLAST search to identify the nearest homologue from the incorporated cross-genome GPCR and OR cluster datasets of selected organisms. TM-MOTIF is highly suitable for comparative genomics and for identifying cluster-specific or receptor-specific and common motifs observed at various percentages of conservation within and across genome(s). The package is integrated into DOR (Database of Olfactory Receptors).

As we know, conserved motifs and AAS play a crucial role in functional aspects. The previously established 32 clusters of eight major types of receptors from the cross-genome GPCR cluster datasets, namely human-Drosophila GPCR clusters, human-C. elegans GPCR clusters and a human-only GPCR cluster dataset, were studied primarily for conserved motifs (MotifS program), and the TM-MOTIF package was used to record the observed motifs against their respective membrane topology (Chapter 5). Interestingly, a total of 33 conserved motifs were identified from the human-Drosophila GPCR clusters, and 76% of them were observed in TM helices, predominantly in TM2 and TM7. Besides the classical motifs such as E/DRY and NPXXY, motifs observed in a single receptor type (cluster-specific or receptor-specific motifs), two-receptor and multi-receptor types were also documented for the cross-genome GPCR clusters (work published).
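The idea of reporting motifs at a chosen percentage of conservation can be illustrated with a minimal Python sketch. It is not the MotifS program: the toy alignment, the function names and the rule that a motif is a run of at least two consecutive conserved columns are all assumptions made here for illustration, and the grouping of substitutions by physico-chemical property is left out.

```python
from collections import Counter

# Hypothetical toy alignment (equal-length rows, '-' marks a gap); not thesis data.
ALIGNMENT = [
    "ALDRYVA",
    "GIDRYLS",
    "SLERYIA",
    "TMDRY-T",
    "CLDRFVA",
]


def column_consensus(alignment):
    """Return (most common residue, its count) for every alignment column.

    Gaps are ignored when counting, so a gap can never be the consensus.
    """
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        consensus.append(counts.most_common(1)[0] if counts else ("-", 0))
    return consensus


def conserved_motifs(alignment, threshold_percent=60, min_length=2):
    """Collect runs of consecutive columns conserved at or above the threshold.

    Returns a list of (start column, motif string) tuples, 0-based.
    """
    n_seqs = len(alignment)
    motifs, start, run = [], None, []
    for i, (residue, count) in enumerate(column_consensus(alignment)):
        # Integer comparison avoids floating-point edge cases at exactly 60%.
        if count * 100 >= threshold_percent * n_seqs:
            if start is None:
                start = i
            run.append(residue)
        else:
            if len(run) >= min_length:
                motifs.append((start, "".join(run)))
            start, run = None, []
    if len(run) >= min_length:
        motifs.append((start, "".join(run)))
    return motifs


print(conserved_motifs(ALIGNMENT))                        # [(1, 'LDRY')]
print(conserved_motifs(ALIGNMENT, threshold_percent=80))  # [(2, 'DRY')]
```

Raising the threshold from 60% to 80% shortens the reported motif from LDRY to DRY in this toy case, which is the kind of threshold-dependent behaviour the percentage-of-conservation setting controls.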

An olfactory receptor data repository was generated for selected eukaryotic organisms (yeast, worm, fly, mouse and human), and these sequences were aligned to produce intra- and inter-genomic phylogeny. Interestingly, 371 functional ORs from the human genome were distributed in 10 distinct clusters, and class I (sensing water-borne odors) and class II (sensing air-borne odors) type receptors were discriminated on introducing a few selected fish and amphibian ORs into the human OR phylogeny. In another study, fly ORs showed no significant coclustering in the human OR phylogeny, indicating that insect ORs are evolutionarily distinct from mammalian ORs. This could be due to the independent evolution, life style or reverse topology of fly ORs. Selected nematode ORs also showed no coclustering with human ORs, owing to the long lineage and nematode life style. The study on human-mouse OR clusters showed significant coclustering, and further studies were carried out with ORs of canine, rodents and non-human primates to analyze cluster association with human ORs.

The results of the sequence studies were organized in a publicly available database, namely DOR. It provides sequences, predicted TM boundaries, intra- and inter-genomic alignments, and phylogeny of selected genomes. It also includes the motif identification tool (TM-MOTIF) and is associated with other features such as predicted secondary structure and dimer prediction from collaborators (work in press). In essence, the genome-wide survey suggests representative sequences, cluster association, orthologs, cluster-specific motifs and coclusters arrived at intra- and inter-genomic levels, which ultimately guide the connection of functional properties from known to unknown genes/proteins and the understanding of structure-function relationships.

ACKNOWLEDGEMENT

I sincerely express my earnest gratitude to my doctoral committee member Dr. RSF. Director-Research. Dr. Anna University of Technology Coimbatore for graciously permitting me to do this research. K. Mr. guidance. Besides I am extremely thankful to my co-supervisor and mentor Prof. Coimbatore. Vice Chancellor. V. Prof. Tiruchengode for his valuable guidance for my Ph.D. Sowdhamini. Bangalore who has been a source of inspiration. Director. Dr.. I submit my gratitude to Prof. K. teaching and non-teaching staff. NCBS. P. Department of Biotechnology. Dr. Srinivasan from IISc. I express my deep sense of gratitude to Dr. Bangalore for extending care and moral support to pursue the research work and I submit my deepest gratitude to Mr. RSF. S. advice to me throughout the course of this research work. help. Apurva Sarin. Obaid Siddiqi. my lab mates and “all@ncbs” for their kind hearted support in encouraging my research thirst. I express my heartfelt thanks to Prof. B. Vijayaragavan. study. KSR College of Technology. R. SenthilKumar. Lab-25. Balakrishnan. Bangalore. Prof. Karunakaran. PSG College of Technology. Dr. National Center for Biological Sciences. N. Thanks to my family members and my beloved APPA. NAGARATHNAM . and Dr. Prof. Shaju. Renuka Devi. Ashok Rao. Dr. Further.

TABLE OF CONTENTS

CHAPTER NO.   TITLE

    ABSTRACT
    LIST OF TABLES
    LIST OF FIGURES
    LIST OF ABBREVIATIONS

1   INTRODUCTION
    1.1   PRIOR ART ON GENOME-WIDE SURVEY
    1.2   BREAKTHROUGHS IN GPCR CRYSTALLOGRAPHY STUDIES
    1.3   GPCRS: POPULAR DRUG TARGETS
    1.4   STRUCTURE AND CELLULAR ACTIVITIES OF MEMBRANE PROTEINS
    1.5   MEMBRANE PROTEIN: TOPOLOGY
    1.6   GPCR MECHANISM
    1.7   GPCR CLASSIFICATION
          1.7.1   Olfactory Receptors (ORs)
          1.7.2   Classical Knowledge on Olfactory Receptors
          1.7.3   Olfactory Signaling Pathway in Human ORs
          1.7.4   ORs, GRs and IRs in Drosophila
          1.7.5   Insect olfaction (Drosophila ORs)
          1.7.6   Nematode Olfaction
          1.7.7   Mouse Olfaction

    1.8   DATA REPOSITORIES FOR MEMBRANE PROTEINS
    1.9   COLLECTION OF GPCR HOMOLOGUES
          1.9.1   BLAST (Basic Local Alignment Search Tool)
          1.9.2   PSI-BLAST (Profile Vs Sequence comparison method)
          1.9.3   Reverse PSI-BLAST (Sequence Vs Profile comparison method)
    1.10  MULTIPLE SEQUENCE ALIGNMENT TECHNIQUES
          1.10.1  CLUSTAL W
          1.10.2  PRALINE TM
          1.10.3  MAFFT
    1.11  DERIVING PHYLOGENY OF GPCRs/ORs
          1.11.1  PHYLIP
          1.11.2  TREE-PUZZLE
          1.11.3  MEGA (Molecular Evolutionary Genetics Analysis)
    1.12  CLUSTER ASSOCIATIONS
    1.13  SEQUENCE CONSERVATION AND DIVERSITY
    1.14  HOMOLOGY MODELLING OF GPCRs/ORs

2   CROSS-GENOME CLUSTERING OF HUMAN AND C. ELEGANS G-PROTEIN COUPLED RECEPTORS
    2.1   INTRODUCTION

    2.2   C. elegans - AN ATTRACTIVE ANIMAL MODEL
          2.2.1   Features Related to C. elegans and Human GPCRs
    2.3   OBJECTIVES
    2.4   PRIOR ART
          2.4.1   Superfamilies of Serpentine Receptors
    2.5   METHODOLOGY
          2.5.1   Selection Criteria for C. elegans GPCRs
          2.5.2   Generation of Representative Profiles
          2.5.3   Performing RPS-BLAST
          2.5.4   Cross-Genome Alignment of Human - C. elegans GPCRs
          2.5.5   Cross-Genome Phylogeny of Human - C. elegans GPCRs
          2.5.6   Terminologies used to Describe Phylogeny
                  2.5.6.1   Human GPCR clade [HC]
                  2.5.6.2   Coclusters [CC]
                  2.5.6.3   Neighbor Clades [NC]
                  2.5.6.4   Neighbor Members [NM]
                  2.5.6.5   Species specific Members [SS]
                  2.5.6.6   Superfamilies of Serpentine receptors (SR)
    2.6   RESULTS AND DISCUSSION
          2.6.1   Result Summary for Peptide Receptors
          2.6.2   Result Summary for Chemokine Receptors
          2.6.3   Result Summary for Nucleotide and Lipid receptors

          2.6.4   Result Summary for Biogenic Amine Receptors
          2.6.5   Result Summary for Class B (Secretin) Receptors
          2.6.6   Result Summary for Cell Adhesion Receptors
          2.6.7   Result Summary for Class C (Glutamate) Receptors
          2.6.8   Result Summary for Frizzled/Smoothened Receptors
    2.7   CONCLUSION

3   PHYLOGENETIC ANALYSIS OF SERPENTINE RECEPTORS OF C. ELEGANS AND IDENTIFICATION OF CONSERVED MOTIFS IN SERPENTINE RECEPTOR SUPERFAMILIES
    3.1   INTRODUCTION
    3.2   HOMOLOGUES OF C. elegans GPCRs
    3.3   OBJECTIVES
    3.4   CHEMOSENSORY RECEPTORS IN C. elegans
    3.5   CHEMOSENSORY NEURONS AND OLFACTORY APPARATUS IN C. elegans
    3.6   FAMILIES AND SUPERFAMILIES OF SERPENTINE RECEPTORS IN C. elegans
    3.7   FEATURES AND IMPORTANCE OF SRs
    3.8   SRs: FUNCTIONAL RELEVANCE WITH OTHER EUKARYOTIC GPCRs
    3.9   METHODOLOGY

          3.9.1   Data Collection
          3.9.2   Prediction of TM-helices by HMMTOP
          3.9.3   Alignment Procedure by MAFFT
          3.9.4   Phylogeny of Selected Serpentine Receptors
          3.9.5   Identification of Motifs in SRs
    3.10  RESULTS
          3.10.1  Identified Motifs in SR Families: A Pilot Study
          3.10.2  Homology Modelling of odr-10
                  3.10.2.1  Pairwise alignment of odr-10 with bovine rhodopsin sequence
                  3.10.2.2  Alignment by MAFFT
                  3.10.2.3  Structure validation for Odr10 model
                  3.10.2.4  Preliminary phylogenetic analysis
                  3.10.2.5  Odr-10 an outgroup to HOR
    3.11  CONCLUSION

4   TM-MOTIF: A PACKAGE AND AN ALIGNMENT VIEWER TO IDENTIFY CONSERVED MOTIFS AND AMINO ACID SUBSTITUTIONS IN ALIGNED SET OF SEVEN TRANSMEMBRANE HELIX PROTEINS
    4.1   INTRODUCTION
          4.1.1   Functional Importance of Conserved Motifs in TM-Proteins
          4.1.2   Motif Related to Structural Integrity and Stability

          4.1.3   Impacts of Motifs in Evolutionary Bioinformatics
    4.2   OBJECTIVES OF TM-MOTIF
    4.3   KEY FEATURES OF TM-MOTIF
    4.4   METHODOLOGY
          4.4.1   In-Built Dataset of Cross-Genome GPCR and OR Cluster Dataset
                  4.4.1.1   Human-Drosophila cross-genome GPCR clusters
                  4.4.1.2   Human-C. elegans cross-genome GPCR clusters
                  4.4.1.3   Human-mouse cross-genome OR clusters
          4.4.2   Alignment Procedures for Cross-Genome GPCR/OR Clusters
          4.4.3   Prediction of Membrane Topology for TM Helices and Loops
          4.4.4   Detection of Motifs and Amino Acid Substitution (AAS) in the Cross-Genome Alignment
          4.4.5   Mapping of Identified Motifs on TM-helices and Loops in MSA
          4.4.6   Identification of Homologous Sequences for User Submitted Queries by Performing BLAST
          4.4.7   Pairwise Alignment in TM-MOTIF
    4.5   RESULTS

          Software Input and Output Options
          Input Options
          Output Options
          TM-MOTIF Output Files
          Display of predicted 7 TM helices in VIBGYOR colouring scheme (by using “Run TM” option)
          Display of Identified Motifs and AAS in MSA (by using “Run Motif” option)
          Display of Detected Motifs on TM-helices (by using “Run TM-Motif” option)
          Alignment with Reference Sequence
          Identifying closest homologues of user sequence in selected organisms
          Display of Over predicted helices
    4.6   DEFAULT PARAMETERS
    4.7   CAVEAT AND FUTURE DEVELOPMENT
    4.8   AVAILABILITY
    4.9   CONCLUSIONS

5   ANALYSIS ON CONSERVED MOTIFS AND PERMITTED AMINO ACID EXCHANGES IN CROSS-GENOME GPCR CLUSTERS
    5.1   INTRODUCTION
    5.2   OBJECTIVES
    5.3   RESIDUE CONSERVATION IN CROSS-GENOME SEQUENCES
    5.4   IMPACT OF AMINO ACID CONSERVATION AND TYPES OF SUBSTITUTIONS
    5.5   METHODS
          5.5.1   Cross-genome GPCR cluster dataset
          5.5.2   Alignment Procedure
          5.5.3   Prediction of membrane topology
          5.5.4   Program to Detect Motifs and AAS
    5.6   RESULTS
    5.7   OCCURRENCE OF MOTIFS FOR SINGLE RECEPTOR TYPE
    5.8   MOTIFS OBSERVED IN HUMAN-DROSOPHILA CROSS-GENOME CLUSTERS
          5.8.1   Motifs Observed in Transmembrane Helices
          5.8.2   Motifs Observed in Loop Regions
    5.9   MOTIFS OBSERVED IN HUMAN-C. elegans GPCR CROSS-GENOME CLUSTERS
    5.10  CHARACTERISTIC MOTIFS FROM CROSS-GENOME GPCR CLUSTERS

          5.10.1  Conserved D/ERY and NPXXY motifs in GPCR Clusters
          5.10.2  Identified KLK/R and RLAR/K motif in Secretin Receptor
          5.10.3  Conserved PMNYM / PMSYM motif in BGA Receptor
    5.11  SUMMARY

6   GENOME WIDE SURVEY OF OLFACTORY RECEPTORS (ORS) IN SELECTED EUKARYOTIC GENOMES
    6.1   PHYLOGENETIC STUDY ON SELECTED HUMAN ORS
          Introduction
          Olfactory Receptors
          OR: Membrane Topology
          Prior Studies on ORs
          Objectives and Scopes
          Methodology
          Retrieval of OR sequences
          Alignment procedure
          Prediction of membrane topology: Human ORs
          Phylogeny on selected human olfactory receptors
          Analysis of phylogeny

          Results
          Class I and II type receptors in human OR phylogeny
          Sequence features of 10 human OR-subclusters
          Motif analysis on human olfactory receptors
          Representative OR sequences
          SVM Analysis
    6.2   CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND FISH GENOMES
          Objective
          Review of Literatures
          Fish ORs
          Results
          Sequence conservation: across fish and human ORs
    6.3   CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND AMPHIBIAN GENOME
          6.3.1   Objective
          6.3.2   Literature survey on class I and II type ORs
          6.3.3   Amphibian ORs
          6.3.4   Results
                  6.3.4.1   Cocluster HXC1 - class I type receptors
                  6.3.4.2   Cocluster HXC2 - class II type receptors

                  6.3.4.3   Cocluster HXC3 - class II type receptors
    6.4   PHYLOGENETIC ANALYSIS ON DROSOPHILA OLFACTORY RECEPTORS
          6.4.1   Background
          6.4.2   Drosophila ORs
          6.4.3   Results on Drosophila OR Phylogeny Analysis
                  6.4.3.1   Cluster association: 10 subclusters
          6.4.4   Summary
    6.5   CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM DROSOPHILA, YEAST AND HOMO SAPIENS
          6.5.1   Background
          6.5.2   Insect ORs and mammalian ORs (Evolutionarily unrelated)
          6.5.3   Membrane proteins in Yeast
          6.5.4   Results
          6.5.5   Summary
    6.6   CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED OLFACTORY RECEPTORS FROM HUMAN AND C. elegans GENOMES
          6.6.1   Odr-10 and homologues
          6.6.2   Results and Discussion
          6.6.3   Summary

    6.7   CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM HUMAN AND MOUSE GENOMES
          6.7.1   Introduction
          6.7.2   Objectives
          6.7.3   Human-Mouse OR Orthology
          6.7.4   Complex Picture on Human-Mouse OR Orthology
          6.7.5   Methodology
          6.7.6   Results
                  6.7.6.1   Cross-genome OR cluster association
                  6.7.6.2   Cross-genome phylogeny with Class-I type receptor homologues
          6.7.7   Common motifs in the Cross-genome phylogeny
          6.7.8   Summary
    6.8   PHYLOGENETIC ANALYSIS ON OLFACTORY RECEPTORS FROM SELECTED HUMAN AND NON-HUMAN PRIMATES
          6.8.1   Objectives
          6.8.2   Background
          6.8.3   Methodology
          6.8.4   Results
          6.8.5   Summary

    6.9   DATABASE OF OLFACTORY RECEPTORS (DOR)
          6.9.1   Objectives
          6.9.2   Features on OR sequences in DOR
                  6.9.2.1   OR sequences of target genomes
                  6.9.2.2   Predicted TM boundaries
                  6.9.2.3   Single/cross-genome OR alignments
                  6.9.2.4   Cluster association and Phylogeny
                  6.9.2.5   Softwares and Tools (TM-MOTIF) in DOR
          6.9.3   Structural features (Application of sequence searches)
          6.9.4   Summary

7   CONCLUSION
    7.1   COMPENDIUM
    7.2   CROSS-GENOME GPCR CLUSTERING
    7.3   PHYLOGENETIC ANALYSIS ON SERPENTINE RECEPTORS
    7.4   TM-MOTIF PACKAGE
    7.5   STUDY ON CONSERVED MOTIFS AND AAS IN CROSS-GENOME GPCR CLUSTERS
    7.6   PHYLOGENETIC ANALYSIS ON ORS IN SELECTED EUKARYOTIC GENOMES
    7.7   SUMMARY

    APPENDIX 1   THE LIST OF IDENTIFIED FAMILY-SPECIFIC MOTIFS IN SR
    REFERENCES
    LIST OF PUBLICATIONS
    CURRICULUM VITAE

LIST OF TABLES

TABLE NO.   TITLE

2.1    Distribution of Human and C. elegans GPCRs in 32 Clusters
2.2    List of Identified Orthologs
3.1    List of identified “motifs” in serpentine receptor super families
5.1    Motifs observed in the transmembrane helices and loop regions of human and Drosophila GPCR clusters
6.1    Analysis on sequence features of 10 human OR subclusters
6.2    List of conserved motifs in 10 human OR subclusters (60% level of conservation)
6.3    Sequence identity of neighboring fish ORs and human class I type receptors observed in cross-genome OR phylogeny
6.4    Sequence identity of neighboring frog ORs and human class I type receptors observed in cross-genome OR phylogeny
6.5    Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC2)
6.6    Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC3)

6.7    Significant cluster association for str type receptors in CeC3, with sequence pairs of high/low identity
6.8    Sequence identity and similarity between odr-10 and associated SR
6.9    Percentage identity for selected human and mouse ORs for significant association from cross-genome OR phylogeny
6.10   Percentage identity between selected human ORs and non-human ORs

LIST OF FIGURES

FIGURE NO.   TITLE

1.1         Central dogma of “genome-wide survey on sequences”
1.2         Crystal structure of bovine rhodopsin (Li et al 2004)
1.3         Membrane topology of olfactory receptor (odr-10) in C. elegans
1.4         GPCR signaling pathway
1.5         ORs and organization of the olfactory system in mammals and OR signaling pathway (Meyer et al 2000)
1.6         Overview on the techniques involved in genome-wide survey
2.1         Flow-chart to depict the step-wise procedure for cross-genome clustering of GPCRs
2.2(a-c)    Pictorial representation for various types of cluster association
2.3(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.4(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.5(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.6(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.7(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)

2.8(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.9(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.10(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.11(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.12(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.13(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.14(a-b)   Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
2.15(a-b)   Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
2.16(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.17(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.18(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.19(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.20(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)

2.21(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.22(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.23(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.24(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.25(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.26(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.27(a-b)   Cross-genome phylogeny of secretin type receptors (Rectangular Display & Radial Display)
2.28(a-b)   Cross-genome phylogeny of secretin type receptors (Rectangular Display & Radial Display)
2.29(a-b)   Cross-genome phylogeny of cell adhesion type receptors (Rectangular Display & Radial Display)
2.30(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.31(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.32(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.33(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)

2.34(a-b)   Cross-genome phylogeny of FRZ/SMT type receptors (Rectangular Display & Radial Display)
2.35(a-b)   Distribution of C. elegans GPCRs at various E-value thresholds
3.1         Pie-diagram to show the distribution of serpentine receptors (SR) in the dataset
3.2         Phylogeny on selected serpentine receptors (circular view tree)
3.3         The subcluster showing odr-10 and its homologues
3.4         Pairwise alignment of odr-10 with bovine rhodopsin sequence
3.5         Three-dimensional model of olfactory receptor odr-10 and structure validation
3.6         Phylogeny on selected human olfactory receptors with an olfactory receptor (odr-10) from C. elegans
4.1         Flow-chart
4.2         Tool guide of TM-MOTIF: an overview
4.3         Snapshot for the available main menu of the front window of TM-MOTIF with user interactive features
4.4         Options given for the submission of input sequences in TM-MOTIF package
4.5         Sample output for the option “RUN-TM”
4.6         Sample output for the option “RUN-MOTIF”
4.7         Sample output for the option “RUN-TM-Motif”
4.8         Snapshot for the display of pairwise alignment of user’s input sequence with selected reference sequence
4.9         Snapshot depicts the display of over-predicted TM-helices

5.1         Pictorial representation to denote the occurrence of highly conserved “DRY motif” in TM3, ICL2
5.2         Flow-chart describes about the steps involved in the study
5.3         Percentage residue conservation in TM helices and loops in GPCR Clusters
5.4         Illustration of characteristic motifs (observed at 60% conservation)
6.1         Flow-chart for the sequence analysis on olfactory receptors
6.2(a-b)    Phylogenetic display of selected human olfactory receptor
6.3         Phylogeny of selected olfactory receptors in Homo sapiens and fish genomes
6.4(a-c)    Snapshot of Alignment window for the motif “KAFSTC” in human ORs and in few fish ORs at cross-genome alignment
6.5         Snapshot depicts the co-clustering of fish ORs with class I type receptors of human ORs in HSC1 (given in A), also exhibiting the coclusters like HXC1, HXC2 and HXC3 to indicate the class I and II type receptors from frog ORs with human ORs (given in B)
6.6         Snapshot depicts the co-clustering of fish ORs with class I type receptors of human ORs in HSC1 (given in A), also exhibiting the coclusters like HXC1, HXC2 and HXC3 to indicate the class I and II type receptors from frog ORs with human ORs (given in B)

6.7         Phylogeny of Drosophila olfactory receptors
6.8         Observed 10 subclusters of Drosophila olfactory receptors
6.9         Cross-genome phylogeny on selected ORs from human, Drosophila and yeast
6.10        Observed cluster association in the cross-genome phylogeny of selected ORs from human and C. elegans genomes
6.11        Cross-genome phylogeny of selected olfactory receptors (ORs) from human and mouse genomes
6.12        Phylogeny on selected human and mouse olfactory receptors with special emphasis on mouse class I type receptors
6.13        Cross-genome phylogeny on selected human ORs with ORs from non-human primates and aves
6.14        Available main menu in the front page of DOR
6.15        A snapshot of the given option “sequence” and its application in DOR
6.16        Display of predicted membrane boundaries in DOR
6.17        Display of “Alignment” option in DOR
6.18        Display of cross-genome OR phylogeny in DOR
6.19        Overview on pictorial representation of available features in DOR for sequence analysis
6.20        Overview on DOR features for sequence and structural information for olfactory receptors in DOR
6.21        Display of 3D Structure and related features in DOR

LIST OF ABBREVIATIONS

AAS            -  Amino acid substitutions
BGA receptors  -  Biogenic amine receptors
BLAST          -  Basic Local Alignment Search Tool
BS             -  Bootstrap
CAR            -  Cell adhesion receptors
CC             -  Co-clusters
CMK            -  Chemokine receptors
FRZ/SMT        -  Frizzled/Smoothened receptors
GLR            -  Class C (glutamate) receptors
GPCRs          -  G-protein coupled receptors
HC             -  Human GPCR clade
MAFFT          -  Multiple Alignment using Fast Fourier Transform
MEGA           -  Molecular Evolutionary Genetics Analysis
N&L            -  Nucleotide and lipid receptors
NC             -  Neighbor clades
NJ             -  Neighbor joining
NM             -  Neighbor members
ORs            -  Olfactory receptors
PR             -  Peptide receptors
RMSD           -  Root-mean-square deviation
RPS-BLAST      -  Reverse PSI-BLAST
SEC            -  Class B (secretin) receptors
SR             -  Serpentine receptors
SS             -  Species-specific members
SVM            -  Support vector machine
TM proteins    -  Trans-membrane proteins

.1 CHAPTER 1 INTRODUCTION The vast and frequent update of sequence databases to build repositories for various genomes and predicting accurate structural information of these sequences are two critical steps in Computational Genomics. association and annotation of novel proteins etc. Methods such as data clustering or principal component analysis. This imbalance is indeed a challenge to achieve the goal of identifying function(s) of interested gene(s) immediately. cross-genome phylogenetic analysis . Huge accumulation of sequence information in one end and limited resources on structural details on the other end is the crucial scenario in bioinformatics. the accumulated large size data repositories can be handled effectively only through bioinformatics techniques such as genome– wide survey which is a more sophisticated approach than the traditional geneby-gene approach and provide clues to connect sequences from various genomes for the common function. classification. but can be inter-connected effectively for the cause of identifying functional annotations (Alfarano et al 2005). Available knowledge and approaches for genomics (Lipman et al 2011) and structural genomics (Redfern et al 2008) are drastically different. My current objective is applying effective bioinformatics approaches such as genome-wide survey. However. artificial neural networks or support vector machines are useful for gene/protein prediction. further support in analyzing functional genomics data.

sequence comparison studies. In principle. cluster association.and inter-genomic levels. 2008 and Metpally and Sowdhamini 2005) will be appropriate to explain the approach of accumulating related proteins (associated gene clusters). forms the baseline of computational biology. These rationale on genome-wide survey of interested gene/protein sequences provide platform to integrate knowledge on sequencestructure-function paradigm for public access (Kerrien et al 2011). species-specific behavior and co-clusters arrived at intra. relating biochemical functions with the phenotypes. along with reference to structural similarities. orthologs. preserved at cellular.1). Cross-genome sequence analysis provides knowledge on sequence conservation across taxa. provide clues to connect functional resemblance (Redfern et al 2008) (Ye et al 2006). identifying putative orthologs and to observe conserved motifs from various genomes. Thus.1 PRIOR ART ON GENOME-WIDE SURVEY Performing genome–wide survey on selected or interested protein families (Tripathi and Sowdhamini. This conceptual framework really helps to compare sequences from various genomes and provides clues to connect the sequences of “known” function to the “unknown”. biochemical and molecular levels . Sequence studies for various genomes will provide opportunity to identify a group of associated proteins based on phylogeny and can be exploited for functional relevance. This unidirectional hypothesis of associating sequences.2 on certain GPCRs and ORs to propose representative sequences. sequence studies act as a primary step to connect structural and functional studies. cluster-specific motifs. 1. ultimately to connect the functional properties of known to “unknown gene/protein” (Figure 1. predicting structural details.

species-specific tendencies and exhibit evolutionary integrity at cross-genome level (Figure 1.1). Particularly, cross-genome sequence studies with selected model organisms will be useful for vast practical applications. For instance, a cross-genome phylogenetic analysis on selected GPCRs of the human and Drosophila genomes (Metpally and Sowdhamini, 2005), organized as eight major groups of GPCRs, led to the generation of 32 cross-genome GPCR clusters. Such an approach proved valuable for identifying the natural ligands of Drosophila and human orphan receptors.

[Figure 1.1 Central dogma of “genome-wide survey on sequences”]

Note: Pictorial representation describing the procedures involved in “genome-wide sequence analysis”. Label 1 refers to the selection of genomes of interest. Label 2 refers to the collection of non-redundant sequences from the selected genomes. Label 3 refers to the cross-genome alignment procedure. Label 4 refers to cross-genome phylogeny on sequences. Label 5 refers to cross-genome cluster association and analysis for species-specificity, identification of orthologs, conserved motifs, co-cluster arrangements, and observing functional clues to hypothetical proteins in the phylogeny.

Other case studies like the genome-wide survey on identifying putative serine/threonine protein kinases (STKs) in cyanobacteria (Zhang et al 2007), gaining practically useful insights on symbiotic nitrogen-fixing alpha-

proteobacterium like Sinorhizobium meliloti (Schluter et al 2010) based on experimental data, phylogenetic classification of transporters and membrane proteins from lower organisms (De Hertogh et al 2002) to higher-order organisms (Chang et al 2004), phylogenetic analysis in discriminating gustatory and olfactory receptors in Drosophila (Robertson et al 2003), phylogenetic grouping of serpentine receptor superfamilies in C. elegans (Robertson and Thomas 2006), phylogenetic analysis on olfactory receptor subfamilies (class I and class II type) in fish (Freitag et al 1999) and amphibians (Freitag et al 1995), identifying olfactory receptor subfamilies in mouse (Sullivan et al 1996) and human (Glusman et al 2001), and the influence of phylogenetic analysis in ethno-medicinal studies (Saslis-Lagoudakis et al 2011) are highly commendable. These case studies illustrate the important applications of genome-wide survey and the usage of phylogeny in identifying similar or related sequences for a protein of interest across genomes.

1.2 BREAKTHROUGHS IN GPCR CRYSTALLOGRAPHY STUDIES

As we know, the diverse cell surface proteins account for about 30% of the human genome and are very popular for their therapeutic importance and applications. Among the available (>82,160) structures in the PDB, only a limited number of membrane proteins have been reported so far. As purification and crystallization of membrane proteins are very crucial events in membrane protein crystallography (Dilanian et al 2011), crystal structures are available for only very few membrane proteins. For structural crystallization, membrane proteins embedded in the lipid bilayer have to be extracted and need to form a protein-detergent complex (PDC) (Koszelak-Rosenblum et al 2009). Also, while solving three-dimensional structures of membrane proteins, the surrounding environmental lipids in cell membranes interfere with both crystallography and nuclear magnetic resonance (NMR) spectroscopy.

[Figure 1.2 Crystal structure of bovine rhodopsin (Li et al 2004)]

a) Crystal structure of bovine rhodopsin displayed in ribbon representation (Li et al 2004). The observed seven TM-helices and one peripheral helix are colored in the rainbow order: TM-helix 1 in dark blue (residues 34–64), TM-helix 2 in light blue (71–100), TM-helix 3 in blue-green (106–140), TM-helix 4 in yellow-green (150–173), TM-helix 5 in yellow (200–230), TM-helix 6 in orange (241–276), TM-helix 7 in red (286–309), TM-helix 8 in magenta (311–321). b) Space-filling representation of rhodopsin.

Rhodopsin, a photoreceptor protein, is the first solved crystal structure (Palczewski et al 2000) (Figure 1.2 a and b). β2 adrenergic receptor (Rasmussen et al 2007), β1 adrenergic receptor (Warne et al 2008), adenosine receptor (Jaakola et al 2008), CXCR4 chemokine receptor (Wu et al 2010), dopamine D3 receptor, histamine receptor and the most recently reported lipid GPCR, the sphingosine 1-phosphate receptor (S1P1 receptor), are a few important crystal structures. Most of the solved structures are used as templates for molecular modelling. These structural studies will guide the comparison of reference structures with disease-implicated genes, based on modelling, to interpret the dysfunctions.

1.3 GPCRS: POPULAR DRUG TARGETS

As GPCRs are involved in a wide variety of physiological processes, such as regulation of immune system activity and inflammation, visual sense, sense of smell, cell density sensing, autonomic nervous system transmission and behavioral and mood regulation, they are effectively targeted in medicinal chemistry. Several previous reviews and literature highlight the clinical importance of GPCRs (Insel et al 2007), and a few examples can be discussed to denote the importance of GPCR biology in medicine. For instance, a number of monogenic mutations have been identified in rhodopsin causing the disease called retinitis pigmentosa. Also, there are many reported disorders such as genetic disorders of the calcium-sensing receptor (CaSR), a number of endocrine disorders, Graves' disease, and diseases related to autoimmunity. Serious illnesses such as schizophrenia (Seeman 1987), Alzheimer's disease and Parkinson's disease (Lee et al 1978), neurodegenerative diseases, heart diseases, cancer, diabetes, asthma, AIDS and so on are a few other examples that emphasize the multi-functional role of GPCRs and its clinical implications. Notably, receptors such as the AT1 angiotensin, adrenergic, dopamine and serotonin (5-hydroxytryptamine, 5-HT) receptor subtypes are the most exploited for their clinical importance and related diseases, and are all useful drug targets. The diversity of GPCRs and their ligand-binding properties make these receptors interesting targets for structure-based drug design (Schlyer and Horuk 2006) and even open the scope for personalized medicine.

1.4 STRUCTURE AND CELLULAR ACTIVITIES OF MEMBRANE PROTEINS

Membrane proteins are embedded within the lipid bilayer and are designated as transmembrane proteins, since they loop inside and outside of the cell boundaries (Figure 1.2). A class of cell-surface receptors retains shared structural features, having an extracellular N-terminal and an intracellular C-terminal with seven transmembrane-helices (TMHs) connected by three intracellular and three extracellular loops; this snake-like structural arrangement gives rise to names such as 7TM receptors, heptahelical receptors or serpentine-like receptors (Probst et al 1992). Since the downstream targets of such membrane receptors are guanine nucleotide binding proteins, they are also referred to as guanine nucleotide-binding protein-coupled receptors, G-protein coupled receptors (GPCRs) or serpentine receptors, and are popular for their versatile functional importance. GPCRs are ubiquitous as they participate majorly in signal transduction and recognize various types of ligands (Bockaert and Pin 1999). Ligands could be endogenous compounds such as amines, peptides, Wnt proteins or endogenous cell surface adhesion molecules, or photons and exogenous compounds like odorants. Substantial evidence on GPCR oligomerization (Prinster et al 2005), participation in signaling pathways (Greenwald 2005), clinical importance (Kuwabara and N 2001) and availability of repositories for multiple organisms (Fredriksson and Schioth 2005) provide significant impetus for the study of GPCR sequences and their ligand-binding properties.

1.5 MEMBRANE PROTEIN: TOPOLOGY

There are several prediction methods available online to predict the topology of membrane proteins. The prediction methods are mainly based on

the “hydrophobicity” profile of the helices. Methods like HMMTOP (Tusnady and Simon 2001), TMHMM (Krogh et al 2001), SOSUI (Hirokawa et al 1998), TMpred, TopPred II, DAS, TMAP, MEMSAT, SPLIT, PREDTMR2, Pred-TMP, TM-finder, TSEG, MPEx, Phobius and TOPCONS are popularly used to predict the secondary structure of membrane proteins. Methods are also available to discriminate signal peptides (Lao et al 2002) in proteins. Notably, canonical GPCR members, including the olfactory receptors of higher-order organisms, exhibit N-out and C-in topology (Figure 1.3). The other interesting fact is that Drosophila ORs and GRs in particular retain an N-in and C-out topology (Bargmann 2006, Benton et al 2006, Lundin et al 2007), which is also referred to as inverted/reverse topology.

[Figure 1.3 Membrane topology of olfactory receptor (odr-10) in C. elegans]

The seven transmembrane helices predicted by HMMTOP for odr-10 are shown in TOPO2 display, wherein residues 12-31 form TM1, 44-63 TM2, 94-113 TM3, 126-145 TM4, 202-225 TM5, 256-275 TM6 and 286-305 TM7. The conserved “YRY” motif in TM3, ICL2 and the Str superfamily specific “QLF” motif in ICL3 have been highlighted in red colour.
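The hydrophobicity-profile idea behind such topology predictors can be illustrated with a minimal sketch. It is not the algorithm of HMMTOP or of any other tool named above; it assumes the Kyte-Doolittle hydropathy scale with the conventional 19-residue window and a cut-off of 1.6, and the function names and the toy sequence are hypothetical (the sequence is not odr-10).

```python
# Kyte-Doolittle hydropathy values per amino acid (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns 1-based, inclusive (start, end) residue ranges.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            # Window overlaps the previous hit: extend that segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical toy sequence: polar flank, 19-residue hydrophobic core, polar flank.
toy = "DEKRSNQDEK" + "LIVALLFIVGALLIVAILF" + "KRDESNQKDE"
print(candidate_tm_segments(toy))
```

Because overlapping windows are merged, the reported range is wider than the hydrophobic core itself (residues 11 to 29 in the toy sequence); dedicated predictors refine such boundaries with trained models rather than a fixed cut-off.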

1.6 GPCR MECHANISM

Membrane proteins are effectively involved in signal transduction (Figure 1.4), where GPCRs are activated by various external stimuli (Rodbell et al 1971). GPCRs are dedicated to recognizing intercellular messenger molecules (such as hormones, neurotransmitters, biogenic amines, lipids, growth and developmental factors) and several sensory messages (such as light, odors and gustative molecules). Due to the influence of various external stimuli, receptors undergo a conformational change (i.e., minimal rearrangement occurs in the TM6 and TM3 helices, but this area still remains unclear), which causes the activation of a guanine nucleotide-binding protein (G-protein). Also, this event is primarily dependent on the type of the G-protein. For instance, the Gs state of the G-protein regulates the enzyme called adenylate cyclase (AC). AC activity is triggered when it binds to a subunit of the activated G-protein, and it subsequently triggers the cAMP pathway for further transduction to result in various biological responses. Activation of AC stops when G-proteins return to the GDP-bound state (Figure 1.4). The Golf subunit is mainly related to sensing chemosensory signals and participates in olfactory signaling pathways (Figure 1.4). GPCRs are also involved in various secondary pathways like ion channels, adenylyl cyclases, and phospholipases.

[Figure 1.4 GPCR signaling pathway]

The image represents GPCR signal transduction and depicts the entry of ligands/stimuli, activation of the G-protein subunit, subsequent activation of cAMP and the event of internalization for biological responses. (Image adopted from DB-DRD4, a database of dopamine D4 receptor (home page) and SOURCE: TRENDS in Pharmacological sciences URL: http://www.ibibiobase.com/projects/db-drd4/G_protein.htm)

1.7 GPCR CLASSIFICATION

GPCRs comprise the most ‘prolific’ family of cell membrane proteins. Knowledge on GPCR classification is necessary since GPCRs are involved in various signaling pathways, recognize a diverse set of ligands and are related to various biological functions. Though all the candidate GPCRs from various families retain seven TM-helices connected by ICLs and ECLs, sequence differences occur and give rise to subtle structural diversity (Gether 2000). The candidate GPCRs with the characteristic seven TM-helices were classified with the aid of several prediction methods and classifiers. The superfamily of GPCRs is classified majorly as class A (rhodopsin-like), class B (Secretin-like), class C (Metabotropic glutamate), class D (Fungal pheromone), class E (cAMP receptors) and class F (Frizzled/smoothened) (Kristiansen 2004). Particularly, class A is the largest, occupying 80% of the distribution, and retains diverse receptors like rhodopsin, olfactory, biogenic amine, bioactive lipid, nucleic acid, and

peptide receptors. Receptors such as secretin, calcitonin, glucagon, parathyroid hormone, vasoactive intestinal peptide and so on are related to class B. Class C includes receptors like metabotropic glutamate receptors (mGluRs), the Ca2+-sensing receptor, γ-aminobutyric acid type B receptors (GABA-B) and vomeronasal receptors type 2. Class D retains receptors such as fungal pheromone P and α-factor receptors (STE2/MAM2), whereas fungal pheromone A and M-factor receptors (STE3/MAP3) are related to class E. Class F retains slime mold cyclic adenosine monophosphate (cAMP) receptors. Recently, a few other GPCR families, such as frizzled type receptors/FRZ (Vinson and Adler 1987, Bhanot et al 1996), smoothened type receptors/SMT (Alcedo et al 1996 and Nehme et al 2010), vomeronasal receptors type 1/VNS (Dulac and Axel, 1995), ocular albinism (Schiaffino et al 1996, Schiaffino et al 1999), Arabidopsis thaliana receptor GCR1 (Josefsson and Rask 1997), and plant receptors (Grill and Christmann 2007, Perfus-Barbeoch et al 2004) have also been added to the existing GPCR families. It has been observed that classes A, B and C cover nearly 600 GPCRs in the human genome, excluding putative candidate GPCRs. Notably, olfactory receptors (ORs) are members of class A type receptors and have been dealt with exclusively in Chapter 6 under the title of genome-wide survey on olfactory receptors in selected eukaryotes.

1.7.1 Olfactory Receptors (ORs)

The “sense of smell”, the process of olfaction, is beyond simple scientific understanding, not only from its biological or chemical perspective, but also as a powerful sociocultural phenomenon (Low 2005). Critical knowledge for understanding and analyzing olfaction is a necessary science. In general, chemical senses are broadly divided into olfaction (the sense of smell) and gustation (the sense of taste). Olfactory receptors participate in sensing diverse chemical stimuli or odors (Firestein 2001). ORs are fascinating for their functional significance

in detecting food, to assess its quality, to enhance its flavor, to indicate the presence of potential toxins and pathogens, conspecifics, mates as well as threats, and to know about reproductive status, gender, genetic identity, hormone state and also mood (Munger et al 2009). ORs activate chemosensory cells, leading to neural recognition, and influence behaviours. Due to their diverse role, ORs are very important and present in our everyday life experiences, and need to be explored in more detail for vast practical applications in the fields of the pharmaceutical industry (aroma therapy), cosmetic industry (scent/perfume manufacturing) and food industry, and in the study of olfacto-sexual function, olfacto-neural communication, olfactory signaling, olfactory disorders and so on. Thus, performing a genome-wide survey on ORs of selected eukaryotic organisms will improve scientific credibility and ultimately serve human benefit.

1.7.2 Classical Knowledge on Olfactory Receptors

The landmark paper published in the year 1991 by Nobel Laureates Buck and Axel explained the role of olfactory receptors and the organization of the olfactory system in humans (Buck and Axel 1991). Around three percent of our genes are used to code for different odorant receptors on the membrane of the olfactory receptor cells. Further research studies on the phylogenetic approach in discriminating class I and class II type receptors, which sense water- and air-borne odors in higher eukaryotes, i.e., human and mouse (Zozulya et al 2001, Niimura and Nei 2005), studies related to insect olfaction (Robertson et al 2003), nematode olfaction (Robertson and Thomas 2006), the availability of ORs in various genomes, and observed common peptides in OR subfamilies in selected eukaryotic genomes (Gottlieb et al 2009) provide a remarkable background and facilitate the genome-wide survey of ORs, further to identify OR subclusters, cluster-specific motifs, species-specific tendencies and co-clusters in tree topology (Chapter 6 for more details).

1.7.3 Olfactory Signaling Pathway in Human ORs

The process of olfaction primarily starts with the binding of an odor to a specific receptor on a sensory neuron, where chemical energy is transformed into electrical signals to sense the smell. The human nose expresses different types of receptors, enabling the main olfactory system to use a common pathway to encode thousands of odorants (Figure 1.5 a and b). Such binding activates Golf, a G-protein. The alpha subunit of Golf activates the enzyme adenylyl cyclase, generating the major second messenger 3',5'-cyclic adenosine monophosphate (cAMP), which directly opens the cyclic nucleotide gated channel. This allows Na+ and Ca2+ to flow in and depolarize the cell. Depolarization of these cells causes action potentials (nerve impulses), which are sent to the olfactory bulb, and signaling also proceeds by the pathway involving guanylyl cyclase GC-D (Meyer et al 2000).

[Figure 1.5 (a) (b) ORs and organization of the olfactory system in mammals and OR signaling pathway (Meyer et al 2000)]

a) Depicts the pictorial representation of ORs and the organization of the olfactory system in mammals. b) Depicts the OR signaling pathway, showing the two proposed hypotheses of OR signal transduction (Meyer et al 2000). In this, the upper panel describes the entry of various odors, their recognition by ORs and the initiation of the cAMP signaling pathway, which involves a G protein (Golf), an adenylyl cyclase (ACIII), a cyclic nucleotide-gated (CNG) channel (α3α4β1b) and a chloride channel (ClC). After the response, cAMP is degraded by a CaM-dependent phosphodiesterase (PDE1C2). The other hypothesis (lower panel in b) explains the components of the cGMP-signaling pathway and the putative targets of cGMP, which involve the receptor guanylyl cyclase GC-D, cGMP-regulated PDE2, an unknown cGMP-regulated ion channel and the known CNG channel of the cAMP-signaling pathway.

1.7.4 ORs, GRs and IRs in Drosophila

As we know, olfactory neurons play a central role in sensing volatile cues that afford the organism the ability to detect food, predators and mates. But, gustatory neurons sense soluble chemical cues that elicit feeding behaviours. In insects, the taste neurons initiate innate sexual and reproductive responses. It is believed that nearly 60 olfactory receptors (Berkeley Drosophila Genome Project database) play a major role in identifying and discriminating diverse odors for insect survival, and this Drosophila olfactory receptor (DOR) gene family is identified as G-protein coupled receptors (Clyne et al 1997, Gao and Chess 1999, Vosshall and Stocker 2007). Nearly the same number of gustatory receptors (GRs) are meant for gustatory functions (Clyne et al 1997). Notably, insect GRs have the same transmembrane topology as ORs. Ionotropic Glutamate Receptors (IRs) in Drosophila are referred to as a new family of odorant receptors; these proteins accumulate in sensory dendrites and are not present at synapses. Unlike conventional ionotropic glutamate receptors, which mediate chemical communication between neurons at synapses, IRs are expressed in a combinatorial fashion in sensory neurons that respond to many distinct odors but do not express either insect odorant receptors (ORs) or gustatory receptors (GRs). These proteins are expressed in distinct subsets of olfactory neurons, and certain family members are restricted to distinct portions of the olfactory system.

1.7.5 Insect olfaction (Drosophila ORs)

Several fundamental explanations have been published (Siddiqi, 1990), (Clyne et al 1999) to investigate the molecular mechanism of Drosophila olfaction. Electrophysiological studies explained the differentiation in the

morphology of the olfactory sensilla and their distribution patterns (Venkatesh and Singh 1984, Stocker 1994). Studies suggest that there are 30 different classes of ORNs in the antenna (in adult ~40), based upon the odor response profile of individual neurons, and a few exhibit odor specificity. Notably, 24 antennal receptors such as Or2a, Or7a, Or9a, Or19a, Or22a, Or23a, Or33b, Or35a, Or43a, Or43b, Or47a, Or47b, Or49b, Or59b, Or65a, Or67a, Or67c, Or85a, Or85b, Or85f, Or88a, Or98a, Or82a and Or10a were tested experimentally with 110 odorant molecules using the empty neuron system (Dobritsa et al 2003), and the responses of the receptors vary across different chemical classes. Generally, insect ORs lack homology to the G-protein coupled chemosensory receptors of vertebrates and exhibit drastically differing mechanisms in olfaction. In the literature (Larsson et al 2004), the functional insect OR consists of a variable insect OR together with a constant odorant binding receptor called OR83b; these form a heteromeric complex which then participates in the signaling pathway. OR83b is also called a “coreceptor” (Vosshall and Stocker 2007) for its functional importance. Recent studies explained insect ORs as heteromeric ligand-gated ion channels (more details in Chapter 6). Notably, it is also mentioned that heteromeric insect ORs comprise a new class of ligand-activated non-selective cation channels (Sato et al 2008).

1.7.6 Nematode Olfaction

Chemosensory receptors in nematodes are highly diverse and large in number. Since worms lack both auditory and visual senses, chemosensation plays a central role in the survival of nematodes. In C. elegans, chemosensory receptors belong to the G-protein coupled receptors and retain seven transmembrane helices. Around 1330 genes and 400 pseudogenes have been

identified as chemoreceptors (Robertson and Thomas 2006) in C. elegans. Also, many of these receptors are known as serpentine receptors, and around 19 of the largest gene families have been reported so far. Among this large number of proteins, only one protein, namely odr-10 (Figure 1.3), was reported as an olfactory receptor in C. elegans (Sengupta et al 1996).

1.7.7 Mouse Olfaction

As found in human olfactory receptors, mouse ORs also fall into two broad classes with excellent bootstrap support (Glusman et al 2001). The class I type ORs in mouse are as found in fish and in the frog, but had been considered an evolutionary relic in mammals (Ngai et al 1993), and the class II receptors are found in amphibians and terrestrial vertebrates (Freitag et al 1995). There are 147 class I OR genes found in the mouse OR subgenome, and among them 120 OR genes were potentially functional. In mouse, all of the class I type ORs were located in a single large cluster on chromosome 7.

1.8 DATA REPOSITORIES FOR MEMBRANE PROTEINS

A huge number of data repositories and prediction servers for membrane topology are available exclusively for membrane proteins. Notable repositories related to GPCRs (Elefsinioti et al 2004) include gpDB (Theodoropoulou et al 2008), GPCRDB and integrated web resources like the G Protein Coupled Receptor Oligomerization Knowledge Base Project and the GPCR Natural Variants database (NaVa). The database named SEVENS (Ono et al 2005) provides useful sequence information, chromosomal location and intragenomic phylogenetic clusters for membrane proteins from more than 50 eukaryotic organisms. IUPHAR (Committee on Receptor Nomenclature and Drug classification) incorporates detailed pharmacological, functional and

patho-physiological information on GPCRs, voltage-gated ion channels, ligand-gated ion channels and nuclear hormone receptors. The other related databases for structural resources, like PDBTM, TOPDB (Tusnady et al 2008), TMpad (Trans Membrane Protein Helix-Packing Database) and MPDB (Membrane Protein Data Bank), are useful in providing structural information on integral, peripheral and anchored membrane proteins and also peptides (Raman et al 2006), and provide collections of domains and sequence motifs. Data repositories for olfactory receptors are also available for public access. ORDB (Skoufos et al 2000), HORDE (The Human Olfactory Data Explorer) and the integrated web resources from SenseLab for ORs, with associated links such as odorDB and odorMapDB, are highly useful and particularly relevant for retrieving sequences of olfactory receptors (ORs) from multiple genomes.

1.9 COLLECTION OF GPCR HOMOLOGUES

Sequence similarity searches are robust techniques to identify the nearest homologues for a query sequence from a database of interest. Pairwise comparison of proteins is a fundamental step in “sequence similarity searches”. Generally, two proteins retaining similar sequences can be called homologues. In other words, when a query and the subject are aligned with high similarity scores, they can be related by their sequence relevance and inferred to be homologues. The similarity scores depend upon sequence features like amino acids and permitted amino acid substitutions (AAS). Homologues are further classified into orthologs and paralogs. While orthologous proteins evolved from a common ancestral gene and belong to two different genomes, paralogs were generated by the event of gene duplication and belong to the same genome. Thus, homologues share

significant sequence similarity and can be further connected for their functional relevance. The objective here is to align or match a sequence of unknown function with characterized/annotated proteins from model organisms, so that the structure and function can be extrapolated to the new sequence. Functionally and evolutionarily important protein similarities can be recognized by comparing three-dimensional structures, but when structures are not available, patterns of conservation such as motifs, profiles, position-specific scoring matrices, and Hidden Markov Models can be used to identify related sequences from the database of protein sequences. Several methods like BLAST (Altschul et al 1997), FASTA (sequence based searches) (Lipman and Pearson 1985), IMPALA (profile-based searches) (Schaffer et al 1999) and other approaches like PSI-BLAST and RPS-BLAST are effectively used to find homologues and further to identify common functional relevance. Each method is unique in its scoring scheme with respect to amino acid substitutions and gap penalties. A necessity arises to select an appropriate technique for similarity search when we deal with evolutionarily distant sequences and particularly membrane proteins.

1.9.1 BLAST (Basic Local Alignment Search Tool)

Sequence comparisons between two sequences are achieved by producing quality alignments which maximize the correspondence between similar residues and minimize gaps (Altschul et al 1997). Generally, BLAST and FASTA (Lipman and Pearson 1985) are robust methods. Conceptually, dynamic programming achieves alignments either locally (the Smith-Waterman formulation that BLAST and FASTA approximate) or globally (the Needleman-Wunsch formulation); the heuristic approach (BLAST) can deal with sequences considerably differing in length and identifies islands of

short matches. It approximates the Smith-Waterman algorithm (Smith and Waterman 1981), which is guaranteed to find the optimal local alignment with respect to the scoring system, and reports maximal scoring segment pairs (MSPs). BLAST, a robust sequence comparison tool, is applicable through five main search methods, namely blastp, blastn, blastx, tblastn and tblastx, for varying inputs such as nucleotide and protein sequences. The scoring system majorly includes the substitution matrix and the gap-scoring scheme to align the sequences based on possible similarities. The PAM (point accepted mutations per 100 residues) amino acid scoring matrix, which is based on an explicit evolutionary model (Dayhoff et al 1978), is provided in the BLAST software distribution. It includes PAM40, PAM120, and PAM250, whereas the BLOSUM matrices are based on an implicit model of evolution and include BLOSUM 45, 62 and 85 (Henikoff and Henikoff 1992). Generally, these matrices are very appropriate for dealing with globular proteins, whereas PAM and JTT-200 (Jones et al 1992) can be used for membrane proteins. BLAST produces statistically significant alignments in the output, and features like raw scores, bit scores and E-values are considered to quantify the alignment significance. Among them, E-values are most often used. An E-value refers to the number of alignments one expects to find with a score greater than or equal to the observed alignment score in a search against a random database. Generally, the lowest E-values are considered highly significant and indicate the best alignments.

1.9.2 PSI-BLAST (Profile Vs Sequence comparison method)

Among the five BLAST programs, the work described in this thesis mostly relies on the basic protein BLAST technique, which includes blastp (protein-protein BLAST), PSI-BLAST (Position Specific Iterated BLAST), PHI-BLAST (Pattern Hit Initiated BLAST) and DELTA-BLAST (Domain

Enhanced Lookup Time Accelerated BLAST). As the name suggests, blastp compares a protein query with a protein database. PSI-BLAST allows the user to build a PSSM (position-specific scoring matrix) using the results of the first blastp run and iteratively uses the profile as the query against the database of protein sequences (Altschul et al 1997). PSSMs give the amino acid propensities at each sequence position based on the multiple alignments. PSSM generation also uses the multiple alignment sequence weights, the expected number of amino acids and the frequencies of unobserved amino acids (pseudo counts). The profiles generated at each iteration are searched against the database of protein sequences by rigorous iterations until convergence (meaning iterate until no new sequences are found). Hence, this method is effective in associating even distantly related sequences with remote homology. The application can be further improved by using jump-start PSI-BLAST (Altschul et al 1997), the jack-knife approach, the HOE (Homologous over-extension) reduced profile search (Gonzalez and Pearson, 2010) and improved PSI-BLAST search techniques such as cascade PSI-BLAST (Bhadra et al 2006), as per user requirement. Thus, the sequence search space has been broadened and the opportunity has been extended to connect sequences at remote homology (Figure 1.6).

1.9.3 Reverse PSI-BLAST (Sequence Vs Profile comparison method)

To associate remotely related sequences, the reverse PSI-BLAST technique (RPS-BLAST) is highly effective. This method differs from other sequence searches in that the query sequences are searched against a database of PSSM (Position Specific Scoring Matrices) profiles. Representative sequences from the protein families (example: 3PFDB, Shameer et al 2009), related domains and cluster types can be used to generate profiles that represent sequence properties as a block of consensus amino acids.
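The profile idea shared by PSI-BLAST and RPS-BLAST can be illustrated with a minimal sketch that builds a log-odds PSSM from a small ungapped block and slides it along a query sequence. This is a simplification, not the procedure implemented in BLAST: it assumes a uniform background frequency, a flat pseudocount, no sequence weighting and no gaps. The function names, the toy block and the query are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Log-odds PSSM from equal-length, ungapped aligned sequences.

    Assumes a uniform background of 1/20 per residue (a simplification).
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(len(aligned[0])):
        # Start every residue at the pseudocount so unobserved residues
        # receive a finite (negative) score instead of minus infinity.
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in ALPHABET})
    return pssm


def best_window(pssm, query):
    """Best ungapped match of the profile on the query: (score, 0-based offset)."""
    width = len(pssm)
    best = None
    for off in range(len(query) - width + 1):
        score = sum(pssm[i][query[off + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, off)
    return best


# Hypothetical toy block standing in for a family-specific motif.
family = ["DRYLAI", "DRYVAI", "ERYLAV", "DRYLSI"]
pssm = build_pssm(family)
score, offset = best_window(pssm, "MKTGLDRYLAIVHPLQ")
print(offset, round(score, 2))
```

In a sequence-versus-profile search of the RPS-BLAST kind, the same scoring step would be repeated against every profile in a database, and the best-scoring profile would suggest the cluster to which the query is associated.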

In the other method, scaled at sequence against database of sequences, which compares protein sequences against a database of protein sequences, there is some chance of missing very distantly related sequences. So, RPS-BLAST helps to associate even distantly related sequences with their related profiles. This effective method should be employed carefully in designing profiles, setting significant E-value thresholds and interpreting sequence searches for related profiles. If stringent sequence properties are employed, it has practical implications like generating cross-genome phylogenies, associating evolutionarily distant sequences, finding new members, classification and associating functional annotation to new sequences based on known data.

Machine learning approaches are appropriate techniques to deal with pattern recognition problems and to recognize remote homology. Methods like support vector machines (SVMs) (Pugalenthi et al 2010) are effectively used in classification problems, where the already trained dataset with known features (Positive set) is used to associate unknown gene/protein sequences (Negative set), and are useful to propose putative members. But, some limitations do exist, as the predictions rely upon the training dataset. Separately, the Hidden Markov Model (HMM) can also be used for pattern recognition, and it provides a mathematical representation of a protein sequence (Eddy 1998, Karplus et al 1998). HMMs have been used for gene prediction, recognition of transmembrane helices (Sonnhammer et al 1998), phylogenetic analysis (Felsenstein and Churchill, 1996) and in distant homology detection (Krogh et al 1994b).
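Since E-value thresholds recur in the sequence and profile searches described above, the relation between a raw alignment score, its bit score and its E-value can be sketched as follows. The sketch uses the Karlin-Altschul form E = K m n exp(-λS) and ignores the effective-length corrections applied by BLAST; the λ and K values are assumed for illustration (values of this order are commonly reported for gapped BLOSUM62 searches), and the query and database lengths are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring >= raw_score:
    E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Assumed, illustrative statistical parameters (not taken from this thesis).
LAM, K = 0.267, 0.041
m, n = 350, 50_000_000  # hypothetical query length and database size in residues

for s in (40, 80, 120):
    print(s, round(bit_score(s, LAM, K), 1), f"{e_value(s, m, n, LAM, K):.2e}")
```

The E-value falls exponentially as the raw score rises and grows linearly with database size, which is why a threshold chosen for one database may need revisiting when the search space changes.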

Figure 1.6 Overview on the techniques involved in genome-wide survey

The given diagram depicts the use of available data repositories related to membrane proteins (GPCRDB, SEVENS DB, ORDB, HORDE and so on). Following the collection of sequences, a redundancy filter is used as the primary step for the cross-genome studies. The methodology starts with sequence search programs (such as BLAST, PSI-BLAST, PHI-BLAST, RPS-BLAST) to find homologous sequences and to perform cross-genome analysis, followed by prediction of the membrane topology; these steps facilitate sequence comparison studies, and the sequences can be aligned by various alignment methods.

1.10 MULTIPLE SEQUENCE ALIGNMENT TECHNIQUES

Alignment procedures play a crucial role (Figure 1.1 and Figure 1.6) in analyzing the relationships among diverse sequences. Two or more sequences can be arranged by aligning them for common properties or sites. Weights can be assigned to the aligned elements so as to determine the degree of relatedness or to detect the existing homology between the multiple sequences. A pairwise alignment is between two sequences and a multiple sequence alignment (MSA) is between many sequences. MSA can be regarded as a generalization of pairwise sequence alignment. Here, instead of aligning two sequences, "n" number of sequences are aligned simultaneously.

Here "n" is always >2; such alignments are thus called multiple sequence alignments, and the alignment of multiple sequences is made possible by introducing gaps "_" into the sequences. When sequences from different genomes are aligned together, the alignment is referred to as a "cross-genome sequence alignment" and the resulting phylogeny is referred to as a "cross-genome phylogeny" (Figure 1.6). Membrane proteins differ considerably from globular proteins in sequence composition. The region that inserts into the cell membrane possesses different hydrophobicity patterns when compared to soluble proteins. Multiple sequence alignment techniques that are designed for globular proteins are not optimal for aligning transmembrane proteins. Hence, the recommended alignment procedures (Pirovano 2008) can be employed carefully.

1.10.1 CLUSTAL W

CLUSTAL W (Thompson JD, 1994) is a popular MSA tool, and generally the MSA technique consists of three main stages: 1) All pairs of sequences are aligned separately in order to calculate a distance matrix giving the divergence of each pair of sequences. 2) A guide tree is generated from the distance matrix. 3) The sequences are progressively aligned according to the branching order in the guide tree. Initially, the CLUSTAL W program applies a fast approximate (heuristic) method based on the number of K-tuple matches (the K-tuple is the size of the exactly matching fragment that is used) for generating pairwise distances (Wilbur and Lipman, 1983). Later, a dynamic programming algorithm is used to enhance accuracy by providing the scores using gap opening penalties (GOP) and gap extension penalties (GEP). The method improves the quality of alignment by implementing amino acid weight matrices such as BLOSUM with series of 80, 62, 45, 30, PAM with series of 20, 60, 120, 350, and GONNET.
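The first stage above (pairwise distances from K-tuple matches) can be sketched in a simplified form. This is not the Wilbur and Lipman procedure used inside CLUSTAL W: it only counts shared K-tuples between two sequences, and the names `ktuple_distance`, `distance_matrix`, `closest_pair` and the toy sequences are hypothetical.

```python
from itertools import combinations


def ktuples(sequence, k):
    """Return the set of all overlapping k-tuples (k-mers) in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def ktuple_distance(seq_a, seq_b, k=2):
    """Simplified distance: 1 - shared k-tuples / size of the smaller k-tuple set."""
    tuples_a, tuples_b = ktuples(seq_a, k), ktuples(seq_b, k)
    smaller = min(len(tuples_a), len(tuples_b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(tuples_a & tuples_b) / smaller


def distance_matrix(sequences, k=2):
    """Build a symmetric distance matrix as nested dicts keyed by sequence name."""
    names = list(sequences)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = ktuple_distance(sequences[a], sequences[b], k)
        matrix[a][b] = matrix[b][a] = d
    return matrix


def closest_pair(matrix):
    """Return the pair of names with the smallest distance (first to be joined)."""
    return min(combinations(matrix, 2), key=lambda pair: matrix[pair[0]][pair[1]])


# Toy sequences, invented for illustration only.
toy_sequences = {"s1": "MKTLLVAG", "s2": "MKTLLVSG", "s3": "WWPPQQRR"}
toy_matrix = distance_matrix(toy_sequences)
```

A guide tree would then be built from such a matrix, joining the closest pair first; the progressive alignment follows that branching order.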

The GONNET matrix (which can be used for larger datasets) is available with series of 80, 120, 160, 250 and 350. Though CLUSTAL W is handy to align a large number of sequences with reliable accuracy, there are a few recommended alignment tools for transmembrane proteins, which are conceptually different in aligning TM helices and loops by using different matrices (for example PRALINE TM and MAFFT). An earlier method, STMP (Shafrir and Guy, 2004), is also useful and is the first multiple sequence alignment program targeted at aligning transmembrane proteins. A recent study suggests that the PHAT matrix (Ng et al 2000) outperforms the JTT matrix (Jones et al 1992), especially in database searching (Ng et al 2000).

1.10.2 PRALINE TM

Thus, the servers that align TM-proteins (like PRALINE TM) are more specific, as the transmembrane regions are first predicted (Pirovano 2008). Reliable topology prediction methods guide the boundaries of the TM domains and loops as an initial requirement. PRALINE TM refers to HMMTOP v2.1 (Tusnady and Simon, 2001), TMHMM v2.0 (Krogh et al 2001) and Phobius (Käll et al 2007) for membrane predictions. Then, the profile scoring scheme simply applies TM-specific substitution scores from matrices like PHAT to reliably compare TM positions. Finally, an alternative iterative scheme is applied to enhance the alignment quality.

1.10.3 MAFFT

MAFFT (Multiple Alignment using Fast Fourier Transform) can be used for aligning large datasets of transmembrane proteins. The method is more advanced than other alignment programs in increasing the accuracy of alignments, even for sequences having large insertions or extensions as well as distantly related sequences of similar length. The MAFFT alignment program (Katoh et al 2002) is more effective with two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method.

The iterative refinement method is referred to as FFT-NS-I. The other important feature of the program is that the number of input sequences can be very large, and it offers a range of multiple alignment methods such as L-INS-I (accurate, for alignment of <~200 sequences), FFT-NS-2 (fast, for alignment of <~10,000 sequences) and so on. Yet another attractive feature of the alignment program is that it provides a range of matrices, especially the JTT 200 matrix (Jones et al 1992), which is usually meant to deal with membrane proteins. A large number of sequences aligned by appropriate alignment tools can be viewed and edited (if required) with the help of alignment viewers/editors. Alignment viewers and editors such as Seaview, Jalview, BioEdit, Genedoc, CLC, SeqPop and MEGA are highly useful in visualizing and improving the alignment quality.

1.11 DERIVING PHYLOGENY OF GPCRs/ORs

Multiple sequence alignments (MSA) supply the sequence properties at equivalent regions, which can be used to drive phylogenetic analysis. The tree-based representation of the observed relationship among the species/sequences (protein or nucleotides) can be used to infer past evolutionary trends within and across genomes. It is a hypothesis, wherein the tree topology depicts the inferred evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical and/or genetic characteristics. Various computer-aided programs are available to construct a phylogeny by the maximum likelihood (ML) method (Strimmer and von Haeseler 1997), the unweighted pair group method with arithmetic mean (UPGMA) algorithm (Sokal and Michener 1958) or the neighbor joining (NJ) method (Saitou and Nei 1987). Here, the ML method refers to the probabilistic approach and evaluates every possible tree topology given a starting set of sequences.

By assigning probabilities to every possible evolutionary change at informative sites, and by maximizing the total probability of the tree, the search for the optimal choice can be reached. In the NJ method, the algorithm not only evaluates pairwise distances (using distance matrices), but also sets neighbors that minimize the total length of the tree. The NJ method is recommended for sequences whose evolutionary distances are short; it eliminates possible errors that can occur when the UPGMA method is used. There are multiple packages available both for standalone and on-line access. Suites like PHYLIP, TREE-PUZZLE and MEGA are more user-friendly and are appropriate tools to perform phylogenetic analysis by both the ML and NJ methods.

1.11.1 PHYLIP

PHYLIP (Phylogeny Inference Package) (Felsenstein, 1981) is a free computational phylogenetic package consisting of 35 portable programs. It facilitates parsimony, distance matrix and likelihood methods, including bootstrapping and consensus trees.

1.11.2 TREE-PUZZLE

It is a popular computer program to reconstruct phylogenetic trees from molecular sequence data, such as nucleotide or protein sequences, based on the maximum likelihood (ML) method (Schmidt et al 2002). It implements the quartet puzzling algorithm. This is performed in three steps. In the ML step, the supplied 'n' (number of sequences in the alignment) is set for the quartets. All quartets are evaluated using the ML method, and the three quartet topologies ab|cd, ac|bd and ad|bc are weighted by their posterior probabilities. The average distance between all pairs of sequences (maximum likelihood distances) is computed. These distances can be viewed as a rough measure of the overall sequence divergence. In the puzzling step, quartet trees are combined into an intermediate tree by adding sequences one-by-one.
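The weighting of the three quartet topologies in the ML step can be illustrated with a short sketch. It assumes equal prior probabilities for the three topologies, so that the posterior weights are the normalized likelihoods; the function names and the example log-likelihood values are hypothetical and are not TREE-PUZZLE output.

```python
import math


def quartet_topologies(a, b, c, d):
    """Return the three possible unrooted topologies for four sequences."""
    return [f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}"]


def posterior_weights(log_likelihoods):
    """Convert three log-likelihoods into posterior weights.

    Assumption: equal priors, so each weight is L_i / (L_1 + L_2 + L_3).
    Subtracting the maximum first avoids numerical underflow in exp().
    """
    best = max(log_likelihoods)
    relative = [math.exp(value - best) for value in log_likelihoods]
    total = sum(relative)
    return [value / total for value in relative]


def count_quartets(n):
    """Number of distinct quartets that can be drawn from n sequences."""
    return math.comb(n, 4)


# Invented log-likelihoods for ab|cd, ac|bd and ad|bc, for illustration only.
example_weights = posterior_weights([-100.0, -102.0, -105.0])
```

Because the number of quartets grows as n choose 4, the ML step becomes expensive for large alignments, which is consistent with the remark that these steps are time-consuming.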

As the puzzling step is highly dependent on the order of sequences, many intermediate trees from different input orders are constructed. In the consensus step, a majority rule consensus tree is built from the generated intermediate trees. These two steps are time-consuming, and the result files (.dist, .puzzle and .outtree) are useful for interpreting tree topologies. Evolutionary models such as the DAYHOFF, JTT and mtREV24 (Adachi and Hasegawa, 1996) (for use with proteins encoded on mtDNA) matrices are provided. Others, like BLOSUM 62 and the WAG model (Whelan and Goldman, 2004), are for more distantly related amino acid sequences. VT is for use with proteins of distant relationships as well (Muller and Vingron 2000).

1.11.3 MEGA (Molecular Evolutionary Genetics Analysis)

MEGA is a user-friendly software for phylogenetic studies, which also integrates sequence alignment approaches like CLUSTAL W and MUSCLE. MEGA 5 can be employed for phylogenetic reconstruction and phylogeny visualization, and for testing an array of evolutionary hypotheses using maximum likelihood (ML), maximum composite likelihood (MCL), neighbor-joining (NJ), minimum evolution (ME) and maximum parsimony (MP) to produce a bootstrap tree for the required replications. MEGA is handy for displaying tree topologies legibly, such as in rectangular, radial and circular displays (Kumar et al 2008).

1.12 CLUSTER ASSOCIATIONS

The generated tree topologies can be inferred for cluster associations by using the consensus tree. Understanding the distribution of clusters with significant bootstrap (BS) values helps to classify or group the related sequences. For example, in the phylogenetic analysis of mouse olfactory receptors (Zhang and Firestein 2002), nearly 1000 OR genes were classified into several OR families. For the classification, they identified reliable clusters using cut-off values.

The reliable clusters were those having >50% bootstrap support and more than 40% protein identity. By this definition, mouse ORs were classified into 228 families. This kind of segregation of gene/protein sequences creates cluster associations for the protein families of interest. Cluster associations provide information about the conserved species-specific behaviors and the evolutionary integrity obtained at intra- and inter-genomic level (Figure 1.6).

1.13 SEQUENCE CONSERVATION AND DIVERSITY

The intra- and inter-genomic phylogenetic studies guide the sequence association for the species-specific tendency as well as for coclustering arrangements. Evolutionarily conserved sequence properties such as motifs (Scott Gleim 2009) are highly important to connect further for structural and functional relevance. Several computational techniques and software tools are available to locate and display conserved amino acid residues in an aligned set of homologous sequences. Available tools and databases such as TOPDOM, PROSITE, MeMotif, WEBLOGO, IMOTdb and SmoS, together with the in-house MotifS program (by Sowdhamini, yet to be published), can be used to visualize the set of aligned TM-proteins and the observed motifs and AAS. Such annotation tools can be applied in comparative genomics of GPCRs or ORs to identify cluster-specific/family-specific motifs along with the knowledge on predicted topology (Figure 1.6).

1.14 HOMOLOGY MODELLING OF GPCRs/ORs

The sequence searches and clustering provide representative sequences to generate three-dimensional structures, and this further helps to map hotspots and to associate functional properties.

Comparative modelling/homology modelling is an appropriate procedure for generating 3D models for the proteins of interest and can be achieved by the following steps:

i) Primarily, homologous sequences of the query can be collected by using effective sequence search methods. The nearest homologous sequence whose structure is known can be used as a template.

ii) Pairwise alignment of the template and target sequences can be made by using appropriate alignment methods. Procedures such as PRALINE TM and MAFFT can be used for membrane proteins. Alignments can be manually edited to improve the alignment quality (using MEGA).

iii) Building co-ordinates of the three-dimensional model based on the generated alignment can be achieved by using software like MODELLER (Sali and Blundell, 1993) and web servers like SWISS-MODEL (Arnold et al 2006).

iv) The potential accuracy of the generated models is assessed, and models with the least energy constraints can be selected.

v) Structure validation can be done by checking for disallowed conformations or structural environments (guided by Ramachandran Plot values), using the PROCHECK server (Laskowski et al 1993) and VERIFY 3D (Bowie et al 1991). If unfavorable conformations and short contacts are observed, the model can be minimized by using the SYBYL software package (Tripos associate Inc).

In essence, the compiled writings in this introductory chapter provide the necessary background to the work chapters 2-6 that follow.

CHAPTER 2

CROSS-GENOME CLUSTERING OF HUMAN AND C. ELEGANS G-PROTEIN COUPLED RECEPTORS

2.1 INTRODUCTION

Membrane proteins are ubiquitous (Perez 2005), constitute nearly 20% of whole genomes and are most attractive drug targets since they are implicated in various diseases. Membrane proteins are embedded within the lipid bilayer and are designated as transmembrane proteins, since they loop inside and outside of the cell boundaries. If the downstream targets of such membrane receptors are guanine nucleotide binding proteins, they are also referred to as Guanine nucleotide-binding protein-coupled receptors, G-protein coupled receptors (GPCRs) or serpentine receptors. This class of cell-surface receptors retains structural features, having an extracellular N-terminal and an intracellular C-terminal with seven transmembrane-helices (TMHs) connected by three intra and extracellular loops, and presents a snake-like structural element, giving rise to names such as 7TM receptors, heptahelical receptors or serpentine-like receptors.

2.2 C. ELEGANS - AN ATTRACTIVE ANIMAL MODEL

C. elegans is an attractive experimental animal model and is a hermaphrodite with a protandrous reproductive system. For over 50 years, C. elegans has been used as a viable/feasible model organism, because of i) clear understanding of the complete cell-lineage from fertilization to maturity (Brenner 1974) in the worm, ii) detailed study of its entire nervous system (White et al 1986), iii) knowledge on RNA interference in manipulating the expression of genes (Fire et al 1998), iv) convenient storage and maintenance protocols and v) availability of genome information from the C. elegans Sequencing Consortium (Hillier et al 2005). These classical reasons made this simple nematode a useful model organism. The award of the Nobel Prize in Physiology or Medicine in 2002 to Sydney Brenner, of the MRC Laboratory of Molecular Biology at Cambridge, also added more interest and confidence in employing C. elegans as a model organism. Phylogenetic analysis of C. elegans with other organisms of interest leads to comparative genomics and helps to explore the observed functional relevance between genomes at cross-genome level. So, for the current study, a cross-genome phylogenetic analysis with selected human GPCRs and C. elegans GPCRs has been pursued.

2.2.1 Features Related to C. elegans and Human GPCRs

The current objective of studying cross-genome GPCR clustering could be very effective in addressing the commonality/relevance that occurs between C. elegans and human GPCRs at sequence level, and the resulting sequence information can be studied further for structural and functional relevance. Further, the promising features of C. elegans that help comparison with higher order organisms are as follows:

One of the best studied pathways in C. elegans is the insulin/insulin-like growth factor IGF-1 signaling pathway, which is related to the mechanism controlling the lifespan of the worm. The same pathway is conserved across taxa like Drosophila and human. It has been observed that at least 12 signal transduction pathways in C. elegans show either partial or high degree of conservation with human biology, and the relevance has been brought into the limelight (Scientific Frontiers in Developmental Toxicology and Risk Assessment 2000).

High conservation of the (TGF-b) pathway, and partial conservation of the Toll-like receptor pathway and the JAK/STAT signaling pathway, have been noted. The Wnt pathway via β-catenin, receptor serine/threonine kinase (TGF-β receptor) pathway, receptor tyrosine kinase pathway, notch-delta pathway, hedgehog pathway (patched receptor protein), apoptosis pathway, receptor protein tyrosine phosphatase (RPTPs) pathway, nuclear hormone receptor pathway, integrin pathway, cadherin pathway, gap junction pathway, receptor guanylate cyclase pathway, nitric oxide receptor pathway, G-protein coupled receptor (large G-protein) pathway, receptor-linked cytoplasmic tyrosine kinase (cytokine) pathway, NF-kappaB pathway, IL1/Toll receptor and ligand-gated cation channel pathway are the few pathways mentioned in early literature for conservation across fly, nematode and vertebrates.

Notably, genes related to Alzheimer's disease and colon cancer in humans have counterparts in C. elegans (Fortini et al 2000). The Machado-Joseph disease gene (SCA3/MJD) also has an identified homologue in C. elegans. Studies related to neurodegenerative diseases, such as Parkinson's and Huntington's diseases, have been explored by using transgenic C. elegans (Kuwabara and O'Neil 2001). Further, various bioinformatic approaches and resources (C. elegans Sequencing Consortium 1998, Sonnhammer et al 1997) also report that C. elegans homologues have been identified for 60–80% of human genes (Lai et al 2000). A recent report on the identification of two C. elegans sequences, NP_491453 and NP_506566, with more than 40% sequence similarity to human gonadotropin releasing hormone receptor I and II (GnRHR1 and GnRHR2) provides a platform for relating the extent of similarities in their reproductive endocrinology (Vadakkadath Meethal et al 2006). Functional expression studies were performed on human somatostatin receptor 2 (Sstr 2) and chemokine receptor 5 (CCR5) in the gustatory neurons of C. elegans (Teng et al 2006).

The selected model organism C. elegans carries an extensive repertoire of Pfam domain matches, conserved signalling pathways and homologues of proteins found in other organisms, such as human and Drosophila. Also, the increasing evidence for genetic and physiological similarity (e.g., stress response and basic physiological processes) with higher order organisms (humans) makes it noteworthy to deal with C. elegans GPCRs, to explore the nematode genome for its genetic influence in the evolutionary trends (Remm and Sonnhammer 2000) and the effectiveness of employing C. elegans as a model organism for understanding fundamental pathways in higher order organisms.

2.3 OBJECTIVES

The significant genetic tractability in the whole genome of D. melanogaster (1%) and C. elegans (<5%) (C. elegans Sequencing Consortium, 2012) and the occurrence of more than 1000 candidate GPCRs in humans are further reasons to investigate these cell surface receptors across genomes (Marinissen and Gutkind, 2001). The current chapter focuses on comparing C. elegans GPCRs (Brenner 1974) with the established human GPCR clusters. The study is aimed at associating more than 1000 GPCRs of C. elegans with the already grouped known human GPCRs.

2.4 PRIOR ART

A previous lab publication (Metpally and Sowdhamini 2005) proposed a novel approach to establish phylogenetic cluster association and dealt with eight major groups of human GPCRs, namely peptide receptors (PR), chemokine receptors (CMK), nucleotide & lipid receptor (N&L), biogenic amine receptor (BGA), secretin (SEC), cell adhesion (CAR), glutamate receptor (GLR), and frizzled and smoothened (FRZ/SMT), with selected GPCRs from the Drosophila genome.

For this, the previously established 32 human-Drosophila GPCR cluster dataset was used. The candidate receptors from the Drosophila genome were removed, and the representative GPCRs from each cluster were used to generate a PSSM profile that represents the cluster property; these profiles were used with RPS-Blast (Khader Shameer et al 2009, Marchler-Bauer et al 2009) to associate C. elegans GPCRs. In principle, the cross-genome phylogenetic approach is intended to understand the evolutionary plasticity, to discover functional similarities and to identify functionally related genes, orthologous relationships and conserved motif patterns across these two organisms. The current study, in turn, will make it possible to observe the details of cluster association in retaining nematode-specific gene clusters and also the established evolutionary integrity with human GPCRs at cross-genome phylogeny.

2.4.1 Superfamilies of Serpentine Receptors

C. elegans chemoreceptor genes are strikingly abundant and diverse, possessing around 400 apparent pseudogenes and almost 1300 predicted genes that encode members of putative chemosensory genes (Robertson and Thomas 2006, Melkman and Sengupta 2004), and are usually referred to as serpentine receptors (SR) or chemosensory receptors (CR). These genes are classified under the serpentine receptors (SR) superfamily, and the occurrence of serpentine receptors in about 7% of the whole genome indicates the extreme dependency on chemosensory abilities due to the lack of visual and auditory systems in C. elegans (Chen et al 2005). Through genetic screening, Sengupta et al identified odr-10 in C. elegans as participating in olfactory response, which in turn reveals the relationship between odr-10 and other serpentine receptors in C. elegans GPCR families (Sengupta et al 1996).

Recent reports state that serpentine receptors like srg 36 and srg 37 are pheromone-like receptors and participate in sensing ascaroside pheromones, which are observed in sex chromosomes (McGrath et al 2011). In another instance, the candidate serpentine receptors srbc-64 and srbc-66 are also implicated in pheromone activity in C. elegans (Kim et al 2009). These case studies are helpful in understanding the divergent chemoperception properties in C. elegans with reference to GPCRs. A detailed compilation on SR superfamilies and relative families of odr-10 by Robertson and co-workers (Robertson and Thomas 2006) provides information on phylogenetic distribution, function and expression patterns in C. elegans chemosensory receptors. They classify C. elegans serpentine receptors into nearly 20 recognizable families on the basis of sequence similarity and shared intron locations. 19 of these families are well-established and grouped under superfamilies such as the Sra superfamily (sra, srab, srb, and sre), the Srg superfamily (srg, srt, sru, srv, srx, and srxa), the Str superfamily (srd, srh, sri, srj, str) and others, or Solo type, which includes srbc, srsx, srw and srz. Notably, the large Str family along with the related sri and srj families are observed to be related to odr-10 (olfactory receptor) in C. elegans.

2.5 METHODOLOGY

2.5.1 Selection Criteria for C. elegans GPCRs

i. Retrieval of C. elegans GPCR sequences

Around 1204 GPCR sequences of C. elegans were collected from the SEVENS database (Ono et al 2005) and were subjected to prediction of the membrane topology (Figure 2.1).

ii. Prediction of membrane topology for C. elegans GPCRs

The membrane topology of each GPCR sequence was predicted by using the SOSUI (Hirokawa et al 1998) and HMMTOP (Tusnady and Simon 2001) prediction methods (refer step 1.2 in Figure 2.1). The observed consensus from both methods was used to define the eligible candidate GPCRs.

iii. Elimination of over/under predicted TM helices

The C. elegans GPCR sequences predicted to have 7 (±2) TM helices were retained, whereas GPCRs predicted below or above the mentioned cut-off were removed from the dataset. Thus, in total 1160 GPCR sequences of C. elegans were retained after this screening procedure, and these sequences are referred to as the "C. elegans GPCR dataset" (Figure 2.1).

iv. Alignment of Human (31)/Drosophila (1) GPCR clusters

The human-Drosophila GPCR cluster dataset for 32 clusters of eight major groups was obtained from our previous lab publication (Metpally and Sowdhamini 2005) (herein referred to as the "known cluster association"). Since we were interested in performing cross-genome phylogeny with C. elegans GPCRs, the associated Drosophila GPCR sequences were eliminated from the previously established human-Drosophila GPCR clusters, so as to get human GPCR-only clusters from 31 clusters and a Drosophila GPCR-only cluster from one cluster (Cluster No 26); the latter was associated only with Drosophila GPCRs (Metpally and Sowdhamini 2005) and has been used for our current study. Due to the removal of Drosophila GPCR sequences from the human-Drosophila GPCR cluster dataset of the known cluster association (except for the 26th Cluster), extra indels were observed in the previous alignments.

The observed extra indels in each alignment position of the MSA were carefully edited manually by using MEGA 4.0 (Tamura et al 2011). The resulting improved alignments for 32 clusters were retained with a total of 353 human GPCR sequences from 31 clusters and 14 Drosophila GPCRs from one cluster (Cluster 26) to obtain 32 clusters (Figure 2.1), and this pre-aligned set of GPCR associations is referred to as the "GPCR cluster dataset".

Figure 2.1 Flow-chart to depict the step-wise procedure for cross-genome clustering of GPCRs

Note: Step 1.1 indicates the collection of C. elegans GPCRs from the SEVENS database, followed by the removal of redundancy by the CD-hit server. Step 1.2 refers to the prediction of membrane topology by the HMMTOP server and SOSUI, respectively. Step 1.3 refers to the removal of over/under predicted TM-helices. Step 1.4 refers to the construction of the human GPCR cluster dataset from the already established human-Drosophila GPCR cluster dataset and the preparation of the respective human GPCR cluster alignments. Step 2 refers to the construction of PSSM profiles from the respective human (31) and Drosophila (1) GPCR cluster alignments. Step 3 refers to the usage of RPS-blast for the association of C. elegans GPCRs with the human GPCR profiles and the generation of cross-genome GPCR clusters. Steps 4 and 5 refer to the cross-genome alignment and phylogeny.
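Steps 1.2 and 1.3 of the flow chart (topology prediction followed by removal of over/under predicted TM helices) can be sketched as a simple filter. The sketch assumes that the helix counts from the two predictors have already been parsed into a dictionary, and it interprets "consensus" as both predictors falling within 7 (±2) helices, which is an assumption; the function names and the sequence identifiers are hypothetical.

```python
def helix_count_in_range(count, expected=7, tolerance=2):
    """True when a predicted TM-helix count lies within expected +/- tolerance."""
    return expected - tolerance <= count <= expected + tolerance


def filter_by_topology(predictions, expected=7, tolerance=2):
    """Split sequences into retained and removed lists.

    predictions maps a sequence id to a (sosui_count, hmmtop_count) tuple.
    Assumption: a sequence is retained only when BOTH predicted counts
    fall inside the accepted range.
    """
    retained, removed = [], []
    for seq_id, (sosui_count, hmmtop_count) in predictions.items():
        if (helix_count_in_range(sosui_count, expected, tolerance)
                and helix_count_in_range(hmmtop_count, expected, tolerance)):
            retained.append(seq_id)
        else:
            removed.append(seq_id)
    return retained, removed


# Invented identifiers and helix counts, for illustration only.
toy_predictions = {
    "seq_A": (7, 7),    # both predictors agree on seven helices
    "seq_B": (6, 8),    # within the 7 (+/-2) window for both
    "seq_C": (3, 7),    # under-predicted by one method
    "seq_D": (12, 11),  # over-predicted by both methods
}
toy_retained, toy_removed = filter_by_topology(toy_predictions)
```

A stricter or looser consensus rule (for example, accepting a sequence when either predictor is in range) would change the size of the retained dataset, so the rule should be stated alongside the reported counts.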

2.5.2 Generation of Representative Profiles

The previously aligned GPCR cluster dataset (by CLUSTAL W) for 32 clusters of known receptor types was used to generate position-specific scoring matrices (PSSMs), or profiles, to represent cluster/receptor-specific sequence properties. In the current attempt, 32 profiles were created by supplying the respective MSA and a representative sequence to the Psi-Blast procedure (Position-Specific Iterative Basic Local Alignment Search Tool) (Altschul et al 1997), ultimately producing 32 cluster-specific PSSM profiles. These profiles are termed "representative profiles", since they represent the sequence properties of the respective alignments from the 32 clusters (refer step 2 in Figure 2.1).

2.5.3 Performing RPS-Blast

i. Trial study with known associations

Separately, E-value thresholds were standardized by performing a trial study in which 102 sample Drosophila GPCR sequences were chosen as queries; the correct cluster association could be obtained using RPS-BLAST at an E-value cut-off <0.001 nearly 90% of the time. Cross-associations, which happened in 11 queries, were also meaningful (for example, the Q8MKU0 receptor identifies cluster 8, a peptide receptor cluster, but belongs to cluster 11, which also contains peptide receptors). With this significant initial standardization and a confidence level of 90% for correct association for 32 clusters, the queries from the C. elegans GPCR dataset (unknown association) were given the chance to select their most closely related GPCR profile from the generated human (31)/Drosophila (1) GPCR cluster dataset by employing RPS-Blast (Reverse PSI-Blast).
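The trial study above amounts to picking, for each query, the best-scoring profile below an E-value cut-off and counting how often it matches the known cluster. A minimal sketch follows; it assumes the RPS-BLAST hits have already been parsed into (profile id, E-value, bit score) tuples, and all names and values are hypothetical rather than taken from the study.

```python
def best_profile(hits, evalue_cutoff=0.001):
    """Return the profile id of the most significant hit below the cut-off.

    hits is a list of (profile_id, evalue, bit_score) tuples.
    Ties on E-value are broken by the higher bit score.
    Returns None when no hit passes the cut-off.
    """
    significant = [hit for hit in hits if hit[1] < evalue_cutoff]
    if not significant:
        return None
    return min(significant, key=lambda hit: (hit[1], -hit[2]))[0]


def fraction_correct(hits_by_query, known_cluster, evalue_cutoff=0.001):
    """Fraction of queries whose best profile equals the known cluster."""
    correct = sum(
        1
        for query, hits in hits_by_query.items()
        if best_profile(hits, evalue_cutoff) == known_cluster[query]
    )
    return correct / len(hits_by_query)


# Invented hit lists and cluster labels, for illustration only.
toy_hits = {
    "query_1": [("cluster_08", 1e-20, 95.0), ("cluster_11", 1e-12, 70.0)],
    "query_2": [("cluster_11", 1e-15, 80.0)],
    "query_3": [("cluster_03", 0.5, 20.0)],
    "query_4": [("cluster_08", 1e-9, 60.0), ("cluster_11", 1e-6, 45.0)],
}
toy_known = {
    "query_1": "cluster_08",
    "query_2": "cluster_11",
    "query_3": "cluster_03",
    "query_4": "cluster_11",
}
toy_accuracy = fraction_correct(toy_hits, toy_known)
```

Re-running `fraction_correct` over a range of cut-offs is one way to see how relaxing the threshold trades missed associations against cross-associations.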

ii. Setting E-value thresholds for unknown association

In preliminary analysis, correct associations between the 100 selected Drosophila GPCRs and their respective human GPCR profiles were observed at an E-value range of < 0.001. A need to relax the E-value thresholds for different ranges has arisen due to the higher evolutionary divergence between humans and C. elegans in the current study. Various ranges of E-value thresholds, from 0.001, 0.01, 1.0 and < 5.0 to > 5.0 up to 14, were tried with C. elegans GPCRs to select their closely related profile from the dataset with little chance of encountering false connections (11% of predicted false association for known cluster association). The identification of the closest representative profile by each C. elegans GPCR sequence at varying E-values guides the respective C. elegans GPCRs to the respective human GPCR cluster (as in the previous dataset). The cross-genome cluster association was decided on the basis of the respective profile and a significant bit score, percentage identity and E-value from the hit list obtained from RPS-Blast. The E-value thresholds were mainly considered for finalizing the association.

2.5.4 Cross-Genome Alignment of Human – C. elegans GPCRs

The pre-aligned set of GPCR sequences from the dataset and the newly associated C. elegans GPCR sequences were aligned by an appropriate multiple alignment tool called PRALINETM (Pirovano et al 2008) to generate cross-genome sequence alignments of 32 GPCR clusters. The previously associated human GPCR sequences of the respective known human GPCR profile were aligned with the newly associated C. elegans GPCR sequences (Figure 2.1). In principle, the alignment is based on the TM topology and profile scoring schemes, namely the PHAT matrix with a gap penalty of 15 (1 extension) for predicted TM regions and the Blosum 62 matrix for non-TM regions.

The gap penalty for non-TM regions was 16.5 (1 extension), followed by an iterative scheme to enhance the alignment quality. Where necessary, alignments were further optimized by manual editing through MEGA 4.0.

2.5.5 Cross Genome Phylogeny of Human – C. elegans GPCRs

The generated cross-genome GPCR cluster alignments of human – C. elegans GPCRs were clustered to generate a cross-genome phylogeny by using Tree-Puzzle (Schmidt et al 2002). This package was employed to perform quartet-based maximum-likelihood phylogenetic analysis with 10,000 puzzling steps, and the resolved trees were generated as phylogenetic tree files (outtree files). The resultant tree files were viewed with MEGA 4.0 software (Tamura et al 2007) (Figure 2.1). Some critical decisions on the constructed phylogenetic trees were made based on branching patterns and cluster association (Figure 2.2A-C).

2.5.6 Terminologies used to Describe Phylogeny

A few terminologies were introduced to describe the type of cluster association, as discussed below.

2.5.6.1 Human GPCR clade [HC]

Refers to the pure distribution (homogenous occurrence) of human GPCRs in the established branching pattern at intra-genomic level and is referred to as HC in the tree topology (Figure 2.2A-C).

2.5.6.2 Coclusters [CC]

Refers to the coclustering, heterogenous distribution or clear intermixing of C. elegans GPCRs with human GPCRs, representing a strong cross-genomic clustering in the established branching pattern/clade at inter-genomic level, and is denoted as 'CC' in the phylogenetic tree (Figure 2.2A and C).

2.5.6.3 Neighbor Clades [NC]

Refers to the homogenous occurrence of C. elegans GPCRs adjacent or neighboring to human GPCR clusters [HC] in the established branching patterns at the intra-genomic level; it is referred to as 'NC' in the phylogenetic tree (Figure 2.2B).

2.5.6.4 Neighbor Members [NM]

Refers to the pure (homogenous) or intermixed (heterogenous) distribution of GPCRs in the branching pattern with limited nodes, mostly originating from the root; it is denoted as 'NM' (Figure 2.2C). However, compared with the associations described under CC, the observed associations may not be viewed as closely related at the inter-genomic level.

2.5.6.5 Species-specific members [SS]

Refers to the pure (homogenous) distribution of C. elegans GPCRs that remains as a separate clade in the tree topology. The clades are denoted as 'SS' in the phylogenetic tree. In this current study, although we refer to species-specific members, this is only in the context of cross-genome human-C. elegans GPCRs, and the term does not imply a complete set of unique genes observed only in one species in the entire taxonomy and evolutionary tree of life (Figure 2.2C).

2.5.6.6 Superfamilies of serpentine receptors (SR)

The distribution of chemosensory receptors of C. elegans is discussed according to superfamilies such as Sra, Str, Srg and Other, broken into 24 types, as suggested by Robertson and coworkers. This classification has been followed throughout the 32 clusters to appreciate the influence of human GPCRs and the species-specific preservation of such superfamily members.
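The composition-based part of the terminology above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `classify_clade` and the species labels `"human"` and `"celegans"` are hypothetical, and the sketch looks only at which genomes contribute leaves to a clade. Telling NC from SS, or recognizing NM, additionally depends on the position of the clade in the tree (adjacency to an HC, limited nodes near the root), which this sketch deliberately does not attempt.

```python
from collections import Counter

HUMAN = "human"        # hypothetical label for a human GPCR leaf
CELEGANS = "celegans"  # hypothetical label for a C. elegans GPCR leaf


def classify_clade(leaf_species):
    """Label a clade from the species of its leaves.

    Returns "HC" for a purely human clade, "CC" for a clade that mixes
    human and C. elegans leaves, and "NC_or_SS" for a purely C. elegans
    clade (separating NC from SS needs the tree topology).
    """
    counts = Counter(leaf_species)
    if not counts:
        raise ValueError("a clade needs at least one leaf")
    unexpected = set(counts) - {HUMAN, CELEGANS}
    if unexpected:
        raise ValueError(f"unexpected species labels: {sorted(unexpected)}")

    has_human = counts[HUMAN] > 0
    has_worm = counts[CELEGANS] > 0
    if has_human and has_worm:
        return "CC"
    if has_human:
        return "HC"
    return "NC_or_SS"
```

For example, a clade holding one human receptor and one C. elegans receptor would be labeled "CC", in line with the coclustering definition above.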

Also, the bootstrap supporting values (herein referred to as Bs) and the cluster association of serpentine receptor types (superfamily level) were observed using both the rectangular view (known as dendrogram) and the radial view (known as radial display) for the analysis, final graphical display and interpretation of results (Figure 2.1).

Figure 2.2(A-C) Pictorial representation for various types of cluster association

Notes: (A) represents the cluster association as HC (human GPCR clade) and CC (Coclusters), wherein HC represents the association of GPCRs from the human genome and CC represents the association of GPCRs from the human and C. elegans genomes. (B) refers to the NC association, where C. elegans GPCRs are observed adjacent or neighboring to HC. (C) refers to the associations in which species-specific (SS), Coclusters (CC) and neighbor member (NM) types occur in the tree topology.

2.6 RESULTS AND DISCUSSION

In the current Chapter (as mentioned in Methods), 1106 C. elegans GPCR sequences were associated by querying against the database of human GPCR profiles from the known cluster association by using the sensitive RPS-Blast technique. The newly associated C. elegans GPCR sequences were tabulated (Table 2.1) for cluster-wise association with significant E-values, using the procedure for cross-genome association discussed and detailed in Methods. The association observed between the two genomes is considered to give the best connecting sequences between a known receptor type in a higher-order organism and an effective model organism, first to compare the sequence properties and then to connect the structural, functional, and evolutionary relatedness among them. The observed cross-genome cluster associations are discussed in detail below with a cluster-wise summary according to the observed cross-genome GPCR topology/phylogeny.

2.6.1 Result summary for Peptide Receptors

Clusters 1-11 are related to peptide receptors, and around 442 GPCRs from C. elegans have been associated with nearly 101 human peptide receptors in the dataset (Table 2.1). Generally, the size of the peptide ligands varies from two amino acid residues to as many as 50. Many of the peptide receptors have potential clinical applications and are related to various diseases such as chronic inflammatory diseases, autoimmune diseases, degenerative diseases, cancer and cardiovascular diseases; hence these receptors also act as interesting drug targets. Broadly, GPCRs occur in subfamilies A1-A19, and peptide receptors (PR) occur predominantly in the dataset. A few important receptors are given along with their group to emphasize the distribution of various PR in Clusters 1-11. Small peptide receptors such as angiotensin (8 amino acids), bradykinin (9 amino acids) (refer Cluster 3), apelin (~36 amino acids) and orphan receptors GPRF that act as co-receptors for the human immunodeficiency virus (HIV) (belonging to subfamilies A2 and A3) are observed in the cluster dataset (Joost and Methner 2002). Bombesin receptor, gastrin-releasing peptide receptor (GRPR), endothelin receptor (Cluster 7), thyrotropin-releasing hormone receptor (TRHR, TRFR) and motilin receptor, which are related to subfamily A7, are observed in the cluster dataset. Anaphylatoxin receptors, formyl peptide receptor, MAS1 oncogene, GPR1 (Cluster 6), GPR32 (Cluster 9), GPR44 and GPR77 (Cluster 9) are from subfamily A8. GPR50, neurokinin receptor (Cluster 11), neuropeptide Y receptor (Cluster 11), prolactin-releasing peptide receptor, prokineticin receptor 1, 2 (Cluster 11), GPR19, GPR75 and GPR83 from subfamily A9 are found in the cluster association. Glycoprotein hormone receptor, leucine-rich repeat-containing G protein-coupled receptor 4, LGR5 and LGR6 from subfamily A10 (Cluster 7) are present in the cross-genome GPCR cluster associations.

Cluster 1

Human GPCRs such as galanin receptor 1-3, KiSS1-derived peptide receptor (GPR54) (de Roux et al 2003), melanin-concentrating hormone receptor and urotensin-II receptor, related to subfamily A5 of class A type receptors, are observed in Cluster 1. Cluster 1 is associated with eight human peptide receptors and 32 C. elegans GPCRs at different E-value cutoffs (Table 2.1). Human peptide receptors are distributed into clades, one having only human GPCRs (HC1) and another having human GPCRs with a C. elegans GPCR (CC1), further allowing the functional relevance with nematode GPCRs to be explored. Interestingly, in CC1, coclustering of a neuropeptide receptor (npr-9) from C. elegans with a human GPCR (KiSS1-derived peptide receptor (GPR54), Swissprot Code Q969F8) was observed (Figure 2.3.A and B). The human GPCR (Q969F8) observed in CC1 is implicated in breast carcinomas and is also majorly involved in endocrine regulation and the onset of puberty (Seminara et al 2003). Mutations in this gene have been associated with hypogonadotropic hypogonadism and central precocious puberty in humans (de Roux et al 2003). Such a functionally important human peptide receptor extends an orthologous relationship (Table 2.2) with the neuropeptide receptor (npr-9) of C. elegans and proves to be an interesting target to be studied in the C. elegans model organism. npr-9 of C. elegans shows a favourable association at a significant E-value (2.00E-47) with the human GPCR (Q969F8). Interestingly, this receptor exhibits an orthologous relation with a human GPCR (Table 2.2).

Neighboring clusters (NC1-NC5) exhibit topology only with C. elegans GPCRs, where NC1, NC3 and NC5, as well as the neighbor members in NM2, include a majority of hypothetical proteins. This further helps to interpret that these unknown or unannotated GPCRs in the nematode are probably related to human peptide receptors through cross-genome phylogeny and also probably belong to class A GPCRs (Rhodopsin-like). NC4 associates with a pure set of srd members. On observing the distribution of serpentine receptors at various E-value thresholds among the 32 nematode GPCRs in cluster 1, the srw type is related to families of FMRFamide and other peptide receptors, which are expected to have relatives in vertebrates and insects and show strong clustering at the chromosomal level (Troemel et al 1995). SS1 and SS2 retain candidate receptors of the srw type in C. elegans. Their associations with the cluster-1 human peptide receptors suggest that they serve as closely related environmental peptide receptors. Receptors from the Str superfamily (str, srh type) are observed as neighbor members in NM1.
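The association step that underlies these cluster summaries (each C. elegans sequence is guided to the human GPCR profile giving its best RPS-Blast hit, subject to an E-value threshold) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the pipeline used in the thesis: the function name `best_profile_hits`, the tuple layout of a hit, and all sequence and profile identifiers in the example are hypothetical, and the additional criteria mentioned in Methods (percentage identity, bit score) are left out.

```python
def best_profile_hits(hits, max_evalue):
    """Pick, for every query, the profile hit with the lowest E-value.

    hits       -- iterable of (query_id, profile_id, evalue) tuples, for
                  example parsed from a tabular hit list (assumed layout)
    max_evalue -- hits with an E-value above this threshold are ignored
    Returns a dict mapping query_id -> (profile_id, evalue).
    """
    best = {}
    for query, profile, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to count as an association
        current = best.get(query)
        if current is None or evalue < current[1]:
            best[query] = (profile, evalue)
    return best


# Illustrative input only: identifiers and E-values are made up.
example_hits = [
    ("worm_seq_1", "human_cluster_1", 2.0e-47),
    ("worm_seq_1", "human_cluster_11", 3.0e-12),
    ("worm_seq_2", "human_cluster_2", 3.0e-41),
    ("worm_seq_3", "human_cluster_9", 20.0),
]
associations = best_profile_hits(example_hits, max_evalue=14.0)
```

With a relaxed threshold such as the one in this example, weakly matching sequences are still assigned to a cluster, so the E-value of each association needs to be reported alongside the cluster assignment, as is done in the cluster summaries.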

Figure 2.3 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 1): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 2

The receptors related to subfamily A4, such as opioid and somatostatin receptors, are found in cluster 2. Eleven candidate human peptide receptors (HC1) and 41 serpentine receptors of C. elegans are distributed in cluster 2 (Figure 2.4.A-B). HC1 remains as a clade with a majority of somatostatin receptor types (Matsumoto et al., 2000), and the neighboring clade NC1 has members from the Str superfamily (sri, str). NC2, NC4 and NC5 retain receptors of the srsx, srv and sre types, suggesting clear conservation at the superfamily level. NC6 branches with srm and srx ('others' superfamily) members along with a hypothetical receptor, and NC7 includes six hypothetical proteins and an unidentified vitellogenin-linked transcript family member (uvt-6). Notably, uvt-6 branches with a hypothetical protein (NP_510833.3) that belongs to the rhodopsin family and is most similar to the mammalian somatostatin receptors (refer WBGene00006864) (Wicher et al 2009). Uvt-6 exhibits a significant association at the E-value of 2.00E-37. This helps to understand the association of C. elegans GPCR members with human peptide receptors by RPS-Blast where the somatostatin receptor types are present in the profile. Particularly, a hypothetical protein, namely R106.2 (NP_510833.3), associates with somatostatin receptor type SSR5 at the significant E-value of 3.00E-41 and is found as an ortholog (Table 2.2), supporting the effectiveness of RPS-Blast in associating related homologues across genomes; this association is also reported in the SEVENS database. Diverse members, such as those of the srd, srv, srsx and srw families, are also observed as neighboring members in NM1. The observed pure dispersion of srh type receptors of C. elegans at SS1 indicates the high nematode-specific tendencies of these receptor types.

Figure 2.4 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 2): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 3

In cluster-3, eight entries of human peptide receptors are retained in HC1, and 34 candidate GPCRs from C. elegans (C. elegans Sequencing Consortium, 1998) are distributed into neighboring clusters NC1-NC5, neighboring members NM1 with 16 mixed receptor types, and the species-specific clade SS1 (Figure 2.5.A-B and Table 2.1). HC1 remains with closely related peptide receptors such as angiotensin II receptor, bradykinin receptor I and GPR15 types, belonging to the same subfamily A3 in the GPCR classification (Joost and Methner 2002). On observing the distribution of serpentine receptors in PR cluster 3, NC1 retains candidates of the srd type of the Str superfamily. Srd types of chemoreceptors in C. elegans retain one potential disulfide bond in extracellular domain 2, share with the srh and sri families a highly conserved PYR sequence at or near the inner end of transmembrane (TM) helix 7, and belong to pfam profile PF10317 (Thomas and Robertson 2008). NC2 carries srxa and str type C. elegans GPCR members, and the NC3-NC5 clades carry pure sets of sri members, hypothetical proteins and srw members, respectively. In NC4, two hypothetical proteins, namely T15B7.11 (NP_504729.1) and C35A11.1 (NP_504431.2), are associated at the significant E-value thresholds of 3.00E-08 and 8.00E-08, respectively. Abundant srw members, with comparable representation from the Str superfamily (srh type) and the Srg superfamily (srt, srv type) of C. elegans, are dispersed as neighboring members in NM1. SS1 covers six candidate GPCRs from the Str superfamily and a single representative from the Srg superfamily.

Figure 2.5 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 3): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 4

In cluster-4, 32 candidate GPCRs from C. elegans and 8 human GPCRs were observed, and the subgrouping of human GPCRs into HC1-HC3 helps to understand evolutionary integrity at the intra-genomic level (Figure 2.6 A-B). HC1 includes human gastrin-releasing peptide receptor (GRPR), related to lung carcinoma, neuromedin-B receptor (NMB-R) (Neuromedin-B-preferring bombesin receptor) (Thomas et al 2009) and BRS3 (bombesin receptor subtype-3), related to obesity and associated diseases (Ohki-Hamazaki et al 1997). HC2 retains the functionally related endothelin receptor, non-selective type (ETBR) and endothelin-1 receptor precursor, ET-A (ET1R), which participate, for example, in glucocorticoid actions and blood pressure control. G-protein-coupled receptor 37 and endothelin receptor type B-like (Villeneuve et al 2000) are noticed in HC3, carrying the same functional property. CC1 refers to the coclustering of a human GPCR (Q8TDV0) and a C. elegans receptor (NP_502893.2 (srv-14), refer: WBGene00005725), which illustrates the inter-genomic association in the cross-genome phylogeny (Figure 2.6.A).

The neighboring clusters include srsx, srw and srh types, which are distributed in NC1, NC3 and NC4, respectively, in a common fashion covering two members of the same receptor type. NC2 retains an sri type receptor with a hypothetical protein. The NC5 clade includes two members of the srj type of the Str superfamily with a hypothetical protein. NC6 includes hypothetical proteins (Rhodopsin-like). NC7, NC8 and NC11 retain srh type members uniquely. NC10 includes candidates of the srh type and an unannotated transmembrane protein. Two srh type receptors cluster with a hypothetical protein observed in NM1, suggesting similar function. SS1 includes four srd members of the Str superfamily. Apart from srv-14 observed in CC1, two hypothetical proteins, C54A12.2 (NP_494987.2) and C35A5.7 (NP_505697.2), in NC6 exhibit significant association at the E-values of 2.00E-13 and 1.00E-10, further providing functional relevance. Overall, this cluster is an illustrative model of the rich distribution of the Str type of SR with hypothetical proteins, which helps to connect functional relevance from known receptor types.

quartet – puzzling steps of 10. Srg super family (blue) .0.1. Out-group is not shown in the figure. serpentine receptors of C. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. 52 .000 puzzling steps were done for maximum likelihood method. elegans like Sra super family (aqua). Human GPCRs are denoted in green color. typical membrane proteins (purple).Others/Solo type receptors (maroon). Str super family (fuchsia/pink). hypothetical trans membrane proteins (red) are also shown.6 (A and B) : Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display) Cross-genome phylogeny of peptide receptors (Cluster 4): Phylogenetic trees were generated using TREE-PUZZLE 5.(a) (b) Figure 2. Generated newick tree files were colored by using MEGA 4.

Cluster 5

Cluster-5 carries 54 C. elegans GPCRs and 8 human GPCRs (Figure 2.7.A-B), and the dispersion of human GPCRs in CC1 is notable for cross-genome clustering, wherein the human thyrotropin-releasing hormone receptor (TRFR_HUMAN) coclusters with a GPCR from C. elegans (NP_491990.1, rhodopsin-like/hypothetical protein). Indeed, coclustering is expected in this case, since there is significant evolutionary similarity between the two proteins and the pair is termed an "ortholog" (Table 2.2). Further, NP_505077.1 (str-138; Neuromedin U receptor 2) of C. elegans is observed to retain an orthologous relationship with the human GPCR Q96AM5, although their functional equivalence is yet to be established (Table 2.2).

NC1 to NC10 are reported as neighboring clusters, wherein NC1 includes candidates of the sri and srh types from the Str superfamily, and uniformly pure sets of srx and srw candidates are observed in NC2 and NC6, whereas the rest of the neighboring clusters (NC3-NC5, NC7-NC10) retain mostly hypothetical proteins/receptors. Notably, the NC8 clade branches with a GNAT (GCN5-related N-acetyltransferase (GNAT) family protein) at the bs of 59 with a hypothetical protein member, and NC9 branches with Spr-2, "Sex Peptide Receptor (Drosophila) Related family member (sprr-2)" (NP_510455.2), which also shares similar functional cues in the calcium signaling pathway (KEGG PATH: Ko04020). All the neighbor members in NM1 are also hypothetical proteins. Overall, this cluster covers mostly rho-like members (hypothetical proteins) and provides a broader scope for connecting functional relevance with the available 19 subfamilies (A1-A19) of GPCRs of higher organisms (http://en.wikipedia.org/wiki/Rhodopsin-like_receptors).

SS1 and SS2 include purely Str superfamily members (srj, str and srd), and notably SS3 retains the species-specific olfactory receptor Odr-10 (Sengupta et al 1996), which branches with three other str type receptors. This superfamily helps to correlate the functional clues within the candidate representation for olfaction in C. elegans and also supports the view that the Str/Stl family carries a large group of genes with a special functional relationship towards Odr-10. Since Odr-10 is known for its olfactory role, the association of this sequence with a class-A member by RPS-Blast, albeit at poor E-values and appearing as a species-specific clade, is encouraging, suggesting that such cross-family connections can be recognized using the significance of E-values and the mode of clustering. Apart from orthologs, hypothetical proteins such as C48C5.1 (NP_509515.1) and K10B4.4 (NP_493666.2) are associated at significant E-values, are notably observed in CC1, and can be explored for functional relevance with the respective receptor profile.

Figure 2.7 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 5): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 6

Cholecystokinin receptor, neuropeptide FF receptor, orexin receptor, vasopressin receptor and gonadotrophin releasing hormone receptor (GNRHR, GRHR), belonging to subfamily A6, are seen in cluster 6. Cluster-6 establishes a peculiar pattern of associating eight human hormone receptors with 42 C. elegans GPCRs. Limited branching is the critical feature in describing this phylogeny, suggesting polytomy in this cluster (Figure 2.8.A-B). HC1 carries the closely related V2R, V1BR and V2BR vasopressin receptors (Thomas et al 2009), and five other human GPCRs are dispersed along with the neighbor members in NM1 together with diverse types of C. elegans GPCRs. NC1-NC5 are denoted as neighboring clusters due to the occurrence of HC1 in between. The NC1-NC3 and NC5 clades, on the other hand, retain only srw, srsx or srh type members. NC4 associates a hypothetical protein with a gnrr (NP_491453.1), and 27 entries are distributed as neighbor members in NM1, including five human peptide receptors. The neighbor members of C. elegans mainly include candidates of the srw type, six entries of the Gnrr type, five candidate GPCRs from the Str superfamily, two hypothetical proteins and an sre type receptor. SS1 retains 2 srw receptors in a distinct fashion to represent the species-specific clade. This cluster intermixes, in a peculiar fashion, nematode gnrr (gonadotropin releasing hormone receptor) (Vadakkadath Meethal et al 2006) with the human gonadotropin releasing hormone receptor (GRHR), which further helps to correlate biological significance in reproductive endocrinology in model organisms.

Figure 2.8 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 6): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 7

Neuromedin U receptor (cluster 5), neurotensin receptor (cluster 5), thyrotropin-releasing hormone receptor (TRHR, TRFR), GHSR and GPR39 (cluster 5), belonging to subfamily A7, are observed in cluster 7. In cluster-7, eight human and 34 C. elegans GPCRs are associated, where the human GPCRs are branched into HC1 and HC2, suggesting that distinct members are observed even within the human GPCR clusters. HC1 retains closely related human glycoprotein hormone receptors and HC2 branches with related human leucine-rich hormone receptors, denoting the functional integrity observed in higher-order organisms for these types of receptors, whereas CC1 illustrates the coclustering of the human leucine-rich repeat-containing GPCR7 (LGR7) and insulin-like peptide 3 receptor (LGR8) (Daniel et al 2006) with a homologue from C. elegans (fshr-1) (Figure 2.9.A-B). Among the 34 C. elegans GPCRs, fshr-1 shows a highly significant E-value (2.00E-59) with LGR7 and LGR8 and is observed in the CC1 association. This strong coclustering can be explained not only by the orthologous relationship to the mammalian follicle stimulating hormone receptor, but also by its functional importance in germline differentiation and survival in the nematode taxon (Cho et al 2007). The neighboring clusters NC2 and NC3 have candidate representation from srx, sre and srt type receptors. NC3-NC9 and the following neighboring members are abundantly and purely distributed with members from the largest superfamily, Str, and the highly duplicated srt members of C. elegans. This cluster covers species-specific members in SS1, including sri and srh type members of the Str superfamily.

Figure 2.9 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 7): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 8

Cluster-8 carries eight human GPCRs and 34 C. elegans GPCRs. HC1 retains neuropeptide receptors. HC2 also includes neuropeptide receptors, related to wakefulness, food consumption and locomotion in humans. Deletion of the orexin gene in mice produces a condition similar to canine and human narcolepsy in vivo (Sikder and Kodadek 2007). CC1 includes human cholecystokinin (CCK) receptors, important for gall bladder contraction and pancreatic enzyme secretion (Ulrich et al 1993), and is significantly branched with a nematode cholecystokinin receptor type, namely Ckr-2 (NP_001022842.1), at the bs of 55. The observed association further helps to analyse CCK receptors in the two taxa for functional similarities and is illustrative of cross-genome association.

Neighboring clusters are observed from NC1-NC3. NC1 is associated purely with srd type receptors. NC2 includes a str candidate with a probable GPCR of C. elegans, NP_509368.1 (C02B8.5). NC3 carries neuropeptide receptors (npr-1, npr-2) of C. elegans together with a human neuropeptide receptor (NK2R) and two other human neuropeptide receptors, suggesting a common functional relevance despite a heterogenous dispersion. Apart from the ortholog npr-2, notably a hypothetical protein, Y54E2A.1 (NP_497057.2), is associated at the significant E-value of 2.00E-45. Eight neighboring members of C. elegans GPCRs, such as srw, sprr and srxa type members from the 'Other' type superfamily and two hypothetical protein members, are also observed. Overall, cluster 8 includes a huge number of srh type receptors from the Str superfamily, and the retention of this largest superfamily is observed in the nematode-specific clades SS1-SS7 (Figure 2.10.A-B), suggesting a significant over-representation or amplification of species-specific members.

Figure 2.10 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 8): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 9

Cluster-9 has 54 entries of C. elegans GPCRs and 14 human GPCRs, and the distribution is dominated by C. elegans Str superfamily receptors, along with the srbc receptor population and equally intermixed human peptide receptors (Figure 2.11 A-B). Human GPCRs within this cluster are sub-distributed into five clades (HC1-HC5) and also occur with C. elegans str type members. Four other human peptide receptors, observed as neighbor members (NM1) in the tree topology, represent clear inter-genomic clustering. NC1-NC12 are observed as neighboring clusters. Among the 12 neighboring clades (Figure 2.11 A-B), a few are particular and most are related to potential GPCRs; in particular, NC2, NC3, NC4, NC5, NC9, NC10 and NC12 are predominantly populated with Str superfamily members, whereas NC1 has receptors from the Srg superfamily. Candidates of the srbc type occur in NC7, NC8 and NC11; NC11 includes receptors from both the srbc and srw families, and in NC6 a hypothetical protein has a counterpart in a typical GPCR (dct-12). Particularly, the srbc-64 and srbc-66 candidates (Figure 2.11 A-B) are responsible for pheromone activity in C. elegans; this illustrates the involvement of serpentine receptors in more than chemoperception and also indicates that not all serpentine receptors are necessarily non-GPCRs (Murphy and Tiffany 1991). Interestingly, the formyl peptide receptors / chemoattractant receptors (FML1, FML2, FMLR and GP44) of human origin, belonging to subfamily A8 (http://en.wikipedia.org/wiki/Rhodopsin-like_receptors and Rognan 2006), remain as neighbor members, suggesting that C. elegans GPCR counterparts could be identified for such receptors. As an example from this cluster association, CAB03313.3 shows significant association at the E-value of 6.00E-06. SS1 carries distinct srj type receptors from the Str superfamily with a hypothetical protein.

Figure 2.11 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 9): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

Cluster 10

Cluster 10 (Figure 2.12.A-B) includes eight entries of human GPCRs and 26 entries of C. elegans members. NC4 carries a pure set of str members. Various receptor types such as sri, srg, srh, srd and srt are also associated in the NM1 clade, which helps to analyse the diverse properties of serpentine receptors in C. elegans. Notably, srd-2 (NP_496196.1) in NM1 exhibits significant association at the E-value of 5.00E-05.

Figure 2.12 (A and B) Cross-genome phylogeny of peptide receptors: (a) Rectangular Display and (b) Radial Display

Cross-genome phylogeny of peptide receptors (Cluster 10): Phylogenetic trees were generated using TREE-PUZZLE 5.1; 10,000 quartet-puzzling steps were done for the maximum likelihood method. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color. Serpentine receptors of C. elegans, like the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays. The out-group is not shown in the figure.

thereby providing NC2-NC17 as neighboring (Figure 2. NC6. NC14.A-B) and exhibits sufficient polyploidy. the association provides NM1 with many more neighboring members like hypothetical proteins. NC15-17 includes majority of hypothetical proteins and NC3. Interestingly. each neighboring cluster retains its distinct identity with a branching pattern of carrying the same type of receptors belongs to the appropriate superfamily (Troemel et al 1995). Though all the entries are uniting at the root. srh.13. Notably. Thus. Tag-49 along with 10 related human peptide receptors. NC13. NC10. srw members. NC11. NC5. . NC2. NC8. typical GPCR members like GAL4.13. sra. elegans GPCRs (Figure 2. sre. NC1.A-B). notably two orthologous pairs are also present in this cluster (refer Table 2.2).65 Cluster 11 Cluster 11 retains 10 human chemokine peptide receptors and 62 C. Also. two peptide receptors of human (NY1R and NY4R) retained strong association and cluster together within the clade HC1. Appreciably. NC14 include members from the largest Str superfamily. sri. considerable number of hypothetical proteins are observed at the significant E-value thresholds in this cluster related to human peptide receptors.

(a)

(b)

Figure 2.13 (A and B): Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of peptide receptors (Cluster 11): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

which are functionally related to calcium storage in human cells (Teng et al 2006) are clustered together at the highest bs of 96 in HC1 clade (Figure 2. Neighboring clusters NC1 and NC2 also retain candidates from Str superfamily (srd.1) is associated at the significant E-value of 0. In Cluster-13. HC1 accommodates purely with 10 human GPCR entries denoting the evolutionary specificity of human chemokine receptors (Joost and Methner 2002).A-B).67 2. Likewise. elegans (NP_504623. hypothetical protein at the bs of 53 and is followed by diverse neighbor members like sri. NM1 includes 9 members of human chemokine receptors with a hypothetical protein from C.2) shows favourable association with human CMK receptors at the significant E-value 0. Especially.024 and observed in NC2 (Figure 2. HC3 associates with CCR5 and CCR3 at the bs value of 50 and both of these receptors are implicated in AIDS virology (Stefano Costanzi and Gershengorn 2006). Cluster 12 is associated with 16 entries of C. srbc and Col-40 in NM1. NC3 is dispersed with a Sra type. among the other serpentine type receptors. 15 human GPCRs are associated with 13 C.003 as neighbur members (NM1) in this cluster. wherein a srt member (NP_507069.14. HC2 and NM1. str-177 (NP_505383. Notably. among all human chemokine .1) at the significant E-value of 0. Human GPCRs are distributed in the HC1. IL-8A and IL-8B receptors.A-B).15.str) predominantly. Notably.86. HC2 comprises of receptor for adrenomedullin (ADMR) and Q8NE10. elegans GPCRs.A-B).14. elegans GPCRs and 10 human GPCRs (Figure 2.6.2 Result summary for Chemokine Receptors Cluster 12 and Cluster 13 Clusters 12 and 13 include candidates from chemokine receptors. SS1 clade retains purely nematode chemokine receptors of srh type members from Str superfamily.

receptors, DUFF (the Duffy antigen) (Joost and Methner 2002) is a distantly related receptor, but this receptor is observed closer to a hypothetical protein in C. elegans. NC1-NC3 illustrate the distribution of C. elegans GPCR entries in neighboring clusters. Neighboring cluster NC1 associates with srh members of the Str superfamily and NC2 covers a hypothetical protein. NC3 includes five members of Str type and a member of the srv type receptors. SS1 indicates species-specificity with a member of srxa and a hypothetical protein. Notably, three hypothetical proteins, Y54G11B.1 (NP_497004.1), T15B7.12 (NP_504730.1) and F13H6.5 (NP_504623.1), showed favourable association with human CMK receptors at the significant E-values of 2.00E-06, 2.00E-06 and 0.003, respectively.

2.6.3 Result summary for nucleotide and lipid receptors

Clusters 14-19 associate receptors belonging to the nucleotide and lipid type (Joost and Methner 2002), which are mainly activated by negatively charged ligands (Montero et al. 2005). These receptors also retain basic residues at the ligand-binding sites and show high sequence diversity owing to the binding of different ligands. Notably, in the cluster association, all the species-specific clades observed from clusters 14 to 19 belong to the Str superfamily. SS1-SS8 and NC6 in cluster-14; SS1-SS3 and NC12 in cluster-15; SS1, SS2 and the majority of 'neighbor members' in cluster-16; NC2 and SS1 in cluster-17; SS1 of cluster-18; and the Srd members in SS1 of cluster-19 explain the unique association of candidate GPCRs mainly from the Str superfamily. Such C. elegans-only GPCR clusters denote the abundance of this particular superfamily (Str) in the nematode genome.

(a)

(b)

Figure 2.14 (A and B): Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of chemokine receptors (Cluster 12): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

(a)

(b)

Figure 2.15 (A and B): Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of chemokine receptors (Cluster 13): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

HC1 retains evolutionarily-related human opsin members and four other opsin candidates are dispersed along with the diverse members of C. srg) (Figure 2. . sro-1 type receptor is associated at favourable E-value of 6. respectively wherein NC1 has members from both sre and srg families. NC6 clusters with majority of Str type members (srj and str families) and NM1 observed with srt members. srt. srd. srx. Notably.00E-16 and is observed in NM1. srbc. elegans GPCR members of srh and SS6-SS8 retain sri-type belonging to the large Str superfamily respectively. srw.16. Human GPCR members are distributed in HC1 clade and are also present in the neighboring members in NM1. elegans GPCRs. srx.71 Cluster 14 Cluster-14 retains seven human GPCRs and 64 C. SS1. demonstrating species-specificity. The neighboring clusters from NC2-NC5 establish clear composition of srx.SS5 clusters retain C. elegans (like srsx.A-B) suggesting the evolutionary conservation at the functional level across two taxa for these types of receptors.

(a)

(b)

Figure 2.16 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 14): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

NC7 has str type receptor with hypothetical proteins at the bs value of 62. NC9 and NC10 associate majorly with srw members. NC11 includes hypothetical proteins at the bs of 54. associates with a sra-type member with a hypothetical protein.00E-15. but observed along with neighbor members shows intergenomic clustering. The human GPCR from HC1 – HC4 do not cocluster with C.73 Cluster 15 Cluster-15 is associated with 18 human GPCRs and 58 C. elegans GPCRs. NC12 and NC13 have pure association of candidates from sri type and srw type respectively. NC8 includes two members of sri type from Str superfamily and a candidate of srt type from Srg superfamily. 9. wherein NC5 includes srj type receptor and a hypothetical protein. NC2-NC4 clades include association from Str superfamily (srh.00E-14 and 1. srm). elegans candidate GPCRs and 7 other human GPCR entries are dispersed in NM1 (Figure 2. three hypothetical proteins such as F53B7. 2006) as opposed to NC1-NC13 observed as neighboring clades.1) and K10C8. sri. NC6 at the bs of 70. Notably.00E-14 respectively. NC1 branches at the significant bs of 94 including srv type members of Srg superfamily. sri and str) denoting the species specificity (Robertson and Thomas.1 (NP_509685.2 (NP_505802.2 (NP_506168. Human GPCR members are distributed into HC1 . NC13 is followed by seven human nucleotide and lipid receptors counterpart with mostly of srw members of C.HC4 along with C. Notably. elegans GPCRs as neighboring members. ZC374.17.A-B).1). SS1-SS3 carries clear composition from Str superfamily (srj. elegans GPCR members. .1) exhibit favourable association with human N&L type receptors at the significant E-values such as 4.

(a)

(b)

Figure 2.17 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 15): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

Cluster 16

In cluster-16, all the human GPCR members are associated in the HC1 clade at the bs of 76 (http://en.wikipedia.org/wiki/Rhodopsin-like_receptors); however, SS1 and SS2 represent the species-specific clades containing a majority of candidate receptors from the Str superfamily (Figure 2.18.A-B), whereas the neighboring clades NC1 and NC2 retain a pure set of srd GPCRs and a mixture of sra and srxa types, respectively. More C. elegans GPCRs such as sri, srx and srh, as well as hypothetical proteins, are distributed within cluster-16 as neighbor members (NM1) (Robertson 1998). Notably, srx-86 (NP_001023896.1) is associated at the significant E-value of 2.00E-04, showing favorable association with human N&L type receptors and with a hypothetical protein T21622.

(a)

(b)

Figure 2.18 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 16): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

Cluster 17

Cluster-17 carries 18 human GPCRs and 30 C. elegans GPCRs (Figure 2.19.A-B). The human GPCR members are dispersed into HC1-HC5. HC1 comprises the cannabinoid receptors (CB1R, CB2R) at the best bs of 93, but the genomes of protostomian invertebrates like C. elegans do not contain CB1R nor FAAH orthologs (Table 2.2). This indicates that CB1-like cannabinoid receptors may have evolved after the divergence of deuterostomes (Elphick and Egertova 2001; McCarroll et al 2005). The lysophospholipid receptor is clustered in the HC2 clade; GP12, GPR3 and GPR6 are associated in HC3; the melanocortin receptor is observed in the HC4 clade; and HC5 carries the sphingosine 1-phosphate receptors.

The neighboring clusters were annotated as NC1-NC8. Srv type members associate in NC1. The NC2 clade retains srh type receptors [23, 66]. NC3 retains pure Srj type members (Robertson 1998) at an average E-value of 1.11 (Table 2.1) and NC4 comprises two hypothetical proteins. The NC5 clade is of str type with a bs of 70. NC6 is of sri type (20) with a bs of 65 and an E-value of 0.498. NC7 is of Srh type. NC8 includes a hypothetical protein with a Srsx (Other type) receptor. Many more neighbors of diverse receptor types are observed in NM1. SS1 retains receptors of Srj type and a hypothetical protein, and stays distant from the root.

Overall, cluster-17 covers a larger number (24 entries) of serpentine types (srh, srg, sri, srj, str) which fall under the Str superfamily, and the rest of the associated receptors are from the Srg superfamily. Interestingly, pure sets of clustering belonging to the same family have been observed in NC1-NC7 in this cluster, which helps to explain the strong intra-genomic retention. Interestingly, two hypothetical proteins, ZK418.6 (NP_498545.1) and F10D7.1 (NP_510813.2), and srsx-22 (NP_505153.2) show favourable associations with the human N&L receptor type at the significant E-values of 8.00E-08, 5.00E-10 and 6.00E-08, whereas ZK418.6 is observed in NM1, and F10D7.1 and srsx-22 are observed in NC8.

(a)

(b)

Figure 2.19 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 17): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

Cluster 18

Cluster-18 consists of 8 human and 28 C. elegans GPCR entries (Figure 2.20.A-B), which are related to the receptors binding to prostaglandins, prostacyclins and thromboxanes. These receptors bind to ligands that are derivatives of arachidonic acid (AA), which serves as the precursor via the cyclooxygenase (COX) pathway. Prostanoids function close to the site of synthesis, and they are deactivated before they are exported into the circulation as inactive metabolites. Prostanoids have essential homeostatic functions in the cytoprotection of gastric mucosa, renal physiology, gestation, and parturition, but they are also implicated in a number of pathological conditions, such as inflammation, cardiovascular disease and cancer.

The prostaglandin, prostacyclin and thromboxane receptors cluster together with a bs of 74 and are observed in HC1. The C. elegans GPCRs are grouped into NC1-NC5, where NC1 branched at the best bs of 93 with a Srt type receptor and a hypothetical protein. NC2 retains a pure set of receptors from the Sra superfamily. NC3 and NC4 also associate with pure sets of receptors, such as srbc and srw from the 'others' superfamily. Many str type receptors, with a hypothetical protein and a srt type receptor, are associated at NC5. A srxa receptor is observed in NM1. In SS1, receptors from the largest superfamily, Str, with a hypothetical protein are branched with an average E-value of 1.178. Particularly, a hypothetical protein AAB38097.2 (T07F8.2), associated at the E-value of 1.00E-15, shows significant association with the human N&L type receptors and is observed in NC.

(a)

(b)

Figure 2.20 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 18): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

adenosine and histamine. lysophosphatidylcholine and sphingosylphosphorylcholine. Human GPCR clades contain nucleotide and lipid receptor of protease-activated receptors (PAR). However. srw. NC20. srd belong to Str superfamily and are observed at NC1-11. NC21-NC22 along with hypothetical proteins and few receptors from ‘Others type’. sri. NC23. elegans GPCRs are associated (Table 2. histamines. dopamine.6.22 A-B to 2.26 A-B and This suggests biogenic amine receptors have ancient evolutionary origin. NC18-NC19. psychosine receptors. and hypothetical proteins) suggesting intergenomic association at NM1. srbc type receptors. However. srxa. NC13-14. NC15 17. notably all the other neighboring clades like NC12. This cluster is highly populated with GPCRs from Str superfamily and ‘Others type’ receptors.21A-B) in which human GPCRs are distributed into HC1-HC3. this cluster is aggregating majorly of Str and ‘Others type’ superfamily receptors (Robertson and Thomas 2006) which provide information in connecting these receptor types with biogenic amine receptors of humans. str. Predominantly. Overall. srh. elegans receptors was observed (Figures 2. 18 human GPCRs and 93 C. srv. as they are observed in invertebrates to higher vertebrates.81 Cluster 19 In this cluster. srj. octopamine and adrenaline receptors Intermixing of human and C. receptors like srh.1 and Figure 2. . serotonin receptors. elegans (like str. around 12 human GPCRs intermix with diverse types of receptors from C. 2. NC1-NC24 are denoted as neighboring clades.4 Result summary for biogenic amine receptors Biogenic amine receptors are distributed into five clusters (cluster20 to cluster-24) mainly consisting of trace amine. NC24 associate with pure set of srw. muscarinic acetylcholine. melatonin.

(a)

(b)

Figure 2.21 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 19): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

Q96RI8_Hum. further proposing the analysis of the functional relevance with human N&L type receptors. NC1 carries pure set of srh type receptors of Str superfamily and a srv type receptor of Srg superfamily observed in NM1. O14804_Hum. . Davenport 2003)) and are potentially druggable targets (Foord et al 2002).23 stay quite distant from the root and has been annotated as a species-specific clade SS1. srj) at an average E-value of 4.00E-04.00E-14. SS1 and NM1. Similarly.A-B) and eight GPCR entries of C. srh. The eight human GPCR entries get branched with very good bs value of 90 at HC1 (Figure 2.83 Cluster 20 Cluster 20 is represented mainly by trace amine (TA) receptors. other hypothetical protein in the dataset namely C24A8.22. Notably. apart from srv-33 associated at 7. a putative neurotransmitter receptor (PNR) is closely related to trace amine (Q969N4_Hum.6 is associated at significant E-value 1. Q9P1P5_Hum (GPR58) and Q9P1P4_Hum (GPR57) are closely related to Q96RJ0_Hum (TA1). elegans GPCRs are distributed as NC1. and Q96RI9_Hum) receptors. They form a subfamily of GPCRs related to Norepinephrine (NE). Receptors of Str superfamily (str. serotonin (5-HT). Trace amines and their receptors may therefore be useful in treating various neurological and psychiatric disorders (Berry 2007. and dopamine (DA) receptors.

(a)

(b)

Figure 2.22 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of biogenic amine receptors (Cluster 20): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.


Cluster 21

Cluster-21 consists of four human and 49 C. elegans GPCRs (Figure 2.23 A-B). Human GPCRs are distributed into HC1 and CC1. HC1 retains GPCRs belonging to the family of melatonin receptors, and a candidate C. elegans GPCR, NP_493667 (srd-3 type), associates with a human GPCR (Q9NQS5_Hum) at a significant E-value of 0.01 in CC1, representing an intergenomic association. The other C. elegans GPCRs are associated into multiple clades (NC1-NC12). Interestingly, the NC1, NC4-NC6 and NC8-NC10 clades are predominantly associated with receptors of the Srg superfamily, whereas the NC2, NC7, NC11 and NC12 clades carry a pure distribution of receptors from the largest superfamily, Str. Abundant srx type receptors of the Srg superfamily are observed with srbc type receptors and a hypothetical protein at NM1. Overall, despite their association with human biogenic amine receptors, there is a clear branching into members that belong to the Srg and Str superfamilies. Notably, candidate receptors from the Srg superfamily are predominantly associated in this cluster. Receptors srx-50 and srx-60 exhibit significant association at the E-values of 2.00E-10 and 1.00E-09 in NM1 and NC10, respectively, and merit further study of their functional relevance to human N&L type receptors.
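The notion of a clade carrying a "pure distribution" of one superfamily can be made concrete with a small tally. The following is a minimal sketch, not the procedure used in this study: the family-to-superfamily table lists only assignments stated in this chapter's text, the function name `summarize_clade` is hypothetical, and the example clade memberships are invented for illustration.

```python
from collections import Counter

# Family-to-superfamily assignments as stated in this chapter's text;
# families not listed here are reported as "unassigned" rather than guessed.
SUPERFAMILY = {
    "srh": "Str", "str": "Str", "sri": "Str", "srj": "Str", "srd": "Str",
    "srx": "Srg", "srv": "Srg", "srt": "Srg",
    "srw": "Solo", "srsx": "Solo", "srbc": "Solo",
}


def summarize_clade(families):
    """Return (dominant superfamily, its fraction of members, is_pure) for one clade."""
    if not families:
        raise ValueError("clade has no members")
    counts = Counter(SUPERFAMILY.get(f.lower(), "unassigned") for f in families)
    dominant, n = counts.most_common(1)[0]
    return dominant, n / len(families), len(counts) == 1


# Hypothetical clade memberships, for illustration only (not read from Figure 2.23).
example_clades = {
    "NC_pure": ["srh", "str", "srj"],
    "NM_mixed": ["srx", "srx", "srbc", "hypothetical"],
}
for name, members in example_clades.items():
    print(name, summarize_clade(members))
```

A clade is reported as pure only when every member maps to the same superfamily, so a single hypothetical protein is enough to mark a clade as mixed; whether that matches the annotation practice behind the NC labels is an assumption.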

(a)

(b)

Figure 2.23 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of biogenic amine receptors (Cluster 21): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.


Cluster 22

Cluster-22 consists of 9 human and 29 C. elegans GPCR entries (Figure 2.24.A-B). The human 5-HT1 receptor class comprises five different receptors, which share 41 to 66% overall sequence identity among themselves and are observed in HC1 and HC2. Human GPCRs are distributed into HC1, HC2, CC1, CC2 and NM1, which represents the intermixing of receptors between the two taxa. In NM1, human 5HT1A has an orthologous relationship with NP_497452.1 (ser-4) of C. elegans at an E-value of 1.0e-71. The human dopamine 4 receptor and tyra-2 of C. elegans also cluster together as an ortholog pair. The C. elegans entry NP_506052.1 (srj type) associates with Q8TDV2 in CC1 with the best percentage identity among all the other members (15.0) at an E-value of 0.15, whereas NP_001024728.1 (ser-1) coclusters with Q16538 in CC2 and retains a percentage identity of 11.4 at a significant E-value of 1.0e-50. Neighboring clades NC1-NC7 are present and mostly retain a pure set of receptors at the intra-genomic level: receptors from the Srg (NC1), Str (NC2, NC3), Others (NC4) and Str (NC5, NC6, NC7) superfamilies are observed. Interestingly, tyra-2 in NM1 exhibits a significant association at the favorable E-value of 5.00E-69, which warrants further investigation of functional commonality.
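The E-values quoted for Cluster 22 span many orders of magnitude, so it can help to rank them and apply a cutoff explicitly. The sketch below uses only the values given in the paragraph above; the function name `rank_hits` and the default cutoff of 0.01 are illustrative assumptions, not thresholds stated for this analysis.

```python
# (C. elegans entry, human partner or clade, E-value) as quoted in the Cluster 22 text.
hits = [
    ("NP_506052.1 (srj type)", "Q8TDV2", 0.15),
    ("NP_497452.1 (ser-4)", "5HT1A", 1.0e-71),
    ("NP_001024728.1 (ser-1)", "Q16538", 1.0e-50),
    ("tyra-2", "NM1", 5.0e-69),
]


def rank_hits(hits, max_evalue=0.01):
    """Keep hits with E-value <= max_evalue, sorted from lowest (strongest) E-value."""
    kept = [h for h in hits if h[2] <= max_evalue]
    return sorted(kept, key=lambda h: h[2])


for entry, partner, evalue in rank_hits(hits):
    print(f"{entry:26s} {partner:8s} {evalue:.1e}")
```

With the assumed cutoff, the srj-type entry (E-value 0.15) is dropped, while ser-4, tyra-2 and ser-1 are retained in that order; a looser cutoff passed as `max_evalue` keeps all four.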

(a)

(b)

Figure 2.24 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of biogenic amine receptors (Cluster 22): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.


Cluster 23

This cluster contains 22 human GPCRs and 77 C. elegans GPCR entries (Figure 2.25 A-B). G-protein-linked acetylcholine receptor family members, such as muscarinic acetylcholine, adenosine, histamine and many orphan receptors of human, are clustered in different clades (HC1-HC7). In particular, the C. elegans G-protein-linked acetylcholine receptor family members gar-1 and gar-2 are observed together as a neighboring cluster (NC11), and gar-3 is associated as a neighboring member (NM1) with the human GPCR clades. NP_001024236.1 (gar-3) has 37.9% identity with the human GPCR ACM3 and retains an orthologous relationship; a hypothetical protein, NP_001040810.1, which retains 7.8% identity with a human GPCR (AA3R), is another notable ortholog, although their functional equivalence is yet to be established.

NC1 is of Sre type belonging to the Srg superfamily, while NC2 associates only hypothetical proteins. Pure sets of Srw receptors (Solo type) are observed in NC3 and NC9, whereas NC4 has receptors of the sra and srt types. Notably, pure sets of srsx type (Solo type) receptors are associated in NC5 and NC7. NC6 also retains only Srbc type (Solo type) receptors. NC8 includes a majority of srj type receptors and a hypothetical protein. Srg receptors are at NC11 and two hypothetical receptors are at NC12 and NC13. NC15 associates with srj and str members of the Str superfamily. Hypothetical proteins, lung seven transmembrane receptor, and Srx and Srv type receptors are observed at NM2. The species-specific clade (SS1) retains purely srh type receptors of the Str superfamily. Overall, cluster-23 has members from the 'Others' (Solo) superfamily as well.

Despite the high sequence divergence within human biogenic amine receptors (with seven clades), the fact that many of the C. elegans GPCRs associated with this cluster do not cocluster with the human GPCRs suggests species-specific requirements and lineages of these receptors. Interestingly, this study brings together several uncharacterized and hypothetical proteins in this cluster of biogenic amine receptors. Notably, gar-3 (NP_001024236.1), at the lowest E-value of 4.00E-51, shows the highest significance in the association.
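Percentage identities such as the 37.9% reported for gar-3 against ACM3 depend on how the aligned columns are counted. The sketch below shows one common convention, counting identical residues over columns where neither sequence has a gap; it is an assumption that this matches the convention used for the values in this chapter, and the aligned strings are toy examples, not the actual gar-3 or ACM3 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) column pairs with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


# Toy alignment for illustration only.
print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))
```

Dividing by the full alignment length or by the length of the shorter sequence instead would give lower values for the same alignment, which is worth keeping in mind when comparing identities across tools.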

(a)

(b)

Figure 2.25 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of biogenic amine receptors (Cluster 23): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet-puzzling steps for the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are shown in green; the serpentine receptors of C. elegans are shown by superfamily: Sra (aqua), Str (fuchsia/pink), Srg (blue) and Others/Solo type receptors (maroon). Typical membrane proteins (purple) and hypothetical transmembrane proteins (red) are also shown. Cluster types and cluster numbers are marked on both the rectangular (left-side) and radial (right-side) displays.

serotonin 2. elegans.26. This cluster contains 24 GPCRs from human and 68 GPCRs from C. Tag24. NC1-NC13 and NM1 clades.1 (dop-2) has 33. octopamine and adrenaline). serotonin 7. elegans are observed at the significant E-values and proposing to investigate for functional similarities. few serotonergic receptors and many orphan receptors are associated in cluster 24. 9 and 10 clades are predominately associated with hypothetical proteins.92 Cluster 24 Receptors of biogenic amines (dopamine. srv types from Str superfamily and sre. 11 and 13) retain pure set of GPCR members from Str superfamily. elegans GPCRs are noted within clades CC1. neighboring clades (NC1-NC3. Two pairs of ortholog sets are observed in NM1.0e-69 and NP_001024047. .2% of identity with human D3DR and got associated at 2. 8. 7. like tyra-3. Many neighbor members were observed with mixed distribution (receptors belong to srd. Ser-2 and so on from C.1% of identity with Q8NGU3 and got associated at 1.1 (dop-1) has 36. typical candidate GPCRs like Dop-1. NP_001024579. histamine. Overall. where the percentage identity ranges from 20 -37%. Many biogenic amine receptors of C.A-B). sra of Sra superfamily). Human BGA receptors are distributed in HC1-HC3 clades and C. Notably. serotonin 5.00e-49. whereas the NC4. 5. elegans (Figure 2. dopamine 1 and dopamine 2 branch near the human biogenic amine receptors.

Figure 2.26 (A and B): Cross-genome phylogeny of biogenic amine receptors (Cluster 24), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

2.6.5 Result Summary for Class B (Secretin) Receptors

Cluster 25

Class B receptors are represented by two clusters (25 and 26) consisting of classical hormone receptors from human and Drosophila methuselah (MTH)-like proteins. Cluster 25 consists of 17 human and 31 C. elegans GPCR entries, in which the human GPCRs recognise structurally related ligands, namely polypeptide hormones of 27–141 amino-acid residues (pituitary adenylate cyclase-activating polypeptide (PACAP), secretin, calcitonin, corticotropin-releasing factor (CRF), urocortins, growth-hormone-releasing hormone (GHRH), vasoactive intestinal peptide (VIP), glucagon, glucagon-like peptides (GLP-1, GLP-2) and glucose-dependent insulinotropic polypeptide (GIP)), and are related to calcitonin (CALR_Hum) and calcitonin gene-related peptide type 1 receptors (CGRR_Hum). Small accessory proteins, the receptor activity-modifying proteins (RAMPs), interact with these calcitonin receptors and can generate pharmacologically distinct receptors. The human orphan receptor Q8NHB4_Hum is very closely related to the PTRR_Hum receptor (which binds parathyroid hormone and parathyroid hormone-related protein (PTHrP)).

The Secretin receptor family of C. elegans is associated with this cluster. The C. elegans GPCRs associated with this cluster are distributed in the NC1-NC7 (Figure 2.27 A-B), NM1 and SS1 clades. The human GPCRs within this cluster diversify, and in CC1 GRFR_Hum coclusters with a secretin type receptor (NP_498978.1) from C. elegans. NP_510496.1 (Secretin receptor family) and NP_001021172.1 (pdfr-1) were found to be orthologs of CALR_HUMAN (28.5% identity) and CRF2_HUMAN (30.4% identity) at very significant E-values. The NC1 clade has sre and srw receptor types at a good bootstrap (bs) value of 100, which are associated at an average E-value of 0.002. In NC1, NP_498978.1 (Secretin receptor family) retains a good percentage identity (28.1) with the human GPCR VIPR_HUMAN. C18B12.2 (NP_510496.1), C13B9.4 (NP_001021172.1) and ZK643.3a (NP_498978.2) are associated with the human Secretin type GPCR members at significant E-values such as 5.00E-40, 5.00E-43 and 5.00E-30, and these associations can be explored further for functional relevance with the respective GPCR profiles.

NC3-NC8 are associated with candidate GPCRs of the srh, srd and str types of the Str superfamily, which are distantly related to the human GPCR clades of this cluster, and many more neighbor members from the same Str superfamily are observed in NM1. SS1 retains srh and str type receptors of the Str superfamily. In general, Str superfamily members are the most frequent in this cluster. These associations provide examples where coclustering of orthologs may not happen, suggesting functional equivalence even under large sequence variation in this cluster.

Figure 2.27 (A and B): Cross-genome phylogeny of secretin type receptors (Cluster 25), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

Cluster 26

This particular cluster is retained only with Drosophila GPCR members (Methods) and, through the RPS-BLAST runs, 20 candidate GPCR members from C. elegans have been associated with the Drosophila GPCR members. Methuselah receptors of Drosophila and their paralogs solely represent cluster 26. The Drosophila mutant methuselah (MTH) was identified from a screen for single-gene mutations that extended the average lifespan of the organism and also increased resistance to several forms of stress, including starvation, heat and oxidative damage.

Drosophila GPCRs branch into two clades denoted as DC1 and DC2, whereas C. elegans GPCRs are distributed in NC1, NC2, NM1, NM2 and SS1 (Figure 2.28 A-B). NC2 consists entirely of srh type receptors from the Str superfamily at a bs value of 69, and many neighboring members are also from the Str superfamily, together with two hypothetical proteins. SS1 also retains candidate receptors from the Str superfamily. Interestingly, all the clades retain receptors from the Str superfamily. As mentioned earlier, the predominant occurrence of the Str superfamily denotes the abundant availability of str type candidate receptors in the nematode genome and reflects species-specific retention, with limited coclustering in the cross-genome phylogeny. Notably, hypothetical proteins, namely sri-20 (NP_505665.3) and srh-250 (NP_494681.1), are associated at the E-values of 0.011 and 0.38, respectively, and the association can be explored further for functional relevance with the related fly GPCRs from this cluster.

Figure 2.28 (A and B): Cross-genome phylogeny of secretin type receptors (Cluster 26), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

2.6.6 Result summary for cell adhesion receptors

Cluster 27

This cluster contains 29 human GPCRs and 17 C. elegans GPCRs (Figure 2.29 A-B). A large number of GPCRs belonging to the cell adhesion receptors, characterised by a long extracellular N-terminus and a GPCR proteolytic site (GPS) domain, are represented in cluster 27. Several of these human receptors have functional domains such as epidermal growth factor (EGF), leucine rich repeat (LRR), hormone-binding domain (HBD) and immunoglobulin (Ig) domains. Most of the human GPCRs from this cluster are orphans with no known ligands (Hideo Taniura, et al., 2006). Hence, they branch into several distantly related clades together with C. elegans members, denoted as HC1 and HC2, CC1, NM1 and SS1.

The Q9HAR2 (LEC3, Lectomedin-3) in NM1, belonging to cluster 27, has an orthologous relationship with the C. elegans gene product NP_001040724.1 (lat-2, E-value of 7e-71). The lat-2 gene has significant sequence similarity with a paralog, NP_495894.1 (lat-1). lat-1 associates with human Q9HAR2 at an E-value of 5e-68. Receptors such as lat-1 (NP_495894.1) and lat-2 (NP_001040724.1), associated at E-values of 5.00E-68 and 7.00E-71, thus provide clues for connecting further to the functional relevance of the respective human GPCR profile from the cluster. One of the human GPCRs (Q8WXG9), although observed in SS1, coclusters with the NC1 clade. This could be due to the large size of Q8WXG9 (6307 amino acids), which can even be viewed as an outlier in this cluster. The rest of the C. elegans GPCRs branch distantly from the root and are observed in NC1, mostly as srh, str and sri types from the Str superfamily. The clear underrepresentation of C. elegans GPCRs among the cell adhesion receptors is noteworthy.

Figure 2.29 (A and B): Cross-genome phylogeny of cell adhesion type receptors (Cluster 27), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

mgl-2 (NP_492720.30 A-B).1) at E-value 8.1) has 48. NETKFIGFT are well-conserved amongst these members (alignment data not shown).00E-87. provide clues to connect for the functional relevance with human CARs and are notably observed at CC1. The primary structure and pharmacology of mGluRs are evolutionarily wellconserved in Drosophila. The human metabotropic glutamate receptors mGluR1.1) at E-value 3. .2) at E-value 5. elegans GPCRs and retains majorly with human metabotropic glutamate receptors (MGRs).1. elegans had been observed as an ortholog of Q8NFS4 and has 51. mgl-2). Cluster 28 Cluster 28 associates with eight Human and 20 C. calcium-sensing receptors (CASR) and retinoic acid-inducible G-protein-coupled receptors (RAIG) are available as in our previous work. Notably mgl-1 (CAM33507.6. especially from Str superfamily with the glutamate receptor of humans do exist and could be identified by our sequence analysis.00e-70 and mgl-3 (NP_741400. 2000). mGluR4. mgl-2 (NP_492720. C. MYTTCIIWLAF.1).101 2. (G/S)RE.9 % sequence identity with mGluR2. mGluR5. Metabotropic glutamate receptors (MGR).7 Result summary for class C (glutamate) receptors Receptors of Class C are divided mainly into four clusters: clusters 28 to 31. Also. mGluR6. mgl-1 of C.00E-85 and 5. Specific sequence patterns like SGREL(S/C)Y.00e-85 has got associated in this cluster (Figure 2. Q8NFS4 (Metabotropic glutamate receptor 7 variant 3) form a clade and thus inferred as CC1 with the presence of metabotropic glutamate receptor of C.2) has 46. elegans GPCRs. hypothetical receptor CAM33507. TKT. (mgl-2) (NP_492720. and higher mammals (Adams.00e-87. and mgl-3 (NP_741400. mGluR2. Str type receptors from Str superfamily. elegans. γ-aminobutryic acid (GABA) receptors. This cluster is illustrative to explain that related C.4 % sequence identity with mGluR3 suggesting similar ortholog pairs exist. mGluR3. elegans (mgl-1. Interestingly. 3. Srh. 
mGluR8.3% sequence identity with mGluR5 and the mgl-3 (NP_741400.2) at the significant E-values such as 8.00E-70 respectively. NC1-NC6 include Srd.
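The conserved sequence patterns named for this cluster (SGREL(S/C)Y, (G/S)RE, TKT, MYTTCIIWLAF and NETKFIGFT) can be searched for in any candidate sequence with a few lines of code. The sketch below is only an illustration: the function names and the toy sequence are hypothetical, the "(S/C)" notation is assumed to mean "either residue at this position", and this is not the procedure used in the thesis to derive the patterns.

```python
import re

# Patterns quoted in the text; "(S/C)" is assumed to mean "S or C at this position".
PATTERNS = ["SGREL(S/C)Y", "(G/S)RE", "TKT", "MYTTCIIWLAF", "NETKFIGFT"]


def pattern_to_regex(pattern: str) -> str:
    """Turn '(S/C)'-style alternatives into regex character classes, e.g. '[SC]'."""
    return re.sub(
        r"\(([A-Z](?:/[A-Z])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        pattern,
    )


def find_motifs(sequence: str, patterns=PATTERNS) -> dict:
    """Return {pattern: [0-based start positions]} for non-overlapping matches."""
    hits = {}
    for pattern in patterns:
        regex = re.compile(pattern_to_regex(pattern))
        hits[pattern] = [m.start() for m in regex.finditer(sequence)]
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real receptor.
    toy_sequence = "AAASGRELCYAAATKTAAANETKFIGFT"
    for pattern, positions in find_motifs(toy_sequence).items():
        print(pattern, positions)
```

Patterns that do not occur simply map to an empty list, so the same call can be run over every member of a cluster to tabulate which motifs each sequence retains.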

Figure 2.30 (A and B): Cross-genome phylogeny of glutamate receptors (Cluster 28), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

00E-26) and srh-275 (NP_504876.A-B).31.1) at the favourable E-value (0. elegans GPCRs (Figure 2. .1) associated at the significant E-value (2. Srxa type GPCRs at NC3 and two other Srh type receptors observed at NM1. 8NGW9 and Q8NGZ7) form a clade with a C. at 23.1). the current clustering approach suggests that NP_501400. Q8NGV9. a hypothetical protein namely F35H10. Thus. Interestingly.1 may be a putative ortholog of this extracellular calcium-sensing receptor precursor/Parathyroid Cell calcium-sensing receptor. NC1 and NC2 clades retain only candidate GPCRs from Str superfamily.103 Cluster 29 Human calcium-sensing calcium-sensing receptor receptor (CASR_Hum-Extracellular Cell calcium-sensing precursor/Parathyroid receptor) forms cluster-29 along with a set of five orphan receptors and 14 C. A sweettaste receptor of 3GCPR / PBP1_GPCR_family_C_ like receptor is associated at an E-value of 2. Human Calcium-sensing receptor CASR and the orphan receptors (Q8NHZ9.075) in SS1 are worth to analyse further for possible functional relevance with human glutamate type receptors.5 %identity.10 (NP_501400. elegans GPCR (NP_501400.00 E-26 and got branched with Q8NGV9. in contrast to Sre.

Figure 2.31 (A and B): Cross-genome phylogeny of glutamate receptors (Cluster 29), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

Cluster 30

Cluster 30 comprises four human GPCRs and 23 C. elegans GPCRs (Figure 2.32 A-B). The human GPCR clade (HC1) contains the human retinoic acid induced GPCRs and orphan GPCR members. The C. elegans GPCRs form different clades, namely NC1, NC2, NM1, NM2 and SS1. The NC1, NC2 and SS1 clades include C. elegans receptors from the Str superfamily belonging to the srj, srh and str types. The NM1 and NM2 clades also retain receptors of the Str superfamily. str-123 (NP_510135.1) in NC3 is observed at the lowest E-value (0.003). Overall, the C. elegans GPCRs associated with cluster 30 are from the Str superfamily.

Figure 2.32 (A and B): Cross-genome phylogeny of glutamate receptors (Cluster 30), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

Cluster 31

Cluster 31 has four human and eight C. elegans GPCRs (Figure 2.33 A-B). The GABAB receptors are present in this cluster. GABAA receptors, in contrast, are members of the ionotropic receptor superfamily, which includes nicotinic acetylcholine and glycine receptors. The four human GABAB receptors branch with a good bs value of 80 and form the HC1 clade. The NP_741740.1 (gbb-1) receptor was picked as an ortholog of GBR1_human by the reverse BLAST best hit procedure, with an E-value of 1.00E-48, and NP_493575.2 is also associated with GBR1_human at a significant E-value of 4.00E-09. Both of these C. elegans GPCRs branch within the human clade, HC1. The NC1 and NM1 clades include receptors from the Str and Sra superfamilies, respectively, and SS1 is associated with the Str superfamily. Additionally, hypothetical proteins such as T32170 (C31B8.10), ZK180.1 (NP_500579.2) and Y41G9A.4b (NP_741740.1) are associated at significant E-values and can be used further to relate functional commonality with human glutamate type receptors.
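The "reverse BLAST best hit" criterion mentioned for gbb-1 can be illustrated with a small reciprocal-best-hit sketch. It assumes that search results are already available as (query, subject, E-value) tuples in both directions; the identifiers and E-values below are hypothetical placeholders, and the function names are not taken from the thesis or from any BLAST tool.

```python
def best_hits(hits):
    """Map each query to the subject with the lowest E-value.

    hits: iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)   # e.g. worm sequence -> best human sequence
    rev = best_hits(reverse)   # e.g. human sequence -> best worm sequence
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


if __name__ == "__main__":
    # Hypothetical toy results: worm queries against human, then the reverse search.
    worm_vs_human = [("worm_1", "human_A", 1e-50),
                     ("worm_1", "human_B", 1e-10),
                     ("worm_2", "human_B", 1e-5)]
    human_vs_worm = [("human_A", "worm_1", 1e-48),
                     ("human_B", "worm_1", 1e-9),
                     ("human_B", "worm_2", 1e-3)]
    print(reciprocal_best_hits(worm_vs_human, human_vs_worm))
```

In this toy input only worm_1 and human_A select each other, so worm_2 is reported as associated with human_B but not as its putative ortholog.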

Figure 2.33 (A and B): Cross-genome phylogeny of glutamate receptors (Cluster 31), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

2.6.8 Result summary for frizzled/smoothened receptors

Cluster 32

Cluster 32 comprises receptors with similar domain architectures: apart from the GPCR domain, a 200-residue long N-terminal domain (which contains the predicted orthosteric ligand binding site) and the cysteine-rich domain (CRD, which is likely to participate in Wnt ligand binding). This cluster contains 11 human and 42 C. elegans GPCRs (Figure 2.34 A-B). The human GPCRs (FZD1-10) are associated in CC1. In the CC1 cluster, NP_492635.1 (mom-5) of C. elegans is an ortholog of FZD1 (human GPCR); it is associated with this cluster at an E-value of 1.0e-75 and retains 34.5% identity with the human GPCR sequence. NP_491028.2 shares 34.5% identity with FZD4 (human GPCR), and NP_503964.2 (cfz-2) shares 40.0% identity with FZD5 (human GPCR). Interestingly, mom-5 (NP_492635.1), also an ortholog, lin-17 (NP_491028.2) and the Frizzled homologue cfz-2 (NP_503964.2) are associated with the human frizzled/smoothened type receptors and branch within CC1 itself, indicating that there may be a high functional similarity, characteristic of close homology, between these receptors.

The neighboring clusters were annotated as NC1-NC8. The NC3 clade is of the srbc type and forms a clade (with a bs value of 67 and an E-value of 0.061) with candidate receptors from the Srg superfamily. The neighboring clusters NC2-NC8 share most of their members from the str, srj and srh types of the Str superfamily (with an average E-value below 3.0 and good bs values). Interestingly, two hypothetical proteins and receptors of the Str superfamily are observed in NM1. Apart from these entries, two srh type receptors of C. elegans are observed in the SS1 clade.
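Percentage identities such as those quoted above for the frizzled receptors are computed from a pairwise alignment. The thesis does not state which denominator was used, so the sketch below makes an explicit assumption: identity is counted over alignment columns in which neither sequence has a gap. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gapped columns are excluded from the denominator; other
    conventions (alignment length, shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: three identical residues out of four compared columns.
    print(percent_identity("ACD-F", "ACE-F"))
```

Because the choice of denominator changes the reported value, any comparison of identities across clusters should use one convention throughout.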

Figure 2.34 (A and B): Cross-genome phylogeny of FRZ/SMT type receptors (Cluster 32), in rectangular and radial display. Phylogenetic trees were generated using TREE-PUZZLE 5.1, with 10,000 quartet-puzzling steps for the maximum likelihood method. The generated Newick tree files were colored using MEGA 4.0. The out-group is not shown in the figure. Human GPCRs are denoted in green. Serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are mentioned for both the rectangular (left-side) and radial (right-side) displays.

2.7 CONCLUSION

The associations reported in the current study for selected human and C. elegans GPCRs provide information at a preliminary level of understanding of the cross-genome clustering and of possibly related sequences across taxa. The selected human (353) and C. elegans (around 1159) GPCRs were associated by the RPS-BLAST technique to produce 32 cross-genome GPCR clusters of biologically important GPCRs. In associating cross-genome GPCR sequences, the E-value limit plays a critical role. Interestingly, 84% of the cross-genome clustering was done successfully at significant E-value thresholds (ranging from 0.001 to 1), an additional 14% of the associations were observed at E-value thresholds between 1 and 5, and a very small percentage (2%) of the associations was obtained at E-value thresholds of more than 5 (Figure 2.35 A and B). Also notably, serpentine receptors and hypothetical receptors are associated at the significant E-value range of <1, and orthologs are reported at significant E-value thresholds (such as <1), in the overall distribution of human-nematode GPCRs in the cross-genome GPCR associations. A few candidate receptors from the largest Str superfamily, particularly str-type receptors, showed only high E-value thresholds (Figure 2.35 B). This may be due to the long evolutionary lineage separating human GPCRs from the nematode serpentine receptors, and to the lack of input of all the available/possible receptor types within the profiles. In future, through fine-grained analytical approaches such as identifying conserved domains and motifs, a clearer understanding of the established associations could be obtained.

RPS-BLAST is a sensitive approach for associating queries to a profile (independent of sequence identity), and the E-value thresholds obtained with this profile-based clustering technique depend purely upon the selected representative human GPCRs, the respective PSSM profiles and the type of GPCRs in the dataset. Even so, the current approach is effective in connecting remote homologues to the given representative profiles. Therefore, the approach could be further tried/improved with alternate representative human GPCRs to assess the coverage of E-values for cross-genome clustering. However, in the current study, every chance given for a nematode GPCR to associate with a related human GPCR profile (a sequence feature/property) started with a statistically stringent E-value cut-off. When required, the stringency was relaxed to facilitate cross-genome GPCR association. In this manner, E-value thresholds (a statistical application) are used effectively in the profile-based clustering technique to identify/discriminate the associated GPCRs as closely or distantly related sequences across taxa.

Notably, among the 27 orthologs identified (Table 2.2), 13 candidate GPCRs were observed from the cross-genome GPCRs of peptide receptors (Table 2.2). Also, the observed orthologs were associated at E-values ranging between 8.00E-87 and 0.53 in the dataset. Ortholog pairs such as (mom5, FZD1_HUMAN) of the FRZ type receptor (Cluster 32) and the dopamine receptor family member (dop-1) with DADR_HUMAN of the BGA type receptor (Cluster 24) are the two observed examples of cross-genome GPCR association found at the low (8.00E-87) and high (0.53) E-value thresholds among the identified orthologs of the dataset (Table 2.2). Apart from orthologs, the closely associated GPCRs were predominantly distributed in the tree topology not only as CC, but also as NM, NC and SS. The association was obtained by applying E-values ranging from 5.00E-73 to 0.01. Particularly, candidate GPCRs such as the frizzled homolog (cfz-2) (FRZ/SMT receptor of Cluster 32), the serpentine receptor sro-1 (N&L receptor type of Cluster 14), the serotonin/octopamine receptor (ser-2) (BGA receptor type of Cluster 24) and NP_505583.1 (F57A8.4) (CMK receptor type of Cluster 13) establish favourable associations with human GPCRs at relatively significant E-values (1E-80, 3E-74, 6E-16 and 2E-07). Meanwhile, GPCRs such as NP_507020.1 from the BGA receptor type of Cluster 20, the serpentine receptor srh-78 from the PR type of Cluster 4 and the serpentine receptor str-262 from the secretin type receptor of Cluster 25 are found to be the most distantly related to human GPCRs, observed at highly relaxed/lenient E-value cut-offs.
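The E-value bands used in this chapter (below 1, between 1 and 5, and above 5) can be tabulated for any list of associations as sketched below. The handling of the exact boundary values (an E-value of exactly 1 or 5) is an assumption, since the text does not specify it, and the function names and the toy input are hypothetical.

```python
from collections import Counter

BANDS = ("<1", "1-5", ">5")


def evalue_band(evalue: float) -> str:
    """Assign an E-value to one of the three bands discussed in the text.

    Assumption: 1.0 and 5.0 themselves fall into the middle band.
    """
    if evalue < 1.0:
        return "<1"
    if evalue <= 5.0:
        return "1-5"
    return ">5"


def band_percentages(evalues) -> dict:
    """Return the percentage of associations falling into each band."""
    evalues = list(evalues)
    if not evalues:
        return {band: 0.0 for band in BANDS}
    counts = Counter(evalue_band(e) for e in evalues)
    return {band: 100.0 * counts.get(band, 0) / len(evalues) for band in BANDS}


if __name__ == "__main__":
    # Hypothetical toy E-values, not taken from the dataset.
    print(band_percentages([1e-80, 0.5, 2.0, 7.0]))
```

Applied to the full list of associations in a dataset, the same function would yield the kind of breakdown summarised in the pie chart of Figure 2.35 A.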

Figure 2.35 (A and B): Distribution of C. elegans GPCRs at various E-value thresholds. (A) Pie-chart representation of the association of C. elegans GPCRs with 32 human GPCR profiles at different E-value thresholds: <1.0 for 84% (green), >1 to <5 for 14% (orange) and >5 for 2% (red) of the dataset. (B) Bar diagram illustrating the distribution of receptors at the superfamily level, hypothetical proteins and orthologs at the different E-value thresholds (shown in green, orange and red).

The proposed protocol for associating C. elegans GPCRs with known profiles of human GPCR clusters suggests that there is a high representation of C. elegans GPCRs, except in the cases of the chemokine receptor clusters (Clusters 12 and 13) and the cell adhesion receptors (Cluster 27). The distribution of serpentine receptors at the superfamily level, the associated hypothetical proteins, the coclustering of orthologs (Table 2.2) and also a trial study with known associations clearly support the RPS-BLAST clustering technique. The observed trend in cross-genome GPCR phylogeny also supports the employment of RPS-BLAST in associating unknown GPCRs with known/previously annotated GPCR profiles and in establishing associations even at remote homology.

Kindly Note: Alignment files and supporting tables for all 32 clusters are available at the following URL for download: http://caps.ncbs.res.in/download/crossgenome GPCR/supplementary files.zip

species-specific members [SS] were introduced to follow the branching patterns in the dendrogram of different clusters. A broad spectrum of sequence relationships between human and C. related (neighbor clusters) and distantly related (neighbor members) sequences. 23 and 32). The resultant 32 enriched cluster and their phylogenies were analysed and discussed in detail for the distribution in cross-genome studies (Result summaries). 11. elegans GPCRs could be seen: for example. as in clusters 3-5. 8 and 21in Table 2. In this study. 16-17. Terms. neighbor clades [NC]. 8.2) among GPCRs from the two genomes helps to correlate the evolutionary integrity between the two genomes. sufficient polyploidy amongst members in a cluster (example as in clusters 6 and 11). the current method helps to connect the reported hypothetical proteins with the associated known receptor types at sequence level further to compare for the functional relationships. in many instances. No inter-mixing of sequence groups across individual genomes (called speciesspecific (SS) clades) and in Human GPCR (HC) clades) have also been . npr-9 is ortholog to GALR peptide receptor) validating our associations and clustering techniques.113 Since the significant E-values play a major role and act as a preliminary reference for sequence comparison. Interestingly. not sufficient inter-mixing (as observed in clusters 10 and 26) and strong species-specific tendencies (example as noticed for nucleotide and lipid receptors (Clusters 14 to 19). the association of GPCRs in the two genomes by our sequence analysis suggests that we can capture remote homology from 12 to 20 % (average cluster identity) and can include highly related (coclusters. like human GPCR clade [HC]. coclusters [CC]. there is inter-mixing in biogenic amine receptors (clusters 20 to 24). we could observe the orthologs coclustered within the same clade (example: in cluster 1. neighbor Members [NM]. 
The identification of putative orthologs (example as in clusters 1. 5. orthologs). Further. unannotated/ hypothetical proteins are associated with GPCR clusters at statistically significant E-values provoking their function to be interrogated by experiments (for example.

noticed in some instances (for example, Clusters 10 and 26). Nematode-specific serpentine receptors, with their Pfam domain knowledge and GO annotations, are helpful for understanding the species-specific repertoire of GPCRs for the Sra, Str, Srg and others/Solo type superfamilies. The motifs belonging to these receptors and the distribution of these receptors among the eight biologically important subtypes of human GPCRs (kindly refer to Chapter 3) further help to address nematode specificity and provide guidelines for understanding species-specific sequence properties. Overall, a cross-genome study such as the one reported in this Chapter, using a sequence search and clustering strategy, uncovers information on putative orthologs, function annotation of novel genes and functionally important/related sequences in two genomes for further practical applications. A recent publication on identifying conserved motifs in the aligned set of cross-genome GPCR clusters, emerging from the work described in Chapters 4 and 5 of this thesis, is biologically useful for connecting conservation at the sequence level to structure and then to functional benefit.

Table 2.1 Distribution of Human and C. elegans GPCRs in 32 Clusters

S.No   Receptor type   No. of human GPCRs   No. of C. elegans GPCRs
1      PR              8                    32
2      PR              11                   40
3      PR              8                    34
4      PR              8                    32
5      PR              8                    54
6      PR              8                    42
7      PR              8                    34
8      PR              8                    34
9      PR              14                   54
10     PR              8                    26
11     PR              12                   60
12     CMK             10                   17

Table 2.1 (Continued)

S.No   Receptor type   No. of human GPCRs   No. of C. elegans GPCRs
13     CMK             15                   13
14     N&L             7                    64
15     N&L             18                   53
16     N&L             7                    21
17     N&L             18                   30
18     N&L             8                    28
19     N&L             18                   89
20     BGA             8                    8
21     BGA             4                    49
22     BGA             8                    30
23     BGA             22                   77
24     BGA             23                   68
25     SEC             16                   29
26     SEC             Droso (14)           20
27     CAR             29                   16
28     GLR             8                    20
29     GLR             5                    14
30     GLR             4                    23
31     GLR             3                    8
32     FRZ/SMT         11                   40

Note: List of the cluster-wise distribution (for 32 clusters) of C. elegans GPCRs according to the eight subtypes of human GPCRs. Cluster 26 retains around 14 GPCRs from the Drosophila genome.

00E-54 10 11 12 ACM3_HUMAN SSR5_HUMAN 5H1B_HUMAN BGAR PR BGAR NP_001024236.00E-52 4.1 24 26 1.1 NP_510101.00E-40 3.1 NP_510833.00E-41 5.1 NP_509896.1 NP_493193.00E-69 2.No 1 2 3 4 5 6 Human GPCR GBR2_HUMAN FZD1_HUMAN CALR_HUMAN Q9NZD1 GALS_HUMAN TRFR_HUMAN Type GLR FRZ SEC PR PR PR C.2 NP_001040810.00E-48 2.00E-87 1.00E-59 2.00E-45 19 20 21 22 23 24 25 Q8NFS4 V2R_HUMAN Q9HAR2 NK1R_HUMAN 5H1A_HUMAN Q96AM5 NY4R_HUMAN GLR PR CAR PR BGAR PR PR CAM33507.1 NP_497452.1 NP_508234.00E-75 5.00E-71 7 8 9 Q8NG71 GP10_HUMAN AA2R_HUMAN SEC PR BGAR NP_510496. elegans GPCR NP_741740.3 NP_001024569.00E-30 26 CCKR_HUMAN PR NP_001022842.1 NP_500930.00E-38 4.53 .1 6.00E-49 14 15 Q9NZR3 MTH8_DROME BGAR SEC NP_001024047.1 24 0.1 25 11 23 23 2 24 1.2 6 17 8 2.1 28 6 27 11 22 5 11 5.1 NP_509515.00E-43 3.01 27 DADR_HUMAN BGAR NP_001024579.00E-30 3.1 NP_501701.1 NP_505077.00E-38 1.00E-71 7.00E-47 2.1 NP_001021172.1 7 5.1 NP_001021365.116 Table 2.00E-47 4.1 NP_492635.1 NP_001040724.1 NP_001024316.00E-51 1.1 Describtion GABA B receptor subunit (gbb-1) mom-5 Calcitonin receptor activity Str-138 npr-9 hypothetical protein transmembrane receptor (Secretin family) Neuropeptide Y receptor activity hypothetical protein G-protein-linked Acetylcholine Receptor family member (gar-3) rhodopsin family tag-24 FSHR (mammalian follicle stimulating hormone receptor) homolog Dopamine receptor family member (dop-2) Srw-102 Gonadotropin-Releasing hormone rececptor (GnRHR) related (srh-2) npr-2 Metabotropic Glutamate receptor family hypothetical protein lat-2 hypothetical protein ser-4 hypothetical protein hypothetical protein Cholecystokinin Receptor homolog family member (ckr-2) Dopamine receptor family member (dop-1) Cluster no 31 32 25 5 1 5 E-value 0 8.1 8 0.1 NP_491990.2 List of Identified Ortholog S.00E-48 16 17 18 GRHR_HUMAN MTH9_DROME OX2R_HUMAN PR SEC PR NP_491453.00E-73 1.00E-50 13 TSHR_HUMAN PR NP_505548.

CHAPTER 3

PHYLOGENETIC ANALYSIS OF SERPENTINE RECEPTORS OF C. ELEGANS AND IDENTIFICATION OF CONSERVED MOTIFS IN SERPENTINE RECEPTOR SUPERFAMILIES

3.1 INTRODUCTION

“A man may fish with the worm that hath eat of a king, and eat of the fish that hath fed of that worm” (Shakespeare, Hamlet)

This old quote is an appropriate starting point to recollect the importance of not only the trend of evolution, but also the role of nematode serpentine receptors in comparative genomics. As Sydney Brenner suggested, the conserved features of C. elegans help to understand macromolecular evolution and also guide the connection of functional resemblance with other metazoans and higher order organisms. C. elegans is an effective model organism for studies in the fields of genome biology, animal development, behavioral studies and evolutionary studies. Since the cross-genome phylogenetic analysis on selected GPCRs of human and C. elegans has been studied previously (Chapter 2), the current objective is to perform phylogenetic analysis on serpentine receptors (SR) of C. elegans exclusively. One of the annotated C. elegans olfactory receptors (odr-10) has been selected for modelling its three-dimensional structure.

3.2 HOMOLOGUES OF C. elegans GPCRs

Genes spanning distant taxa are worthwhile targets for comparative genomics. C. elegans is an important genetic resource for cross-genome studies. This metazoan is interesting because it retains a significant amount of homology with vertebrates and non-vertebrates. In a comparative proteomics study, 83% of the worm proteome was found to have human homologous genes (Lai et al 2000, Stein et al 2003). At least 12-17 signaling pathways of the nematode are observed to be relevant to higher order organisms and to involve GPCRs, as reported in Scientific Frontiers in Developmental Toxicology and Risk Assessment, Washington DC, 2000. For instance, FGF (fibroblast growth factors) in C. elegans, which plays a major role in a number of developmental processes (Coulier et al 1997), has conserved kinase subdomains II-VII (Hanks et al 1988) when aligned with nearly 40 homologous sequences, and the conservation extends to mammalian, avian, fish, amphibian and invertebrate paralogs (Coulier et al 1997). In another instance, lin-12, a membrane protein in C. elegans, and Notch in Drosophila share homology and, interestingly, these two receptors also tend to be functional homologues (Yoo and Greenwald 2005). Such evidence emphasizes the need to understand the occurrence of SR gene clusters in C. elegans and further motivates connecting their functional relevance with GPCRs of other nematodes, eukaryotic organisms / higher order organisms for the vast structural and functional implications.

3.3 OBJECTIVES

• The main objective of the current study is to perform a phylogenetic study on serpentine receptors of C. elegans for a better understanding of family-specific features from a sequence perspective.
• With this respect, since only one receptor has been annotated as an olfactory receptor (odr-10) in C. elegans (Mori 1999,

Sengupta et al 1996), phylogenetic cluster association could guide the collection of homologous sequences of odr-10, which further helps to collect odr-10-like sequences from other nematode species and higher eukaryotes.
• By using homology modelling, odr-10 has been modelled and could be explored for ligand binding sites and to map hot spot residues.
• The identified sequence-specific properties of SR families can be used in machine learning approaches (SVM) to train and detect the putative chemosensory receptors in other nematode species, eukaryotes and higher order organisms.

3.4 CHEMOSENSORY RECEPTORS IN C. elegans

Chemoperception is a central sense in C. elegans (Robertson and Thomas 2006, Thomas and Robertson 2008) and is mediated by members of the seven-transmembrane G-protein-coupled receptor class, and genetic analysis of sensory neuron-specific G-proteins indicates their functional participation in olfaction, nociception and pheromone responses. The external/chemical signals are mediated predominantly by GPCRs in C. elegans (Hilliard et al 2002, Roayaie et al 1998, Woollard 2005).

3.5 CHEMOSENSORY NEURONS AND OLFACTORY APPARATUS IN C. elegans

In the adult worm, around 959 somatic cells are present and, among them, one third of the cells (nearly 302) are neurons; hence the study of neuroanatomy in the nematode is focused at the synaptic level. The synaptic connectivity is well understood from electron microscopic studies (Chen et al 2006, White et al 1986). Notably, nearly 10% of the nervous system, i.e., 32 neurons, participates in nematode olfaction. Four types of chemosensory organs, namely amphid, phasmid, inner labial, and outer labial organs, are present in C. elegans. Each organ possesses two supporting cells

namely sheath and socket cells, which form the pore. Through this pore, sensory neuron endings are exposed to sense external stimuli. In the anterior end of C. elegans, two amphid pores are present and each pore contains 11 chemosensory neurons and one thermosensory neuron called AFD. Neurons are generally referred to by three or four alphabetic letters. Among the 11 chemosensory neurons, eight neurons (namely ADL, ADF, ASE, ASG, ASH, ASI, ASJ, and ASK) are exposed to the cilia and the other three chemosensory neurons (AWA, AWB, AWC) are embedded in the sheath cell. The latter three sensory neurons are also called wing cells. The phasmid pore is comparatively smaller than the amphid pore and contains PHA and PHB neurons (Hilliard et al 2002). Generally, neurons present in the anterior end participate in chemoattraction and the posterior end is meant for chemosensory avoidance. Interestingly, odr-10 is found to be sensitive in recognizing a volatile odorant, namely di-acetyl (2,3-butanedione) (Sengupta et al 1996).

3.6 FAMILIES AND SUPERFAMILIES OF SERPENTINE RECEPTORS IN C. elegans

Nearly ±1280 genes and ±420 pseudogenes were identified as related to chemoreceptors in C. elegans and are classified under 20 families (Thomas and Robertson 2008). Due to the needs of the nematode habitat, i.e., lack of auditory and visual sense, an abundant occurrence of serpentine receptors has been observed, and it covers almost 7-8% of the genome size (see also Chapter 1). It is worthwhile to study the sequence properties of serpentine receptors to connect structural and functional relevance with other higher eukaryotic organisms. Interestingly, C. elegans possesses almost 70% more chemosensory genes (718) than the other subspecies, as defined by the individual Pfam (Version 9.0) chemosensory gene families (Bateman et al 2002). Genes related to chemoreceptors are more numerous in C. elegans than in C. briggsae. This may be due to the importance and necessity of

chemoperception in C. elegans than in C. briggsae. Interestingly, the frequency distribution of chemoreceptors is species-specific. At one instance, more than half of the genes in C. elegans have very few clear orthologs across the other species such as C. briggsae and C. remanei. There is substantial evidence reported for the event of gene loss and duplication in serpentine receptors of C. elegans (due to birth-death evolution). The occurrence of gene duplication, redundancy, movement and diversification in C. elegans (Robertson 1998) suggests the need for careful assignment of gene description.

Serpentine receptors (SR) are broadly classified as Sra, Str, Srg and other/Solo type superfamilies. The Sra superfamily retains families such as sra, srab, srb, sre. The Srg superfamily includes families like srg, srt, sru, srv, srx, srxa. The Str superfamily covers srd, srh, sri, srj, str families and others, and the Solo type includes srbc, srsx, srw and srz families. Overall, the Str superfamily is the largest and notably the srh family is the largest family in the nematode chemoreceptors. Thus, in phylogenetic clustering, the str family along with the related sri and srj families are observed to be related to odr-10 (olfactory receptor) in C. elegans (Sengupta et al 1996).

As explained in another case study, some families, like the sre and srxa families, have single orthologs in both C. briggsae and C. remanei. These are relatively stable families and have substantial numbers of apparent gene duplications and losses. Other families are dominated by species-specific expansions, particularly the srbc, srh, srw and srz families of C. elegans, which presumably arose by a process of ongoing gene duplication and loss on each lineage. Such observations clearly show the event of gene duplication and losses and emphasize the need for careful assignments in the gene cluster arrangements and also in the phylogenetic tree observations.

3.7 FEATURES AND IMPORTANCE OF SRs

There are reports in the literature that C. elegans responds to / senses a variety of chemical attractants and repellants. The nematode responds to water-soluble compounds such as salts, anions, cations (K+, Na+), chloride ion concentrations and basic pH changes, some amino acids (like lysine and histidine), some nucleotides, some vitamins and biotin, and to various volatile compounds such as alcohols, ketones, esters, pyrazines, thiazoles, and aromatic compounds. They respond to various chemical stimuli by diverse behavioral responses (Pace et al 1985, Sklar et al 1986). C. elegans shows remarkable olfactory ability in discriminating various external cues, and this is mediated by the olfactory pathway.

3.8 SRs: FUNCTIONAL RELEVANCE WITH OTHER EUKARYOTIC GPCRs

Interestingly, certain serpentine receptors show remarkable functional relevance with human and mouse receptors (Thomas and Robertson 2008). For example, candidate receptors from the sra family are found to be related to melanin-concentrating hormone receptor (mouse MCHR-1), srab receptors to human thyroid stimulating hormone receptor, srb receptors to melanin-concentrating hormone receptor of mouse, a candidate sre type receptor to sphingolipid G-protein coupled receptor 1 (human EDG1), srbc type receptors to angiotensin II receptor type 1 (human MAS1), and srsx type receptors to somatostatin receptor (mouse Sstr2); these are observed as significantly related at sequence level. Notably, srx to melatonin receptor 1A (human MTNR1A) and srxa to melanin-concentrating hormone receptor (mouse MCHR-1) are found to be related. Also, srw type receptors are observed to be related to the neuropeptide receptors, and the diverse receptors such as srg, srt, sru, srv are related to opsin type receptors from many other species. Various SR families such as srh, sri, srj, str, and srd are highly related to each other at intra-genomic level.

3.9 METHODOLOGY

3.9.1 Data collection

The SEVENS database (Ono et al 2005) provides more than 1000 GPCRs for the C. elegans genome exclusively. For the current study, 682 receptors have been collected along with Odr-10. Candidate receptors of the sra, srb and sre types from the Sra superfamily; srg, srt, srv, srx, srxa type receptors from the Srg superfamily; srd, srh, sri, srj, str from the Str superfamily; and srbc, srsx, srw, and srm from the Solo/others superfamily were collected, and the respective superfamilies make up 4%, 7%, 69% and 20% of the serpentine receptors in the dataset (Figure 3.1).

Figure 3.1 Pie-diagram to show the distribution of serpentine receptors (SR) in the dataset
Note: The pie-diagram illustrates the distribution of serpentine receptors in the dataset. Candidate receptor types such as sra, srb and sre from the Sra superfamily (aqua); srg, srt, srv, srx, srxa type receptors from the Srg family (blue); srd, srh, sri, srj, str from the Str superfamily (pink); and srbc, srsx, srw, and srm from the Solo/others superfamily (brown) make up 4%, 7%, 69% and 20% respectively.

3.9.2 Prediction of TM-helices by HMMTOP

The collected serpentine receptors were predicted for transmembrane helices and their membrane topology. As mentioned in the literature, N-out topology, which is seen in the canonical GPCRs, is observed predominantly in worm, mouse and humans, but Drosophila ORs exhibit

reverse topology (Benton et al 2006). The HMMTOP prediction server was used to predict the membrane topology of the SRs, and 80% of the sequences were predicted to have “N-OUT” topology.

3.9.3 Alignment Procedure by MAFFT

Since the number of sequences and the evolutionary distance between protein sequences play a crucial role in the alignment procedure, an appropriate procedure, namely MAFFT, has been employed to align more than 650 SR sequences. Sequences of serpentine receptors show a high degree of diversity and varying sequence composition, giving rise to considerable indels in the alignment window. A gap penalty of 1.53 and the JTT 200 matrix were used to generate the alignment.

3.9.4 Phylogeny of Selected Serpentine Receptors

The aligned SR sequences were used for generating the phylogeny, and the NJ method of tree generation was run for 1000 BS replicates by using MEGA 5.0 (Saitou and Nei 1987). The generated tree topologies were analyzed for cluster association with reference to superfamily. Cluster associations at reliable bootstrap values (>=50) were grouped and inferred at superfamily level (Figure 3.2).

3.9.5 Identification of Motifs in SRs

Phylogeny-guided cluster association has been used to build multiple sequence alignments. As the number of serpentine receptors varies from family to family, only a few representative serpentine receptors have been considered for the current study. Receptors from about 15 families have been selected and studied for conserved motifs by using the TM-MOTIF package (Chapter 4). A few representative sequences at the family level have been selected and aligned by the online MAFFT alignment program.
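The bootstrap criterion described in 3.9.4 (1000 replicates, support >= 50) can be illustrated with a minimal Python sketch. This is not the MEGA procedure: instead of rebuilding an NJ tree for every replicate, it resamples alignment columns with replacement and counts how often one sequence is the nearest neighbour of another under a simple p-distance. The function names, the toy alignment and the nearest-neighbour simplification are assumptions made purely for illustration.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def nearest_neighbour_support(alignment, query, partner, replicates=1000, seed=0):
    """Percentage of replicates in which `partner` is the closest sequence to `query`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        others = [name for name in rep if name != query]
        closest = min(others, key=lambda name: p_distance(rep[query], rep[name]))
        if closest == partner:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment (not real SR sequences).
toy = {"query": "MSGELWITLV", "close": "MSGELWITLV", "far": "AAAAAAAAAA"}
support = nearest_neighbour_support(toy, "query", "close", replicates=200)
print(support >= 50)  # the pairing would pass the >= 50 support cut-off
```

In a real analysis the support value is attached to a clade of the NJ tree rather than to a nearest-neighbour pair, so this sketch only conveys the resampling idea.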

The generated MSAs of the 15 SR families, along with the respective multiple FASTA sequences of the 15 SR families, were used as input to the TM-MOTIF package. Conserved motifs with their respective topology and possible substituting AAS were examined.

3.10 RESULTS

The generated phylogeny for the selected SRs reveals four distinguishable clusters for the four receptor superfamilies, namely Str, Sra, Srg and others/Solo (Figure 3.2). The phylogeny shows a clear “species-specific” relationship within the four superfamilies. Among them, candidate receptors from the Str superfamily tend to be predominant and are observed in large numbers in the phylogeny. The only annotated olfactory receptor in C. elegans (odr-10), which is reported to belong to the Str superfamily, is indeed co-clustered with candidate receptors from the Str superfamily. The phylogenetic analysis enables the collection of neighboring sequences for odr-10. Thereby, nearly 43 homologous sequences have been identified (Figure 3.3).

Figure 3.2 Phylogeny on selected serpentine receptors (circular view tree)
Note: The phylogeny generated by the NJ method shows the species-specific cluster arrangements for the serpentine receptor superfamilies Str (in pink), Sra (in aqua), Srg (in blue) and others/Solo (in brown). The tree topology is given in the circular display of MEGA 5.0. Notably, the largest Str superfamily (in pink) shows a high number of occurrences, and the annotated olfactory receptor odr-10 (in red) tends to be with the str family of Str superfamily members (in pink).

In the interest of collecting the homologues of odr-10, only the subcluster related to odr-10 has been studied in detail. As a result, six subclusters have been identified and are named Str_C1 to Str_C6 (Figure 3.3). The Str_C1 clade retains five related str type receptors with significant BS values. Notably, the serpentine receptor str-112 shows a highly significant BS value of 100 and remains the closest homologue to odr-10. Early literature also strongly suggests this particular receptor and str-89 as structural homologues of odr-10 (WormBase IDs such as WBStructure010755, WBStructure010818 (F10D2.4 (str-112)) (Chen et al 2004)). This sequence association is interesting evidence for the effectiveness of phylogenetic clustering in reliably identifying and associating related homologs.

Among the collected 43 sequences, str-112 and related str type members of Str_C1 such as str-115 and str-113 can be further studied in behavioral assays to detect di-acetyl and related compounds. In the same way, the other str-type receptors collected in Str_C1 to Str_C6 (including str-103 from Str_C2, str-92 and 93 from Str_C3, str-140 from Str_C4, str-118 from Str_C5, str-123, 125 and 126 from Str_C6, together with str-89, 90, 96, 97, 99, 101, 106, 108, 109, 111, 114, 124, 130, 131, 134, 135, 136, 138, 139, 141, 143, 144, 145, 146, 148, 149, 151 and 264) can be further explored for ligand binding properties and secondary structure predictions (Figure 3.3), since they are found to be related to odr-10. The pairwise alignment between odr-10 and str-112 is of good quality, with a query coverage (alignment length versus query sequence length) of more than 95% (Figure 3.4), and the sequence identity observed between these receptors is nearly 80%.
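The two quantities quoted for the odr-10 / str-112 alignment, query coverage (alignment length versus query sequence length) and sequence identity, can be computed from any gapped pairwise alignment. The following Python sketch shows one common way of doing so; the function name and the toy sequences are hypothetical, and the exact definitions used by the alignment tool behind the reported values may differ (for example in how gap columns are counted).

```python
def identity_and_coverage(aligned_query, aligned_subject):
    """Return (percent identity, percent query coverage) for a gapped pairwise alignment.

    Identity  = identical residues / columns where both sequences have a residue.
    Coverage  = columns where both sequences have a residue / ungapped query length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for q in aligned_query if q != "-")
    aligned_pairs = [
        (q, s) for q, s in zip(aligned_query, aligned_subject) if q != "-" and s != "-"
    ]
    if not aligned_pairs or query_length == 0:
        return 0.0, 0.0
    identical = sum(1 for q, s in aligned_pairs if q == s)
    identity = 100.0 * identical / len(aligned_pairs)
    coverage = 100.0 * len(aligned_pairs) / query_length
    return identity, coverage


# Hypothetical toy alignment, not the real odr-10 / str-112 sequences.
identity, coverage = identity_and_coverage("ACDE-FG", "ACDQKF-")
print(f"identity = {identity:.1f}%, coverage = {coverage:.1f}%")
# identity = 80.0%, coverage = 83.3%
```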

Figure 3.3 Subcluster showing odr-10 and its homologues
Note: The circular view of the NJ tree shows odr-10 and its related homologs from Str_C1 to Str_C6; notably, the closest homologue str-112 is observed along with odr-10 in Str_C1.

3.10.1 Identified Motifs in SR Families: A Pilot Study

From the phylogeny, the selected representative sequences from each clade were taken and examined for conserved amino acids. As discussed in the methods, the number of sample sequences for each serpentine receptor family varies (refer 3.8.1). The sequences were aligned by the MAFFT alignment program, and the resulting alignments were used to identify the conserved motifs along with substituting amino acid residues (AAS) at the family level by using an in-house program (MotifS program, by R. Sowdhamini, unpublished results) and recorded by using the TM-MOTIF package (Chapter 4). The aligned sets of serpentine receptors were given as input to TM-MOTIF, and the obtained results for the motifs and their occurrence in the predicted membrane topology were tabulated (Table 3.1 and 3A.1) for a 60% level of conservation. As a pilot study, an effort has been made to identify the conserved motifs, and nearly 92 conserved amino acid patterns have been identified and are tabulated with respect to the serpentine receptor family with predicted

membrane topology. Since the sample size varies in the pilot study, there is a considerable chance of missing conserved AA patterns, but the study can be further improved by retaining an unbiased sample size and by applying consensus prediction methods at various percentage levels of conservation. However, this pilot study uncovers the unique sequence features of each serpentine receptor family and provides clues for developing the procedure for observing conservation and the number and physicochemical properties of substituting amino acids (AAS). Prominently, the YRY motif is observed in TM3, ICL2 in many of the serpentine receptor families. Also notably, the IYL motif is conserved in both the sri and srj families of the Str superfamily. The QLF motif is observed in the ICL3-TM6 of the str family; notably, odr-10 belongs to this family, and this family-specific motif is observed in odr-10 and is mapped on the modelled structure of odr-10 (Figure 3.5). And as the sre family is the most diverse when compared to the sra, srab, and srb members of the Sra superfamily, the identified motifs such as MIF, PIY (N` terminal), WTDD (ECL1), RFQAKEN (ICL3), FEN (ECL3), ETD (C` terminal) could help to define sequence features particular to the sre family.

3.10.2 Homology Modelling of odr-10

odr-10 has been selected for structure prediction, and the following homology modelling procedure has been followed.

3.10.2.1 Pairwise alignment of odr-10 with bovine rhodopsin sequence

The membrane topology of odr-10 was predicted by HMMTOP, and the N-out topology was observed as in the canonical GPCRs.

The crystal structure of bovine rhodopsin was selected as a template, and a pairwise alignment was done by using the alignment program MAFFT (Figure 3.4).

Figure 3.4 Pairwise alignment of odr-10 with bovine rhodopsin sequence
Note: The pairwise alignment of bovine rhodopsin and odr-10 is given in the (V) violet, (I) indigo, (B) blue, (G) green, (Y) yellow, (O) orange, (R) red colouring scheme for the respective TM-helices. The conserved “YRY” motif in the TM3, ICL2 and the Str superfamily-specific “QLF” motif in ICL3 have been highlighted in red colour. Alignment is done by using MAFFT.

3.10.2.2 Alignment by MAFFT

The pairwise alignment of bovine rhodopsin and odr-10 is given in the VIBGYOR (Violet, Indigo, Blue, Green, Yellow, Orange and Red to denote TM1 to TM7, respectively) colouring scheme for the respective TM-helices. The conserved “YRY” motif in the TM3, ICL2 and the Str superfamily-specific “QLF” motif in ICL3 have been highlighted in red colour. Alignment is done by using MAFFT. The percentage identity between the template and odr-10 is 19.5%, and a structure-guided alignment was provided as input to MODELLER (Sali and Blundell 1993) for generating a three-dimensional model of odr-10 using the homology modelling technique. Among the 20 generated models, the best model in terms of “least energy” was selected (Figure 3.5) and was further energy minimized by using the SYBYL software package (Tripos associate Inc).

3.10.2.3 Structure validation for Odr-10 model

Structure validation was performed with reference to pre-existing experimental data using the PROCHECK server (Figure 3.5). The selected three-dimensional model shows a final energy of -1020.23 Kcal/mol after energy minimization by SYBYL (Tripos associate Inc.). PROCHECK (Laskowski et al 1993) results for the energy minimized model show that 82% of the residues are within strictly allowed regions and 14% are within partially allowed regions of the Ramachandran plot (Figure 3.5). After excluding the intra and extracellular loop regions, minimized models with only TM helices were subjected to structure validation. The structure reports showed that the percentage of residues within allowed regions was 93.3% and those within partially allowed regions was 5.2% (Figure 3.5), indicating high quality of the model. RMSD was measured between the template structure (bovine rhodopsin) and the odr-10 model and found to be 3.882 Å.

Figure 3.5 Three-dimensional model of olfactory receptor odr-10 and structure validation
Note: The 3D model generated by MODELLER is displayed in ribbon representation. The seven transmembrane helices are colored in the VIBGYOR colouring scheme. The str family specific motif “QLF” is identified by the TM-MOTIF package and is denoted by spheres in TM6, and details of structure validation are given in the chart.
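The RMSD quoted above compares equivalent atoms of the template and the model. As a minimal illustration of the quantity itself, the Python sketch below computes the root-mean-square deviation between two lists of paired coordinates. It assumes the two structures are already superposed and that the atom pairing (for example, equivalent C-alpha atoms) has been decided beforehand; the superposition step and the choice of atoms used for the reported 3.882 Å are not stated in the text, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is displaced by 1 along z.
template_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(template_atoms, model_atoms))  # 1.0
```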

3.10.2.4 Preliminary phylogenetic analysis

A preliminary study was performed with a phylogenetic analysis of selected human ORs together with odr-10. The supplied human OR sequences were annotated with the respective cluster numbers, along with the respective protein annotations. These were aligned with a nematode olfactory receptor, namely odr-10, which is the only functional OR reported from the C. elegans genome. Sequences were aligned by using the MAFFT server with the JTT-200 matrix, and the derived OR alignments were employed to generate a phylogeny in the MEGA 5.05 software by the NJ method with 1000 BS replicates.

3.10.2.5 Odr-10 an outgroup to HOR

The nematode OR (odr-10) stays as an outgroup (Figure 3.6) in a phylogeny derived from an alignment with predominantly human OR sequences. Although the reported topology (N-out topology) is the same as that predicted in higher order organisms, odr-10 retains a species-specific tendency and appears as an outgroup to human ORs. This additionally suggests that agreement in topology does not necessarily lead olfactory receptors to cluster together. The long evolutionary lineage between these two taxa can also be a strong reason for not exhibiting a significant co-clustering tendency in the phylogeny. The nematode life style and its recognition of limited and simple odors like di-acetyl could be another reason for odr-10 staying an outgroup.

Distinct “Class I type receptors” HsC1

Notably, a cluster (HsC1) containing fish-like ORs is known to recognize only water-borne odours. This was observed as a distinct clade and retains intra-genomic retention or cluster-specific property in the human OR phylogeny. It is also useful for understanding the evolutionary trends in higher order organisms, starting from lower chordates like fishes (see Chapter 6 for more details). Interestingly, despite the evolutionary conservation observed in the HsC1 cluster including ORs from fish and humans, worm ORs (for example, odr-10) remain distinct members in the phylogeny, nor do fish-like ORs of humans co-cluster with nematode ORs. The observed phylogeny also depicts

evolutionary hierarchy at various taxonomic levels in a cross-genome phylogenetic study where worm ORs, fish-like ORs and other human ORs are present in separate and distinct clades, reflecting taxonomic hierarchy and the species-specific requirement of ORs.

Figure 3.6 Phylogeny on selected human olfactory receptors with an olfactory receptor (odr-10) from C. elegans
Note: The observed 10 human OR subclusters (see Chapter 6 for more details) are denoted in Aqua (HsC1), Purple (HsC2), Teal (HsC3), Blue (HsC4), Green (HsC5), Yellow (HsC6), Orange (HsC7), Red (HsC8), Fuchsia (HsC9) and Lime (HsC10) colours respectively in the tree topology. Each human OR is designated with its cluster number and HS as prefix. Interestingly, intra-genomic clustering of human ORs was observed, and the nematode OR [odr-10] stays as an outgroup.

3.11 CONCLUSION

683 serpentine receptors were collected from the SEVENS database and were examined for their predicted membrane topology by HMMTOP (Chen et al 2002). 80% of the sequences were reported to have the N-out topology. The collected SR sequences were aligned and clustered using the neighbour-joining method. Most of the serpentine receptors were clustered at the family level and depict the sequence-specific features at the superfamily level. Odr-10, the only annotated olfactory receptor, and a subcluster related to this receptor have been studied in detail for the collection of odr-10-like sequences. Notably, 43 homologues of odr-10 have been collected through phylogenetic clustering and are denoted in the clusters Str_C1 to Str_C6. Interestingly, str-112 seems to be the closest homologue to odr-10. The three-dimensional structure of this particular olfactory receptor, odr-10, has been modelled and studied for secondary structural details. This is a case study to demonstrate that a reliable

3D model can help to recognise functionally important residues like ligand-binding sites and the general mechanism of odorant binding proteins. Since the predicted N-out topology of odr-10 is the same as that of canonical GPCRs (which also retain N-out topology), which motivates modelling this particular olfactory receptor, an attempt has been made to model odr-10 by using bovine rhodopsin as template. Previous lab publications modelling a GPCR receptor (Kanagarajadurai et al 2009), an ionic channel (Shah and Sowdhamini 2001), and OR 83b, a ligand gated ion channel of Drosophila (Harini and Sowdhamini 2012), also suggest an established protocol for modelling and validating membrane proteins of our interest. Among the 20 models generated by MODELLER (Sali and Blundell 1993), the least energy model was further energy minimized by using the SYBYL software package (Tripos force field, Powell's gradient (100 iterations), distance dependent dielectric constant 1 and nonbonded interaction cut off 8), and the resulting model shows a bond stretching energy of about 10.2, an angle bending energy of about 336.2, a torsional energy of about 255.2, a van der Waals energy of -1568.429 and a total energy of -1020.234 kcal/mol. PROCHECK (Laskowski et al 1993) results for the energy minimized model report that 82% of the residues are within strictly allowed regions and 14% are within partially allowed regions of the Ramachandran plot (Figure 3.5). The RMSD between bovine rhodopsin and odr-10 was found to be 3.882 Å. Though only a remote sequence identity (19.5%) exists between the template and odr-10 in the pairwise alignment, a consensus method of predicting transmembrane topology can be viable in detecting the best coverage of predicted topology for both helix and loop regions, and a multi-template approach for modelling can also be applied. This can further guide comparison with alternate templates (where recently solved high-resolution crystal structures can be used). However, the primary objective of the current study is to perform phylogenetic analysis on selected serpentine receptors and to identify the SRs associated with the only annotated olfactory receptor (odr-10). Primarily, the

observed sub-clusters Str_C1 to Str_C6 (Figure 3.3) propose the closest homologues of odr-10. This further helps to propose representative candidates for secondary structure modelling and ligand binding sites (through docking), which could enable a better understanding of the mechanism of olfaction. By using TM-MOTIF, 92 serpentine receptor family-specific motifs have been identified (Table A1.1 in Appendix). In sequence analysis, such sequence features could be useful as constraints in sequence searches, in dimer-interface predictions (Nemoto and Toh 2005, Nemoto and Toh 2009), and to train SVM models to predict putative receptors from other nematode species. These ‘hot-spot residues’ can be related to the functional sites of the receptor(s), further extending the possibilities in ligand-receptor binding. In future, virtual screening with various ligands/receptors, and docking and molecular dynamics analysis of select olfactory receptor sequences (having data on odor binding), can be carried out. This would provide an insight into the functional characterization of these receptors. The essence of the study has been used extensively in developing an integrated database called DOR (Database of Olfactory receptors) (refer 6.9.3 of Chapter 6) to connect the sequence-structure-function relationship of ORs in five dedicated genomes. The next chapter deals exclusively with TM-MOTIF for identifying motifs in TM proteins for numerous practical applications.

Table 3.1 List of identified “motifs” in serpentine receptor superfamilies

SRA superfamily

Sra family
  Motifs:   SLN, KISQ, YNK
  Topology: N` terminal, TM1, TM2, TM4, TM5, TM6, TM7, ICL1/TM2, ECL2, ICL3

Sre family
  Motif        Topology
  MIF, PIY     N` terminal
  WTDD         ECL1
  RFQAKEN      ICL3
  FEN          ECL3
  LNPL         TM7
  ETD          C` terminal

STR superfamily

Srd family
  Motif        Topology
  GPC          TM2, ECL1
  YFV          TM7
  PYR          TM7 and C` terminal

Sri family
  Motif        Topology
  IYL          TM1
  KHQ          ICL2
  VLI          TM7

Srj family
  Motifs:   IYL, RALIVQT
  Topology: TM1, TM6, TM7, ICL1, ECL2/TM5, ECL3

Further motifs of the Sra and Srj families: YGQTGLL, ISI, ANL, SGM, IPI, AIIL, FGNYR, RSW, PIFGI, STG, FNL, ICFLT, FMF, PFIAL, STKILL, CATF, WDDPL, YSFG, VVW, LTF, NLF
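The 60% level of conservation used for the motifs of Table 3.1 can be illustrated with a short Python sketch. It is not the TM-MOTIF or MotifS algorithm, whose internals are not described here: it simply marks alignment columns whose most frequent residue occurs in at least 60% of the sequences and joins runs of consecutive conserved columns into motif strings. The function names, the minimum motif length of three residues and the toy alignment are assumptions made for illustration.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.6):
    """Map column index -> consensus residue for columns meeting the threshold.

    A column counts as conserved when its most frequent residue (gaps ignored)
    occurs in at least `threshold` of all sequences.
    """
    n_sequences = len(alignment)
    conserved = {}
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        if count / n_sequences >= threshold:
            conserved[col] = residue
    return conserved


def motifs_from_columns(conserved, min_length=3):
    """Join runs of consecutive conserved columns into (start column, motif) pairs."""
    motifs, run = [], []
    for col in sorted(conserved):
        if run and col != run[-1] + 1:
            if len(run) >= min_length:
                motifs.append((run[0], "".join(conserved[c] for c in run)))
            run = []
        run.append(col)
    if len(run) >= min_length:
        motifs.append((run[0], "".join(conserved[c] for c in run)))
    return motifs


# Hypothetical toy alignment, not real str-family sequences.
toy_alignment = ["AQLFV", "GQLFT", "SQLFK", "TQIFM", "CNLFP"]
print(motifs_from_columns(conserved_columns(toy_alignment)))  # [(1, 'QLF')]
```

Mapping each motif onto a topology label (TM helix, ICL, ECL) would additionally require the per-sequence HMMTOP predictions, which this sketch leaves out.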

CHAPTER 4

TM-MOTIF: A PACKAGE AND AN ALIGNMENT VIEWER TO IDENTIFY CONSERVED MOTIFS AND AMINO ACID SUBSTITUTIONS IN ALIGNED SET OF SEVEN TRANSMEMBRANE HELIX PROTEINS

4.1 INTRODUCTION

Transmembrane proteins belong to the largest protein families and are popular drug targets. Membrane proteins are abundant and ubiquitous. They are involved in important biological functions and participate in signal transduction pathways. They recognize and mediate various stimuli such as hormones (for example, by melanin-concentrating hormone receptor 1, growth hormone secretagogue receptor, thyrotropin-releasing hormone receptor, gonadotrophin releasing hormone receptor, follicle stimulating hormone receptor, luteinizing hormone/choriogonadotropin receptor), neurotransmitters (by muscarinic acetylcholine receptors, serotonin receptors 5-HT1,2,4,6, catecholamine receptors, GABAB receptors, metabotropic glutamate receptors, purine receptors (P2Y): adenosine, AMP, ADP, ATP, peptide hormone receptors etc.) (Schlyer and Horuk 2006), growth factors, light, odors and so on (Marinissen and Gutkind 2001). They are implicated in more than 30 different human diseases such as cancer, diabetes, hyperthyroidism, ovarian hyperstimulation syndrome, congenital stationary night blindness and obesity (Schoneberg et al 2004). They are also associated with oligomerization and

understanding the structural details of oligomeric interfaces helps in identifying the active stage of membrane proteins and in locating the binding sites of ligands. The implication of membrane proteins in vast practical applications such as pharmacy, drug design etc. requires knowledge of sequence analysis, structural details, features and functional expressions. The conserved amino acid patterns present in the helices and in the loop regions of membrane proteins play an important role in retaining conserved evolutionary trends, interacting interfaces, ligand binding sites, hot spot residues and so on.

4.1.1. Functional Importance of Conserved Motifs in TM-Proteins

Various earlier studies emphasized the role and importance of conserved motifs in membrane proteins. The characteristic E/DRY motif is essential for regulating GPCR conformational states, wherein glutamic acid/aspartic acid maintains the receptor in its ground state (Rovati et al 2007). The NPXXY motif observed in TM7 and the C` terminal is crucial for structural constraints in rhodopsin (Fritze et al 2003), and the involvement of the NPxxY motif of the V2R in clathrin-mediated endocytosis (Bouley et al 2003) is also noteworthy. Apart from the conserved motifs, substituting AAs also play an important role in determining the functional expressions. A couple of cases, namely a single amino acid mutation in the rhodopsin motif [R194/K195]xE causing retinitis pigmentosa (RP), a neurodegenerative disorder (Gleim et al 2009), and a single hydrophobic to hydrophobic substitution in the transmembrane domain impairing aspartate receptor function (Jeffery and Koshland 1994), stand as suitable examples to highlight the need for identifying the transmembrane motifs and the conservative amino acid substitutions. Studies explaining the role of LWYIK

motif in HIV type-1 transmembrane gp41 protein for viral infection (Chen et al 2009), CCR5, a chemokine receptor that acts as a coreceptor for macrophage-tropic HIV-1 strains involving a sequence motif (TXP) in the second transmembrane helix (Govaerts et al 2001), the immunoreceptor tyrosine-based inhibitor motif (ITIM) in cell proliferation (Duchene et al 2002), permeabilizing motifs related to host membrane destabilization in alphaviruses (José Nieva et al 2004), motifs in protein-protein interfaces (Neha Vyas et al 2008), characteristic motifs in GPCRs (Kim et al 2008) and olfactory receptors (ORs) (Malnic et al 2004) are a few examples that illustrate the functional importance of conserved motifs in various cellular activities.

4.1.2. Motif Related to Structural Integrity and Stability

Conservation of seven amino acids in TM-proteins (LIxxGVxxGVxxT) is related to dimerization, apart from oligomerization (Lemmon et al 1994). The GxxxG motif (also known as packing motif) is essential for transmembrane helix-helix interactions in both membrane and water-soluble proteins (Russ and Engelman 2000). The detected AxxxA motif is responsible for the thermostability of protein structures in thermophiles, and usually occurs in the helix-helix interface. A membrane-spanning heptad repeat motif, also referred to as a leucine zipper protein-protein interaction motif, is found to be useful in mediating interaction between transmembrane segments (Weiming Ruan et al 2004). These motifs are thought to assist structural organization.

Many computational approaches and related data repositories are emerging to analyse sequence properties, in particular motifs in membrane proteins (Tusnády et al 2008), (Marsico et al 2010). Aside from the convenience of automation, due to large number of sequences, previous

attempts by other methods do not consider mapping predicted secondary structures and the evolutionary perspective of motif retention in one algorithm or within a single repository.

4.1.3. Impacts of Motifs in Evolutionary Bioinformatics

Sequence analysis at a fine-grain level, in observing conserved domains and motifs, always provides a path to understand the evolutionary conservation observed within and across genomes. The knowledge gained in identifying cluster-specific or family-specific motifs will help to classify the proteins (Attwood and Findlay 1993) and can further be associated with their structural and functional relevance. Comparison of the occurrence of motifs in the perspective of topology will provide a clue to connect with the structural and functional aspects. Motif-based construction of functional maps for olfactory receptors is an appropriate example of the implication of motifs in practical applications (Liu et al 2003). Thus, a tool to identify motifs in TM-proteins is highly useful to the field and is my current objective.

4.2. OBJECTIVES OF TM-MOTIF

The current study is aimed at designing a computational tool, i.e., TM-MOTIF, to serve the dual objective of identifying conserved motifs (by default at the 60% level of conservation) and mapping the discovered motifs on the predicted membrane topology of a set of aligned transmembrane protein sequences (TM proteins). It also serves as an effective alignment viewer in displaying the predicted seven transmembrane boundaries in seven varying colours, namely violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R), and map the identified motifs on the colour-annotated

membrane helices. The provided mouse-over option displays not only the identified motifs but also the amino acid substitutions with their physico-chemical properties at each position of the alignment.

4.3. KEY FEATURES OF TM-MOTIF

• An in-built dataset of previously established phylogenetic clusters (Metpally and Sowdhamini 2005) of selected human and Drosophila GPCRs, a profile-based clustering of selected human and C. elegans GPCRs into 32 clusters of eight major types (Chapter 2) and 10 clearly distinguishable human-mouse OR clusters (Chapter 6) from cross-genome clustering studies were incorporated in the TM-MOTIF package for the observation of conserved motifs and AAS at various percentage levels of conservation. Apart from the conserved residues identified at 60%, physico-chemical properties of the substituting amino acids are also documented.

• TM-MOTIF allows users to submit their sequence of interest (membrane proteins) in FASTA format, along with its respective MSA, in the given text box to obtain various TM-MOTIF display options, wherein the predicted seven transmembrane helices are displayed in seven colours (VIBGYOR colouring scheme) and the identified motifs are mapped on the respective topology.

• TM-MOTIF assists the user to perform a BLAST search (using option “Run-BLAST”) to collect the nearest homologue for their sequence of interest from the in-house dataset further to

observe motifs and AAS.

• TM-MOTIF encourages the user to select any one of the reference sequences (whose structure is solved) to align with their sequence of interest; the resulting pairwise alignment in VIBGYOR display serves the initial requirements for homology modelling.

4.4. METHODOLOGY

A flow-chart (Figure 4.1) provides the pictorial representation of the methods and the steps involved in developing TM-MOTIF. An overview of the tool-guide is also given pictorially (Figure 4.2).

Figure 4.1 Flow-chart

Note: The flow-chart depicts the step-wise procedure involved in developing TM-MOTIF. In principle, the generated in-built dataset (Step 1) was used primarily to detect the membrane topology by the HMMTOP prediction method (Step 2). The respective cross-genome GPCR/OR cluster alignments were used to detect the conserved motifs by using an in-house program (Step 3). The identified motifs were mapped on the membrane topology for TM-MOTIF display (Step 4). Separately, the user is provided with an option (Step 5) to choose either a BLAST search to identify homologs from the in-built dataset (given in maroon box) or an alignment of their sequence of interest with any one of the given reference sequences (given in green colour).

4.4.1. In-Built Dataset of Cross-Genome GPCR and OR Cluster Dataset

4.4.1.1 Human-Drosophila cross-genome GPCR clusters

From our previous lab publication (Metpally and Sowdhamini 2005), the 32 phylogenetically established human-Drosophila GPCR clusters of eight major receptor types, namely peptide receptors (PR), chemokine receptors (CMK), nucleotide and lipid receptors (N&L), biogenic amine receptors (BGAR), secretin receptors (SEC), cell adhesion receptors (CAR), glutamate receptors (GLU) and frizzled/smoothened (FRZ), were incorporated as the in-built GPCR cluster dataset in the package (Figure 4.2 and 4.3).

4.4.1.2 Human-C. elegans cross-genome GPCR clusters

Through a profile-based clustering approach (RPS-BLAST), human and C. elegans GPCR clusters were established at varying E-value thresholds. The resulting 32 cross-genome GPCR clusters were also incorporated in the TM-MOTIF package (see also Chapter 2).

4.4.1.3 Human-mouse cross-genome OR clusters

Olfactory receptors (ORs) belong to Class-A type of GPCRs. Through the neighbour-joining method, 10 human-mouse cross-genome OR clusters were established and are also included as “OR-subclusters” in the package.

4.4.2. Alignment Procedures for Cross-Genome GPCR/OR Clusters

An appropriate alignment tool is important in generating high-quality alignments. In our current study, the alignment procedures CLUSTALW (Thompson 1994), PRALINETM (Pirovano et al 2008) and MAFFT (Katoh et al 2002) were used to align the cross-genome human-Drosophila GPCR clusters, human-C. elegans GPCR clusters and human-mouse OR clusters, respectively. The alignment procedure was

different for each cluster dataset, due to varying parameters such as the number of sequences, sequence length, genomic combination and average cluster percentage identity. User-submitted queries, on the other hand, are treated by the standalone CLUSTALW alignment method incorporated in the package.

Figure 4.2 Tool guide of TM-MOTIF: an overview

Figure 4.2: Pictorial representation of the tool guide of TM-MOTIF, which depicts the available options. The user can start by selecting the in-built cluster dataset/organisms (pipelines referred in the left side) and viewing the MSA with the available display options of TM-MOTIF (“Run TM”, “Run Motif”, “Run TM-Motif”). The user can also start by submitting their sequence of interest or alignment (pipelines referred in the right side) and can perform “Run-BLAST” (Methodology); while running BLAST, the user's query is searched against the in-built cluster dataset to find homologous sequences of the respective cluster. By selecting “Alignment with reference sequence” (refer Methodology), the user can obtain any one of the display options (“Run TM”, “Run Motif”, “Run TM-Motif”). The four important output files are described in section 4.6.1.

4.4.3. Prediction of Membrane Topology for TM Helices and Loops

The membrane topology of each candidate receptor (GPCR or OR) from the cluster dataset was predicted by using the standalone version of HMMTOP incorporated in the package. Only sequences predicted to have between five and nine TM helices were considered for the current analysis, i.e., care was taken to retain only sequences with 7(±2) TM helices in the in-built GPCR/OR cluster dataset. User-submitted queries are subjected to the same cut-off and prediction procedure.
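The 7(±2) helix cut-off described in section 4.4.3 can be sketched as a simple filter. This is a minimal Python illustration under stated assumptions, not the Perl code of the package: the function name `filter_by_tm_count` and the input format (a dictionary of helix boundaries per sequence) are hypothetical, and the boundaries themselves would in practice come from a topology predictor such as HMMTOP.

```python
def filter_by_tm_count(topologies, expected=7, tolerance=2):
    """Keep sequences whose predicted TM helix count lies within expected +/- tolerance.

    topologies: dict mapping a sequence id to a list of (start, end) helix
    boundaries (hypothetical input format, 1-based residue positions).
    """
    low, high = expected - tolerance, expected + tolerance
    return {
        seq_id: helices
        for seq_id, helices in topologies.items()
        if low <= len(helices) <= high
    }


# Toy predictions: a 7-helix, a 5-helix and a 3-helix sequence.
predicted = {
    "seqA": [(10, 30), (45, 65), (80, 100), (120, 140),
             (160, 180), (200, 220), (240, 260)],
    "seqB": [(10, 30), (45, 65), (80, 100), (120, 140), (160, 180)],
    "seqC": [(10, 30), (45, 65), (80, 100)],
}
kept = filter_by_tm_count(predicted)
print(sorted(kept))
```

In this toy input the 3-helix sequence falls outside the five-to-nine range and is dropped, while the other two are retained.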

4.4.4. Detection of Motifs and Amino Acid Substitution (AAS) in the Cross-Genome Alignment

The in-built cross-genome GPCR/OR cluster alignments were used as an input (test set) to our in-house program (MotifS program by Sowdhamini, unpublished results) to identify residue conservation and substitutions at each position of the alignment. The conservation of each residue in the set of aligned sequences was noted as ‘consensus’ and documented if the percentage conservation at a position was from 60 to 100%. Here, conservation simply refers to an average over all possible pairwise sequences, with the score consulted from a normalized AA exchange matrix. Once conserved amino acid patterns were recorded, the substituting AA residues were also identified, and the properties of the AAS are denoted by symbols: hydrophobic (@), aromatic (*), polar positive (+), polar negative (-) and polar uncharged ($). The significant observation of preserved motifs and AAS in the helices and loop regions of the cross-genome GPCR cluster dataset was published recently (refer Chapter 5). The same principle was implemented in the package, applied to detect “motifs” in the OR cluster dataset, and is applicable to user-submitted queries; the “consensus” obtained from this program is displayed along with the MSA.

4.4.5. Mapping of Identified Motifs on TM-helices and Loops in MSA

The motifs discovered by the in-house program (Motifs, written by Sowdhamini) were mapped on the predicted membrane topology (Methodology) of the multiple sequence alignment (MSA), both for in-built cluster alignments and user-submitted queries. A motif is defined as at least three consecutive conserved AAs with high amino acid conservation (the default is set to 60% conservation). The predicted seven TM helices are displayed in seven varying colours, namely violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R), referred to as the “VIBGYOR colouring scheme”. The

identified motifs, if within a predicted TM helix, are shown in self-highlighted darker shades of the VIBGYOR colouring scheme in the TM-MOTIF package. Motifs observed in the loop regions are highlighted in “grey” colour. The results are displayed in a new window according to the parameters set by the user (refer parameters in TM-MOTIF package for more detail). Only candidate receptors predicted to have 7±2 TM helices were considered for the study. Inevitably, a considerable number of false predictions occur due to “false merge” and “false split” of TM-boundaries, causing over- and under-prediction of TM helices. Sequences with over/under-predicted TM-helices are denoted in a “pale cream” colour, since the full length of such sequences is aligned with other candidate receptors with considerable indels in the MSA. In order to avoid a distracting display of the VIBGYOR colouring scheme, the over/under-predicted topologies are given in pale cream colour for the whole sequence. In such cases, a consensus approach of prediction methods could be useful for improving the accuracy of membrane topology predictions.

4.4.6. Identification of Homologous Sequences for User-Submitted Queries by Performing BLAST

By using “Run-Blast” in the TM-MOTIF package, the user can collect homologous sequences from the in-built dataset. A standalone version of BLAST 2 was incorporated in the package (option “Run-BLAST” and Figure 4.2 and 4.3). Here, after the BLAST search, the user-submitted query is aligned with its nearest homologous sequences, together with the pre-aligned set of sequences in the cluster, using the profile alignment option of CLUSTALW; it is thus co-aligned with the related GPCR/OR cluster dataset and made available for TM-MOTIF display. The default threshold for recognizing hits is set to a sequence identity of 60%.

4.4.7. Pairwise Alignment in TM-MOTIF

TM-MOTIF provides an option to select any one of the listed reference sequences (whose structure is solved experimentally) and to align

the user's sequence of interest to obtain a pairwise alignment, aligned by CLUSTALW and shown in the proposed VIBGYOR colouring scheme (option “Align with reference sequence”, also Figure 4.2 and 4.3). Seven reference sequences are included in TM-MOTIF (bovine rhodopsin, Japanese flying squid rhodopsin, common turkey β-1 AR, human β-2 adrenergic receptor, human adenosine receptor A2A, human dopamine D3 receptor and human CXCR4 chemokine receptor), whose crystal structures were solved experimentally and which are available for the user to select as a relevant reference sequence or template.

4.5. RESULTS

4.5.1. Software Input and Output Options

The main menu of the front window of the TM-MOTIF package, describing the available input and output options (display options), parameter settings and choice of available organisms, is provided as a snapshot (Figure 4.3).

Figure 4.3 Snapshot for the available main menu of the front window of TM-MOTIF with user interactive features

Note: The front window of TM-MOTIF displays the main menu with user interactive features. Label 1 refers to the cluster number and receptor type of the in-built dataset. Label 2 directs to the selection of organism combinations. Label 3 denotes the parameter settings to set the threshold for consensus and %id in BLAST. Label 4 provides the input window to submit a FASTA sequence or MSA. Label 5 provides the option “Run BLAST”. Label 6 refers to the option “compare with reference sequence”. Label 7 refers to the display options “Run TM”, “Run Motif” and “Run TM-Motif”.

1 Output Options Display of predicted 7 TM.5) This facilitates the user to view the transmembrane proteins in the large sequence alignments. The Cluster number. 4.4). Input Options The user can submit sequence in FASTA format or MSA by using the available text box given as Submit your FASTA sequence(s) or Submit your Input MSA (Figure 4.5.3.helices in VIBGYOR colouring scheme: (by using “Run TM” option) The predicted TM helices (1-7) are highlighted in seven different colours (VIBGYOR colouring scheme). with residue conservation as consensus. Figure 4. also to keep track of record on current location of membrane topology. 4.2. as mentioned in methodology (Figure 4.5.4 Options given for the submission of input sequences in TMMOTIF package Note : A snapshot showing the given text boxes for the submission of multiple sequence in FASTA and . There are also choices given to select organism of interest by the user: GPCR or OR cluster dataset.3. .5.aln format Also related option “Display Alignment with TM-motif ” is selected. receptor type and organisms are mentioned in main menu of the package.146 4.

Figure 4.5 Sample output for the option “RUN-TM” (annotations in the figure: Consensus; Mouse-over option; VIBGYOR colouring scheme; Average Seq. Identity)

Note: Snapshot showing the TM-MOTIF display of “RUN-TM” for the predicted seven TM-helices in the VIBGYOR colouring scheme. The mouse-over option at position 140 denotes the conservation of arginine (R) in the MAYDRYVAIC motif in the TM3-ICL2 topology of human OR cluster 6; the observed substitution is by another polar positive residue. The average sequence percentage identity of this particular cluster (human OR-Cluster 6) is 44%.

4.5.3.2 Display of Identified Motifs and AAS in MSA (by using “Run Motif” option)

If the user is interested in recording only the identified motifs in the selected cluster or user-provided MSA, it is possible to display only the identified motifs and AAS in the MSA (using the “Run motif” option). The conserved amino acid residues are displayed below the alignment as “text” and referred to as “consensus”. Besides, a mouse-over option is provided at each position of the MSA to help the user document observations such as alignment position, percentage residue conservation, substitution (AAS) and property of the amino acid substitutions. Here, the observed motifs are highlighted in “grey colour” on the MSA (Figure 4.6).
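The consensus and motif display can be illustrated with a small Python sketch. It is not the in-house MOTIFS program, and it makes two simplifying assumptions that should be kept in mind: conservation is taken here as the plain percentage of the most frequent residue in a column (the thesis method averages pairwise scores from a normalized AA exchange matrix), and the grouping of residues into the five property classes is a common textbook grouping chosen for illustration. The symbols themselves and the rule of at least three consecutive conserved positions at a 60% default follow the text.

```python
from collections import Counter

# Property symbols follow the text; the residue grouping is an assumption.
PROPERTY_SYMBOL = {}
for residues, symbol in [("AVLIMCGP", "@"), ("FWY", "*"), ("KRH", "+"),
                         ("DE", "-"), ("STNQ", "$")]:
    for residue in residues:
        PROPERTY_SYMBOL[residue] = symbol


def column_consensus(alignment, threshold=60.0):
    """Per column: (residue, percent) if the top residue reaches threshold, else None."""
    n_seqs = len(alignment)
    consensus = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        if not counts:  # column made only of gaps
            consensus.append(None)
            continue
        residue, count = counts.most_common(1)[0]
        percent = 100.0 * count / n_seqs
        consensus.append((residue, percent) if percent >= threshold else None)
    return consensus


def find_motifs(consensus, min_length=3):
    """Return (start, end, motif) for runs of >= min_length conserved columns (1-based)."""
    motifs, start = [], None
    for i, entry in enumerate(consensus + [None]):  # sentinel closes a trailing run
        if entry is not None and start is None:
            start = i
        elif entry is None and start is not None:
            if i - start >= min_length:
                motif = "".join(res for res, _ in consensus[start:i])
                motifs.append((start + 1, i, motif))
            start = None
    return motifs


def substitution_properties(alignment, column_index, consensus_residue):
    """Property symbols of residues replacing the consensus residue in one column."""
    return sorted({PROPERTY_SYMBOL.get(seq[column_index], "?")
                   for seq in alignment
                   if seq[column_index] not in ("-", consensus_residue)})


# Toy alignment (invented sequences) with a conserved DRY-like stretch.
msa = ["MKDRYLA", "LTDRYIG", "QSDRYVS", "ANERYMT", "GHDKYFC"]
motifs = find_motifs(column_consensus(msa))
print(motifs)
print(substitution_properties(msa, 3, "R"))
```

In the toy alignment only columns 3 to 5 reach the threshold, so a single motif is reported; the lysine replacing arginine in column 4 is reported with the polar positive symbol.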

Figure 4.6 Sample output for the option “RUN-MOTIF”

Note: Snapshot showing the TM-MOTIF display of “RUN-MOTIF” for the identified motifs in the MSA. The conserved DRY motif is shown for the cross-genome human-Drosophila GPCR cluster at alignment position 204.

4.5.3.3 Display of Detected Motifs on TM-helices (by using “Run TM-Motif” option)

It is also possible to display the identified motifs and the predicted membrane topology simultaneously on a user-selected cluster or user-submitted MSA. As with the “Run TM” option, the predicted membrane topology is displayed in the VIBGYOR colouring scheme, and the motifs embedded in the alignment are denoted with the self-highlighting colours of the VIBGYOR colouring scheme (Figure 4.7 A and B). This is probably the most crucial output delivered by the package, since such an annotated alignment is biologically meaningful. The display of “consensus” and a navigating mouse-over option at each position of the alignment are also available, so that motifs (level of conservation), AAS and the respective topology can be observed in one go for effective visual inspection; corresponding output files for documentation are also generated at each level of performance (see output files for more details). All three display options show the detected motifs not only in the TM helices but also in the intracellular and extracellular loops, which is useful to understand sequence properties at predicted loop

regions for comparative sequence analysis and to generate loop libraries. The display of over/under-predicted TM-helices (Figure 4.9) in the alignment also cautions the user about false predictions, which could be rectified by a consensus of prediction methods. This situation could be overcome by re-editing/re-aligning the respective sequences, or by eliminating possible outliers so as to improve alignment quality and conservation.

Figure 4.7 Sample output for the option “RUN-TM-Motif” (panels A and B; annotation in the figure: Scroll bar in the Alignment viewer)

Note: Snapshot showing the TM-MOTIF display of “RUN-TM-MOTIF”, where the identified motifs are mapped on the predicted TM-helices in the MSA. Here 4.7.A shows the first four helices and 4.7.B shows the remaining three helices; large alignments can be visualized by using the scroll bars on the right-hand side of the display window. Notably, the conserved DRY motif is observed in the TM3-ICL2 region of the topology, and the average sequence identity of human GPCR cluster 1 is 30%.

4.5.3.4 Alignment with Reference Sequence

A user-submitted sequence can be aligned with any one of the reference sequences by using the option “Select reference Sequence” (Figure 4.3). This option helps the user to prepare a pairwise alignment with their sequence of interest, which can enable the generation of a good quality 3D-model (Figure 4.8). A case study on a selected sequence, Odr-10 (a characterized olfactory receptor from C. elegans), was guided by this option of the package for pairwise alignment with bovine rhodopsin as an appropriate template. Notably, this illustration can be viewed as a practical application of the TM-MOTIF package in guiding effective homology modelling.

Figure 4.8 Snapshot for the display of pairwise alignment of user's input sequence with selected reference sequence

Note: The user sequence, namely Q965V1_HUM, is aligned with the bovine rhodopsin sequence (1F88) to facilitate homology modelling. The predicted topology is highlighted in the VIBGYOR colouring scheme, with highlighted motifs as consensus and navigating mouse-over options to display the position, conservation and AAS of the alignment. The mouse-over option at position 135 shows the conservation of arginine (R), of polar positive type, from the classical E/DRY motif.

4.5.3.5 Identifying closest homologues of user sequence in selected organisms

When the user wishes to search for homologous sequences of their sequence of interest in the available GPCR and OR cluster dataset, the “Run-BLAST” option is highly useful. The cluster from which the maximum number of hits was obtained by the BLAST search of the given query against the in-built GPCR and OR clusters is chosen for the result alignment and for deciphering receptor type and functional relevance (Figure 4.3). This facilitates correlation of sequence properties (such as presence of motifs or sequence identities) and structural properties (like secondary structural topology) with their associated potential functional values (Figure 4.8).

4.5.3.6 Display of Over predicted helices

Apart from displaying the proteins predicted to have seven helices, care has been taken to present the GPCR sequences that are over- or under-predicted for seven helices; these are displayed in the “pale cream colour” so that the number of helices predicted in each sequence of the GPCR cluster alignments can be clearly understood. This particular display of TM-MOTIF indicates the quality of the alignment and the number of predicted TM-helices in the cluster dataset, along with the average sequence identity.

Figure 4.9 Snapshot Depicts the Display of Over Predicted TM-Helices

The snapshot shows two GPCR sequences that are under-predicted for the seven helices in the selected OR cluster alignment and hence indicated in “pale cream colour”. The respective sequence identity of the cluster is given, together with a message in the NOTE saying “Consensus approach for membrane prediction is advisable, because more than 10% of sequences in the cluster show over/under prediction”.
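The cluster selection rule of section 4.5.3.5 (choose the in-built cluster that yields the maximum number of BLAST hits) can be sketched as below. This is a hedged Python illustration only: the function name `best_cluster`, the cluster labels and the input format of (cluster, percent identity) pairs are hypothetical, and parsing of actual BLAST output is deliberately left out. The 60% identity default is the one stated in the methodology.

```python
from collections import Counter


def best_cluster(hits, min_identity=60.0):
    """Pick the cluster with the most hits at or above min_identity.

    hits: iterable of (cluster_id, percent_identity) pairs, assumed to be
    already extracted from the BLAST output (hypothetical format).
    Returns (cluster_id or None, Counter of accepted hits per cluster).
    On a tie, the cluster encountered first is returned.
    """
    counts = Counter(cluster for cluster, identity in hits
                     if identity >= min_identity)
    if not counts:
        return None, counts
    cluster, _ = counts.most_common(1)[0]
    return cluster, counts


# Invented hits for one query against three in-built clusters.
toy_hits = [("OR_C6", 72.5), ("OR_C6", 65.0), ("OR_C2", 61.0),
            ("OR_C2", 40.0), ("PR_C1", 35.0)]
chosen, per_cluster = best_cluster(toy_hits)
print(chosen, dict(per_cluster))
```

The per-cluster counts returned alongside the chosen cluster correspond to the kind of information the text attributes to the sorted BLAST output file.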

4.6. DEFAULT PARAMETERS

The user can set the threshold from 30% to 100% for recognizing the “consensus” and for % identity in BLAST (Methodology). In principle, the default threshold for identifying the conserved residues in the aligned set of sequences is fixed at 60% (Figure 4.3 and 4.4). However, an appropriate threshold mainly depends on the number of sequences, the length of each candidate sequence and the overall average sequence identity of the particular cluster dataset, which varies from cluster to cluster. By using “Select Organism Combination”, the user can select any one or two organisms from the available dataset, which includes H. sapiens, M. musculus, D. melanogaster and C. elegans, to display cluster-specific or cross-genome cluster alignments. It is always preferable to select two organisms that are not too distantly related while performing cross-genome alignments. Options like “Select GPCR Cluster” and “OR-subclusters” (also refer Methodology) help the user to select the cluster type of interest for intra- and inter-genomic cluster alignments. Clusters are listed with their respective cluster number and receptor type in the main menu of the front window. The user can select any one of the reference sequences in the option “Select reference Sequence”, or use “Run-BLAST”, for pairwise alignment or MSA.

4.6.1 TM-MOTIF Output Files

Along with the three display-output options discussed above, the following output files are also generated:

• Zconsensus.txt: This output file provides a list of all the consensus residues and alignment positions that satisfy the user's specified threshold values, including the percentage of conservation and the percentage of substitutions according to amino acid type.

• Zpattern.txt: This file lists all conserved amino acid positions and the substitutions observed at those sites.

• Zmotif.txt: This output file lists the motifs, with substitutions, discovered in the alignment along with their start and end positions.

• Zuser.aln: This is the result file for the user-submitted sequence/MSA, aligned by the CLUSTALW alignment procedure and given in .aln format. This file is the primary source for the user-submitted query to provide either alignment options or display options.

• Zuser.pir: This is the result file for the user-submitted sequence/MSA, aligned by the CLUSTAL W alignment procedure and given in .pir format. This file is the primary source for the detection of motifs and AAS by the MOTIFS program.

• Zblast_sorted.txt: For the user option “RUN-BLAST”, the number of hits observed in each in-built cluster can be obtained from this file.

4.7. CAVEAT AND FUTURE DEVELOPMENT

• This tool has been coded in the Perl language (using the Tk module for GUIs). The package can be executed in LINUX OS and requires the following backend programs to be installed prior to use: PerlTk, BioPerl, a FORTRAN compiler and standalone versions of CLUSTAL W and BLAST 2.

• The TM-MOTIF package could be extended to other genomes and to membrane-bound helical proteins like ion-channels and transporters in future.

• TM-MOTIF could evolve into a specialized alignment viewer for transmembrane helix-rich proteins with added features such as a graphical display to

provide a 2D cartoon representation of the helix topology embedded with identified motifs.

• Separately, an aid could be included to edit sequences only for the TM-helices or loop regions. The user could thus generate TM-domain and loop libraries, which in turn would be useful for studying the AA composition of the TM-domains and loops for applications in homology modelling.

4.8. AVAILABILITY

The TM-MOTIF package is available for open access to users for academic purposes. It is integrated with DOR (Database of Olfactory receptors) and can be downloaded from the URL http://caps.ncbs.res.in/DOR (Figure 4.10 and Chapter 6).

4.9. CONCLUSIONS

• TM-MOTIF, a software package and an alignment viewer, helps to map discovered motifs on predicted TM-helices and loop regions in the MSA. The VIBGYOR colouring scheme for TM-helices helps the user to keep track of the current location in the membrane topology and to appreciate the relative positions of motifs in a large sequence alignment.

• Selection of a “combination of organisms” guides the user to understand sequence properties at the cross-genome level, and the package is highly suited to comparative genomics studies.

• The provided mouse-over option assists the user in obtaining knowledge on position, strength of conservation, amino acid conservation (motifs) and amino acid substitutions (AAS) in the MSA for a better understanding of the sequence properties.

• The package is very efficient for analysing sequence properties at the intra- and inter-genome level to identify “receptor-specific motifs” (otherwise cluster-specific motifs) and common motifs occurring in more than one genome (at the cross-genome level), which in turn can help to connect with functions at the intra- and inter-genomic level in a broader scope.

• Besides, analysis of residue conservation could depend on critical parameters such as average sequence length, percentage identity, genome of interest, clustering techniques, phylogenetic analysis and evolutionary lineage. So, the package is handy for forwarding the results to map hot spot residues on the structure and to identify receptor-specific features and evolutionarily conserved motifs across genomes of various data sizes.

Overall, the current interest in developing a handy computational package is to perform critical analysis to understand sequence properties, focused mainly on conservation and substitution at the sequence level. Such sequence analysis has vast applications in comparative genomics. In essence, the package is highly suited for cross-genome sequence analysis: effective alignment visualization, with added information such as membrane topology, observed motifs and possible substitutions in cross-genome sequence alignments, nourishes knowledge on what is evolutionarily consistent or distant at the sequence level. The resulting alignment displays enable the user to choose the best sequence/template in the least time and to perform homology modelling and cross-genome sequence comparisons.

CHAPTER 5

ANALYSIS ON CONSERVED MOTIFS AND PERMITTED AMINO ACID EXCHANGES IN CROSS-GENOME GPCR CLUSTERS

5.1 INTRODUCTION

Membrane proteins are biologically most significant; they participate in various cellular activities such as signal transduction and oligomerization, and are implicated in diseases. These cell surface proteins are popular drug targets in pharmaceuticals. As described in Chapter 1, GPCRs are diverse and play a vital role in several physiological functions such as perception of sensory information, modulation of synaptic transmission, hormone release and actions, regulation of cell contraction and migration, or cell growth and differentiation, and their abnormal function causes diseases (Wettschureck and Offermanns 2005).

5.2 OBJECTIVES

The characteristic structural features of a GPCR are the retention of seven α helices with three intracellular and three extracellular loops and flanking N’ and C’ termini. The key features of membrane proteins are helpful in comparing the predicted helix boundaries (TM-domain), loop lengths, and sequence features such as conserved motifs and substituting amino acids and their physicochemical properties in the set of aligned homologous sequences or

phylogenetically associated sequences (clusters) at intra- and inter-genomic levels. So, in this chapter, the current study is aimed to identify conserved motifs, along with amino acid substitutions (AAS), in the set of aligned homologous sequences, i.e., particularly clusters of GPCRs belonging to two genomes.

5.3 RESIDUE CONSERVATION IN CROSS-GENOME SEQUENCES

A number of earlier studies (Leonov and Arkin 2005) emphasized the role of conserved residues in the predicted helices and in the loop regions of transmembrane proteins for the structural integrity and functional implications. By using a phylogenetic procedure, association of homologous GPCRs produces GPCR-clusters and when GPCRs are dealt with from more than one genome, they are referred to as cross-genome GPCR-clusters. Principally, when GPCR sequences of more than one genome are aligned together, then the alignment can be referred to as a cross-genome GPCR alignment. These cross-genome GPCR cluster alignments are most interesting in studying the residue conservation imprints (motifs) preserved over evolutionary lineages. The conserved amino acid patterns, motifs present in the helices and in the loop regions, provide knowledge on the extent of residue conservation and diversity existing at the cross-genome sequence level (Bjarnadottir et al 2006). These kinds of exercises in identifying conserved motifs, along with AAS, provide knowledge in understanding the preserved trends in evolution, reasons for species-specific features and functional importance. It also provides a handle on the physico-chemical properties of the substituting amino acids at intra- and inter-genomic GPCR clusters, which advocate

about the evolutionary conservation or deviations that exist within or across the genome.

5.4 IMPACT OF AMINO ACID CONSERVATION AND TYPES OF SUBSTITUTIONS

The conserved motifs, observed in cross-genome GPCR/OR clusters, play a crucial role in retaining critical and characteristic function for various receptor types. For example, the evolutionarily conserved “DRY” motif present in the TM3-ICL2 (Figure 5.1) is important for GPCR-function in rhodopsin-like receptors (Class A type GPCRs), and mutation on this residue pattern leads to various functional consequences (Rompler et al 2006). Also the role of conserved 6-8 residues in SNARE protein assembly and function (Laage et al 2000), the FF motif in mediating transport (Nufer et al 2002), and serine and threonine residues in conserved patterns such as SxxSSxxT and SxxxSSxxT for oligomerization clearly emphasize the role of residue conservation for various GPCR-functions (Dawson et al 2002).

Figure 5.1 Pictorial representation to denote the occurrence of highly conserved “DRY motif” in TM3-ICL2

Figure 5.1: Pictorial representation of “DRY” motif in TM3/ICL2 of the membrane topology. TM1 to TM7 were given in violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) colours respectively.

The current study is equally focused on observing AAS occurring in the conserved motifs for explaining the impact of amino acid substitutions in functional diversity, abnormalities and mutagenesis. For instance, a single hydrophobic to hydrophobic substitution in the transmembrane domain impairs aspartate receptor functions (Jeffery and Koshland 1994). Similarly, a single amino acid mutation in the rhodopsin motif [R194/K195]xE is involved in causing a neurodegenerative disorder called retinitis pigmentosa (RP) (Scott Gleim et al 2009). Such examples demonstrate the need of identifying the motifs in transmembrane proteins and the observed amino acid substitutions.

5.5 METHODS

The flowchart (Figure 5.2) summarizes the stepwise procedure for identifying conserved amino acids (motifs) and substituting residues at each position of the MSA.

Figure 5.2 Flow-chart describing the steps involved in the study

Note: Step 1. Denotes the available cross-genome GPCR cluster dataset (H. sapiens, D. melanogaster). Step 2. Alignment procedure (by PRALINE TM). Step 3. Denotes the prediction of membrane topology by HMMTOP (given on the left-hand side) and discovering motifs and the property of replacing amino acids by using the MotifS program (given on the right-hand side). Step 4. Denotes the analysis of motifs and substituting amino acids in respective membrane topology across selected genomes.

5.5.1 Cross-genome GPCR cluster dataset

For the current study, a cross-genome GPCR cluster dataset (human-Drosophila GPCR cluster) of 32 clusters, derived from a previous lab publication (Metpally and Sowdhamini 2005), was used in analyzing conserved motifs and documenting the proportion and property of substituting amino acids in all 32 clusters (step 1 in Figure 5.2). The 32 clusters fall into eight major receptor types, such as peptide receptors, chemokine receptors, nucleotide and lipid receptors, biogenic amine receptors, secretin receptors, cell adhesion receptors, glutamate receptors and frizzled/smoothened receptors. Cluster 26 retains Drosophila-only GPCR clusters (Metpally and Sowdhamini 2005). Such classifications were useful to analyse conserved key motifs and amino acid substitutions (AAS) along with the observed physico-chemical properties and to report cluster-specific or receptor-specific motifs at cross-genome level. Also the intra-genome cluster (human-GPCR cluster) and cross-genome GPCR clusters (such as human-Drosophila, human-C. elegans GPCR clusters), associated by RPS-BLAST clustering (Chapter 2), were used for the current study.

5.5.2 Alignment Procedure

The phylogenetically established GPCR cluster association enabled the assembly of the set of homologous sequences from human and Drosophila genomes. As discussed earlier (Chapter 2), alignment tools play a crucial role in understanding sequence features even at remote homology. Thus, selecting an appropriate alignment tool helps in improving the alignment quality and in analyzing sequence properties at each position in the alignment critically. In the current study, CLUSTAL W (Thompson 1994) was used to deal with human-GPCR and human-Drosophila GPCR cluster alignments, and MAFFT (Katoh et al 2002) was used to align the human-C. elegans GPCR clusters (step (2) in Figure 5.2).

5.5.3 Prediction of membrane topology

Each sequence of the cross-genome alignment was examined for the predicted membrane topology by using the HMMTOP 2.1 version package (Tusnady and Simon 2001) (step (3) in Figure 5.2).

5.5.4 Program to Detect Motifs and AAS

The aligned set of sequences of cross-genome GPCRs, organized as 32 clusters, were provided as input to an in-house program (MotifS, written by Sowdhamini) to identify motifs (three consecutively conserved AAS with minimum of 60% conservation). The conservation of each residue in the set of aligned sequences was noted as consensus and was documented from 60-100%. Once motifs were identified, the substituting or replacing amino acid in the identified pattern is recorded and has been classified based on its physico-chemical properties. The MotifS program uses the “Birkbeck matrix” scoring scheme (that employs structure-based sequence alignment of several homologous protein families) for recording permitted AAS after normalization for the inherent frequency of occurrence of different amino acids. The properties of the substituting amino acid residue were denoted by symbolic representation. Symbols like @, *, +, -, $ were used to represent the hydrophobic, aromatic, polar positive, polar negative and polar uncharged property of the amino acid residue, respectively. The symbolic representation for denoting AAS at each position in the MSA helps to understand the composition of amino acid conservation and replacement. One of the result files, namely “file.summary”, is used to document the identified motifs along with the AAS. Incorporating the knowledge on predicted membrane topology and the identified motifs and AAS for each sequence in the MSA helps us to perform easy analyses at cross-genome level (step (3, 4) in Figure 5.2). The motifs program is incorporated into the TM-MOTIF package (Chapter

BGA CMK. BGA PR. motif with AAS and respective symbolic representation for the human-Drosophila GPCR cluster dataset.162 4) and is used for the current study to document the location of identified motifs with respect to membrane topology.6 RESULTS Conservation of residues were identified with AAS patterns for each of the 32 human-Drosophila GPCR clusters and property at alignment positions have been detected to understand the prominent frequency of AAS and the key property of that respective AA in the expected conserved pattern.CMK CMK.CMK. Motifs observed for single receptor types were also documented. identified motif.CMK VMP(TM2) LPL(TM5) YLLNLA(TM2 )1 TASI(TM3)1 LGF(TM5) 1 LYA(TM7)2 Motifs in multi-receptor types 21 22 23 NLA(TM2)3 ADLL(TM2)3 CWLP(TM6)3 PR. Table 5. The results report the observed membrane topology related to motif. 5.N&L.1 Motifs@ observed in the transmembrane helices and loop regions of human and Drosophila GPCR clusters+ Motifs in Single receptor type No 1 2 3 4 5 6 7 8 Motif VGL(TM1)1 GNL(TM1) 1 1 Motifs in two different receptor types No 17 18 19 20 Motif AIA(TM3)2 CIS(TM3) 2 2 Receptor Type PR BGA BGA CMK BGA PR BGA PR Receptor Type PR. BGA PFF(TM6)1 NSC(TM7)1 .N&L PR. Motifs identified for particular receptor type were also documented to denote the cluster-specific/receptor-specific sequence properties.CMK.CMK PR.

BGA Receptor Type BGA CMK PR 27 28 29 30 31 32 33 Motifs in Loop regions* MRTVTN(ICL1) PR 1 Motifs in two different receptor types 12 13 14 15 16 SLA(TM2)2 IYL(TM2) 2 KLRN(ICL1)2 LDR(ICL1)1 DRYLA(ICL2) RYL(ICL2)3 WPFG(ECL1) LCK(ECL1) 1 1 1 BGA.CMK.CMK PR.SEC PR PR. elegans GPCR cluster dataset (Alignment files are available at http://caps.CMK LFL(TM2)2 TLP(TM2) LPF(TM2) 2 2 @ The observed motifs were tabulated along with distribution of various receptor types of human and Drosophila GPCR clusters.zip).N& L PR.in/download/crossgenomeGPCRs/align.BGA CMK. peptide receptors retain 20 motifs and covers nearly 64% of the identified motifs. A total of 33 motifs were identified and 76% of them are within TM helices.ncbs.Drosophila GPCR cluster dataset (Table 5. + Topologies of observed motifs are given within brackets and number of occurrence is denoted in superscript with respect to the number of receptor types.CMK PR. * Motifs corresponding to the classic DRY motif are shown in italics.CMK PR. .N&L PR PR PR. predominantly in TM2 and TM7 in the human. 5.res. Interestingly.163 Table 5. whereas other receptor types like chemokine receptors.7 OCCURRENCE OF MOTIFS FOR SINGLE RECEPTOR TYPE Multiple sequence alignments from 32 GPCR cluster dataset were analyzed for the presence of motifs for human-Drosophila and human-C.1 (Continued) Motifs in Single receptor type No 9 10 11 Motif WLGY(TM7)1 HCC(TM7)1 NPI(TM7) 1 Motifs in two different receptor types No 24 Motif DLL(TM2)4 Receptor Type PR. N&L.CMK.1).

nucleotide and lipid receptors and biogenic amine receptors contain 52%, 18% and 36% of motifs in the cross-genome cluster dataset. This could be due to the direct involvement of TM helices in ligand binding in the case of peptide receptors. The current study does not include the N’ and C’ termini of the sequences and is focused only on a selected set of sequences for the eight particular receptor types. The overall residue conservation is observed in the helices and the loop regions of human-only, human-Drosophila and human-C. elegans GPCR clusters. The ranking of residue conservation in helices and loop regions for human-only, human-Drosophila and human-C. elegans GPCR clusters is given in Figure 5.3.a-i. Significant conservation occurs in the human-only and human-Drosophila GPCR clusters at the TM3 region.

5.8 MOTIFS OBSERVED IN HUMAN-DROSOPHILA CROSS-GENOME CLUSTERS

5.8.1 Motifs Observed in Transmembrane Helices

Notably, the VGL motif in transmembrane helix 1 (TM1), GNL motif in TM1 and WLGY motif in TM7 are identified solely in biogenic amine type receptors. In the same way, the PFF motif in TM6, VMP motif in TM2 and HCC motif in TM7 are observed exclusively in chemokine receptors. The YLLNLA motif in TM2, TASI motif in TM3, LGF motif in TM5 and NSC motif in TM7 are observed exclusively in peptide receptors (Table 5.1). Further, the conservation of these motifs can be correlated to the cluster- or receptor-type specific properties at the sequence level. There are nine motifs observed in two different types of receptors in the current study. The SLA motif in TM2 is observed both in peptide and

biogenic amine receptors. Interestingly, the IYL motif in TM2 and CIS motif in TM3 are observed not only in chemokine type receptors, but also in nucleotide and lipid type receptors. In a similar manner, peptide and chemokine type receptors retain prominent conservation of motifs, such as LFL, TLP and LPF motifs in TM2, AIA motif in TM3, LPL motif in TM5 and LYA in TM7, which explains the sequence conservation across two different receptor types and provides clues to connect common sequence properties (Table 5.1).

The significant occurrence of motifs in multiple receptor types is also tabulated (Table 5.1). Among them, the NLA motif in TM2 occurs in three different receptor types like peptide, chemokine and nucleotide and lipid type receptors. The CWLP motif in TM6 is identified in peptide, chemokine and biogenic amine type receptors, but not in nucleotide and lipid type receptors. The other motif DLL is also observed in the TM2 helix in a few clusters of peptide, chemokine, nucleotide, lipid and biogenic amine receptors. This motif has been observed for the maximum occurrence in our cluster dataset. The same motif is also observed as ADL in TM2 in a few clusters of all these four types of receptors (Table 5.1) and as the ADLL motif in TM2 is observed in all three types of receptors, except peptide-type receptors. In a broader sense, this significant conservation of motifs in TM2 explains the conservation of motifs not only with reference to the amino acid residues, but also with reference to their topology. This emphasizes the utility of cross-genome clustering techniques and knowledge on receptor types for inferring the conservation of motifs across different receptor types at the cross-genome level.

5.8.2 Motifs Observed in Loop Regions

While observing motifs in the loop regions, eight different motifs were noted (Table 5.1). The well-known E/DRY motif in ICL2 has the

conservation as DRYLA in peptide (Cluster 3) and chemokine type receptors (Cluster 12) and RYL in nucleotide and lipid type receptors (Cluster 15). Notably, the KLRN motif is observed in biogenic amine receptors (Cluster 21) and in secretin receptors (Cluster 26) in ICL1, whereas MRTVTN in ICL1 and the LDR motif in ICL2 were conserved exclusively in peptide type receptors. Cluster 21 has GPCR sequences from both human and Drosophila genomes and one could notice the common motifs such as GNL and ADLL observed across two taxa. This particular cluster can be a best illustration to emphasize the need of cross-genome phylogenetic analysis at sequence level even at distant relationships and during strong evolutionary drifts. However, Cluster 26 has a set of homologous sequences from Drosophila-only GPCR clusters. Interestingly, the ASG motif in ICL1 is conserved exclusively in glutamate receptors. Since the conservation of amino acids in the ECL2 is crucial for the participation of ligand binding, the current study reports the presence of eight function-specific motifs in ECL2, distributed in PR, N&L, BGAR, GLU and FRZ/SMT receptors. Notably, WPFG and LCK motifs were found exclusively in ECL2 of peptide type receptors.

However, several motifs were identified in only one of the 32 clusters of receptors (cluster/receptor-specific motifs). In the current study, there are 133 cluster-specific motifs observed in transmembrane helices and 62 cluster-specific motifs observed in the loop regions. For example, the CLP motif from PR (Cluster 7) has AAS in the pattern [C/P][L/F][P/C/S].

The average sequence length of each of the TM-helices and loops was calculated from the set of sequences based on the HMMTOP boundary predictions. The average percentage of residue conservation in each TM helix and loop region was examined for the eight types of receptors.

C. Generally. elegans GPCRs varies in each cluster. SEC. This study can also be elaborated further by adding closely related candidate GPCRs (from other nematode species) into the respective clusters and to improve data size and to observe residue conservation at various percentages so to define sequence features to support vector machines such as SVM. the motifs are limited and are documented at the 30% conservation (due to evolutionary lineage) . In most of the clusters. respectively. percentage residue conservation in ICL2 is higher than the other loop regions. AA conservation is high at TM2 for BGAR. Significant conservation of 55%. elegans GPCR clusters.50% of conservation at TM2. elegans GPCR CROSSGENOME CLUSTERS Since the selected human-C. the maximum amino acid conservation occurs as 42% and 46 % in TM2 and TM3. elegansonly GPCRs for eight major receptor types.9 MOTIFS OBSERVED IN HUMAN. as expected. 61% occurs in TM1.167 Interestingly. it retains only 30. TM6 and TM7. 5. TM2. the pilot study can be studies delicately to C. Although the occurrence of motifs is high in PR. . and FRZ type receptors. elegans GPCRs possess remote homology. GLUR. 295 motifs were observed in the human and C. TM3 within CMK receptors. Since the number of human and C. 80%.

Figure 5.3 Percentage residue conservation in TM helices and loops in GPCR Clusters

Figure 5.3 (a-i) Bar diagram showing the percentage residue conservation in TM region, intracellular loop and extracellular loop of human GPCR clusters (shown in panels a, b, c), human-Drosophila GPCR clusters (shown in panels d, e, f) and human-C. elegans GPCR clusters (shown in panels g, h, i) respectively.

The reported motifs at intra- and inter-genomic levels (for 60% of conservation) provide information about the optimal residue conservation and also provide preliminary knowledge about the level of sequence conservation. In the interest of highlighting the observed classical motifs at cross-genome level, I am providing a few examples from the cross-genome GPCR alignments (Figure 5.4.a-c).

5.10 CHARACTERISTIC MOTIFS FROM CROSS-GENOME GPCR CLUSTERS

5.10.1 Conserved D/ERY and NPXXY motifs in GPCR Clusters

As cited in many literature evidences (Rovati et al 2007), the highly conserved characteristic E/DRY motif located at the boundary between transmembrane domain (TM) III and intracellular loop (ICL) 2 of Family A GPCRs plays a pivotal role in regulating GPCR conformational states. The importance of DRY motifs in connection with active MC4R in humans is well-known (Yamano et al 2004). Notably, in the cross-genome GPCR alignments, the preservation of the characteristic DRY motif was observed in our current study (Figure 5.4.a) in peptide receptors. In particular, arginine is conserved comparatively well and the substitutions are of polar uncharged ($) or positively charged residues (+) of the same kind (example in biogenic amine receptors in Cluster 24). However, in human GPCRs, a weak conservation of tyrosine is observed when compared to aspartate and arginine (Figure 5.4.a), where predominantly tryptophan (W) is conserved in most of the clusters, but in Drosophila, there is a high degree of conservation and the substituting AA also mostly belong to the aromatic group (example in chemokine receptors in Cluster 12, 13). The characteristic NPXXY motif in the C’ terminal of the GPCR sequences in the MSA could be recorded, for example, the NPIIY and NPLIYA motifs in the peptide and chemokine type receptors. Due to the conservation threshold of 60%, some of the weak conservations in other cluster types are not recorded.

5.10.2 Identified KLK/R and RLAR/K motif in Secretin Receptor

Another highly conserved motif, the KLR / RLAR motif, is seen within the third endoloop of the family B human secretin receptor (Figure 5.4.b).

Block deletion of KLRT and mutation of Lys323 (K323I) are known to reduce cAMP accumulation, and these mutations do not affect ligand interaction. Thus, the KLRT region at the N-terminus of the third intracellular loop, particularly Lys323, is important for G-protein coupling. For the RLAR motif, substitutions of Arg (R330) with Ala (342A), Glu (342E), or Ile (342I), as well as block deletion of the RLAR motif, were all found to be defective in both secretin-binding and cAMP production. The KLK/R and RLAR/K pattern is conserved in two proteins, GLR and GLP1, which belong to the secretin family noted in Cluster 25 of our GPCR cluster dataset. Although some motifs are not recorded because of the strict 60% conservation threshold, this occurrence is highlighted in Figure 5.4.b because of its biological significance.

5.10.3 Conserved PMNYM / PMSYM motif in BGA Receptor

The PMNYM / PMSYM pattern is conserved in TM5 of GPCRs. TM5 has been suggested to be implicated in self-association and may be involved in the dimerization of the receptor A2aR (human adenosine receptor). In the adenosine A2b receptor, the asparagine (N) residue is replaced by serine (S), generating the motif PMSYM, thus differentiating the two isoforms of receptors functionally. It is suggested that the motif PMNYM of A2aR and PMSYM of A2bR may be involved in TM assembly of the two isoforms of the receptors, respectively. This information may provide insight into the molecular mechanism of receptor-ligand interaction, leading to the design of tailored compounds. Notably, the consensus was not achieved at the 60% threshold, and the PMNYM pattern is not documented in the result file. However, a careful observation of the alignment helps us to identify the important PMNYM/PMSYM pattern in GPCR cluster 23 (Figure 5.4.c).
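A known pattern that falls below the consensus threshold, such as PMNYM/PMSYM above, can also be located by searching each aligned sequence directly. The following is a minimal sketch of that idea, not part of the MotifS program; the function name `find_pattern` and the toy sequences are hypothetical and serve only as illustration.

```python
import re

# PMNYM / PMSYM: asparagine (N) or serine (S) at the third position
PATTERN = re.compile(r"PM[NS]YM")

def find_pattern(aligned_seq):
    """Return (alignment_column, matched_text) pairs for one gapped sequence."""
    # Remember which alignment column each residue came from,
    # so hits are reported in MSA coordinates (0-based)
    columns = [i for i, ch in enumerate(aligned_seq) if ch != "-"]
    ungapped = aligned_seq.replace("-", "")
    return [(columns[m.start()], m.group()) for m in PATTERN.finditer(ungapped)]

# Toy aligned sequences (hypothetical, not taken from the cluster dataset)
toy = {"seqA": "LV--PMNYMVYF", "seqB": "LVAIPMSYMV-F"}
hits = {name: find_pattern(seq) for name, seq in toy.items()}
```

Searching the ungapped sequence and mapping hits back to alignment columns keeps the search robust to gaps that fall inside the pattern.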


Figure 5.4 (a-c) Illustration of characteristic motifs (observed at 60% conservation). Alignments showing conserved E/DRY, KLR/RLAR and PMNYM/PMSYM motifs in GPCR clusters (noted in panels a, b and c, respectively).

5.11 SUMMARY

The current approach for identifying conserved motifs and substituting AA residues is effective in recognizing functionally important residues in the GPCR cluster dataset. Along with the well-known characteristic motifs (Figure 5.4.a-c), other preserved motif patterns in the MSA were also identified for their occurrence at 60-100% conservation. The reports display the residue conservation / identity, permitted AAS (based on their respective physicochemical property) at each position and cluster-specific motifs. This current approach can be applied to other

membrane-bound receptors (such as olfactory receptors) and protein families to detect the conserved motifs. It will be interesting to map the identified motifs on the predicted topology in the MSA, which may be helpful to perform evolutionary studies at the cross-genome level. Due to remote homology, there are chances of missing the key motifs in the generated MSA in some cases, especially in cross-genome GPCR alignments. The current study (based on the recognition of motifs, derived from average AAS scores) is helpful in recognizing both classical and newer motifs that have not hitherto been attributed any functional significance. The current approach of analyzing sequence properties in the set of aligned sequences can be applied to compare with a reference sequence (of known 3D structure) to understand sequence similarity in the predicted topology and preserved motifs with AAS at each position. This method can be used as a guiding principle for 3-D modelling of GPCR sequences. Homology modelling, together with such motif analysis, could uncover additional spatial clusters or ‘spatial motifs’, which may be critical for function. This study can be further extended to comparative genome sequence analysis involving GPCRs from other genomes in future. The related supporting tables can be downloaded from the URL http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3163927.
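The motif criterion used in this chapter (three consecutive alignment positions, each conserved in at least 60% of the sequences) can be illustrated with a minimal sketch. This is not the in-house MotifS program, whose implementation is not reproduced here: it uses plain residue frequencies rather than the Birkbeck matrix, counts gaps against conservation, and the grouping of residues into the five property classes is an assumption made for illustration. All function names and the toy alignment are hypothetical.

```python
from collections import Counter

def column_consensus(column, threshold=0.6):
    """Return the most frequent residue if it reaches the threshold, else None."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    # Frequency is taken over all sequences, so gaps count against conservation
    return residue if count / len(column) >= threshold else None

def find_motifs(alignment, threshold=0.6, min_len=3):
    """Return (start_column, motif) for runs of >= min_len conserved columns."""
    consensus = [column_consensus(col, threshold) for col in zip(*alignment)]
    motifs, run_start = [], None
    for i, res in enumerate(consensus + [None]):  # sentinel closes a trailing run
        if res is not None and run_start is None:
            run_start = i
        elif res is None and run_start is not None:
            if i - run_start >= min_len:
                motifs.append((run_start, "".join(consensus[run_start:i])))
            run_start = None
    return motifs

# Symbols follow the legend of Section 5.5.4; the residue grouping is assumed
PROPERTY_SYMBOL = {
    r: symbol
    for residues, symbol in [("AVLIMCGP", "@"), ("FWY", "*"), ("KRH", "+"),
                             ("DE", "-"), ("STNQ", "$")]
    for r in residues
}

def substitutions(alignment, start, motif):
    """For each motif position, list (residue, symbol) of non-consensus residues."""
    result = []
    for offset, cons in enumerate(motif):
        column = [seq[start + offset] for seq in alignment]
        subs = sorted({r for r in column if r not in ("-", cons)})
        result.append([(r, PROPERTY_SYMBOL.get(r, "?")) for r in subs])
    return result

# Toy alignment (hypothetical sequences of equal length)
toy = ["ADRYLA", "GDRYIV", "SDRWVC", "TERYLT", "-DRYMS"]
motifs = find_motifs(toy)
subs = substitutions(toy, *motifs[0])
```

In the toy alignment, the run of conserved columns corresponds to a DRY-like pattern, with E substituting for D (polar negative) and W substituting for Y (aromatic).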


CHAPTER 6 GENOME WIDE SURVEY OF OLFACTORY RECEPTORS (ORs) IN SELECTED EUKARYOTIC GENOMES

6.1. PHYLOGENETIC STUDY ON SELECTED HUMAN ORS

6.1.1. Introduction

A number of earlier studies have emphasised the importance of ORs (Chess et al 1994), and tremendous efforts have been made in updating the knowledge of ORs at multiple levels, such as creating data repositories (Crasto et al 2002) and understanding receptor specificity, the olfactory neural circuit, wiring specificity and the olfactory map at different developmental stages (Chou et al 2010). Several recent studies on odor recognition for intelligent systems like e-nose, machine olfaction and mobile robots (e.g. pippi), and their application in the food industry and medical diagnosis, are highly appreciable. All such sophisticated analyses primarily depend on the initial sequence analysis of these receptors. Thus, performing a genome-wide survey on OR sequences from selected eukaryotic genomes will help to identify the conserved evolutionary trends at intra- and inter-genomic levels in order to explore their structural and functional significance.

6.1.2. Objectives and Scopes

The objective of my current study is to perform a genome-wide survey on ORs and phylogenetic analysis on selected eukaryotic

organisms such as yeast (S. cerevisiae), fly (D. melanogaster), worm (C. elegans), mouse (M. musculus), rat (R. norvegicus), dog (C. familiaris), human (H. sapiens) and a few non-human primates. The aim includes retrieval of OR sequences, predicting membrane topology, identifying conserved motifs and orthologs, creating non-redundant data repositories and analyzing phylogenetic clusters at intra- and inter-genomic levels (applicable to certain genomic combinations). The study helps to analyze sequence association as “clusters” in the phylogeny (Metpally and Sowdhamini 2005) and to identify conserved sequence features as cluster/species-specific “motifs” by using the TM-MOTIF package (Chapter 4). The preliminary knowledge on sequence information obtained through the genome-wide survey, along with additional features like generated 3D-models and predicted dimer interfaces (in collaboration with another research group), has been integrated to construct a non-redundant data repository

called DOR (Database of Olfactory Receptors) (refer Section 6.10).

6.1.3. Olfactory Receptors

The process of olfaction is mediated by olfactory receptors, which are G-protein coupled, seven-transmembrane-domain proteins located on the surface of the dendritic cilia of olfactory neurons. In this section (6.1), I discuss the availability, predicted membrane topology and phylogeny of selected human olfactory receptors in detail; ORs in other eukaryotes are discussed in the subsequent Sections 6.2-6.9.

6.1.4. OR: Membrane Topology

As mentioned in the Introduction chapter, there are several prediction methods available online to predict the secondary structure of membrane proteins. The prediction methods are mainly based on the “hydrophobicity” profile of the TM-helices. For the current study, I am using the HMMTOP prediction server (Tusnady and Simon 2001) to predict the membrane topology. However, consensus analysis with more than one prediction method helps to improve the accuracy. Generally, ORs are predicted to have the N-out (N-terminal out) topology of canonical GPCRs in higher eukaryotes such as mouse, rat and human, and in C. elegans (Bargmann 2006, Sengupta et al 1996), whereas the reverse topology (i.e., N-in and C-out topology) has been observed in the Drosophila ORs (most of the insect ORs) (Bargmann 2006, Benton et al 2006), and is also referred to as reverse/inverted topology (please see Section 6.4 for details).

6.1.5. Prior Studies on ORs

Olfactory receptor genes are generally expressed in bipolar neurons, and the dendritic membrane terminates with a filamentous process to increase the surface area to capture diverse stimuli from the environment. In general, the morphology of the olfactory receptor cells is common in different taxa (vertebrates, insects and nematodes) (Ache and Young 2005). Although the overall morphology is conserved, the cells tend to be adaptive. This happens in a habitat-dependent, not a species-dependent, manner (Stensmyr et al 2005), and this phenomenon is helpful in interpreting the trend of evolution in olfaction and the cluster associations of diverse taxa.

All organisms recognize a vast array of odorants by using ORs (members belonging to class A type GPCRs) by activating the G-protein based cascades by the action of various ligands binding to the receptors. Buck and Axel reported a diverse family of GPCRs in the rat epithelium and their participation in olfaction during 1991 (Buck and Axel 1991). Since then, studies in identifying functional ORs in diverse genomes have emerged, and various molecular and bioinformatics approaches have identified a great number of ORs in vertebrates such as mammals, birds, fish and amphibians (Hayden et al 2010). But, invertebrate species have independently expanded chemosensory GPCRs to perform olfaction (Bargmann 2006, Robertson and Thomas 2006). In mammals (mouse, rat), around 1000 genes were identified and estimated for contributing to the olfactory receptor family (Crasto et al 2002, Mombaerts et al 1996, Mombaerts 1999) and it constituted only 3% of the whole human genome. Pioneer studies played a major role in identifying and documenting human ORs (Sharon et al 1998, Glusman et al 2001). Rouquier and coworkers (Rouquier et al 2000) identified around 72% of human olfactory receptors, and early research indicated the loss of genes during the process of evolution in the human olfactory gene family. This clearly shows the loss of receptor function by the transformation of functional genes into pseudogenes. This phenomenon was observed as most common in human and prosimian primates. But, it was less common in lower primates and was very rare in mouse or zebra fish. It is generally observed that humans have reduced olfactory acuity when compared to rodents and non-human primates. The two extreme examples, such as the absence of functional ORs in dolphin and the deterioration of vision in moles, can be used to understand the mechanism of species requirement on sensory acuity.

Earlier studies also reported nearly 350-370 full-length functional olfactory genes (Zozulya et al 2001) (Glusman et al 2001) and more than 900 pseudogenes in the human genome. Studies also report the occurrence of class I type (fish-like ORs) and class II type (mammalian-like ORs) receptors in the human OR family (Zozulya et al 2001). Human ORs are distributed predominantly on chromosome 11 and this shows the central role of chromosome 11 in olfaction (Rouquier et al 2000, Crasto et al 2001). Besides this, ORs are also distributed on other chromosomes such as 1, 6, 9, and 14. Interestingly, chromosomes 10, 22, and X carry only one OR gene in humans. Another interesting fact is that there is only one allele expressed for an olfactory receptor gene in any given olfactory receptor neuron, and the underlying mechanism of the other excluded allele is still unclear (Chess et al 1994). Human ORs are documented with subclusters in the phylogenetic tree due to the events of evolutionary divergence and duplications. These events cause diversity at inter-genomic level.

6.1.6. Methodology

This sub-section describes a step-wise procedure which includes data collection, prediction of membrane topology, alignment procedures and phylogenetic tree construction for uni-genome or cross-genome phylogeny (Figure 6.1). The resultant phylogeny can be further analyzed for cluster association, average percentage identity and cluster-specific motifs. This procedure is followed for the given exercises; when required, additional specifications have been mentioned.

6.1.6.1. Retrieval of OR sequences

Data repositories like ORDB (Crasto et al 2002) and the NCBI protein resource (http://www.ncbi.nlm.nih.gov/protein) are the major data resources to retrieve OR sequences from selected eukaryotes. The genetic description (example: Homo sapiens in this case) of each sequence was verified at this

step and genes referring to putative, hypothetical, partial and incomplete OR sequences were not considered. Primarily, the collected sequences were submitted to the CD-HIT server (Huang et al 2010) and sequences with >95% identity were removed to avoid redundancy. Since diverse nomenclature has been used for referring to OR sequences, the collected OR sequences were referred to by their protein ID (NCBI record identifier) and are denoted by gene symbol with the prefix HS to denote the organism name Homo sapiens, MMOR for mouse ORs and so on, with XLOR for Xenopus laevis, FOR for fish ORs and CeOR for C. elegans ORs. This labeling is convenient and helpful for legible phylogenetic displays. Apart from convenience, it has further significance in adding the chromosomal location of each receptor in the current dataset to each gene symbol, in particular for the sequences from chromosomes 11 and 1, which are marked with the suffixes “_chr11” and “_chr 1” respectively.

6.1.6.2 Prediction of membrane topology: Human ORs

The collected non-redundant OR sequences were submitted to the HMMTOP server (Tusnady and Simon 2001) to predict membrane topology. Only the sequences predicted to have 7±2 TM helices were considered for the current study (applicable to other genomes also). By using HMMTOP, 87% of human OR sequences were predicted to have N-out topology (i.e., the N-terminal region of the sequence is present outside of the cell), and the selected 371 OR sequences were prepared with a short description and made ready for the alignment procedure. A considerable number of mispredictions do occur due to “false-merge” and “false-split” of TM boundaries, which cause underprediction and overprediction of TM helices. However, a consensus approach to prediction could be useful for improving the accuracy of membrane topology predictions.
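The curation, topology filter and labeling steps described above can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the `ORRecord` fields, the function names and the keyword list are hypothetical, and the TM-helix counts are assumed to have been parsed beforehand from the topology predictor output (no CD-HIT or HMMTOP call is made here).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record type; field names are assumptions for illustration only.
@dataclass
class ORRecord:
    symbol: str                       # short gene symbol, e.g. "52I1"
    description: str                  # free-text description from the sequence record
    tm_helices: int                   # number of predicted TM helices
    chromosome: Optional[str] = None  # e.g. "11", "1", or None if unknown

# Descriptions containing these words are excluded, as stated in the text.
EXCLUDED_WORDS = ("putative", "hypothetical", "partial", "incomplete")

def passes_curation(rec: ORRecord) -> bool:
    """Reject records described as putative, hypothetical, partial or incomplete."""
    desc = rec.description.lower()
    return not any(word in desc for word in EXCLUDED_WORDS)

def passes_topology(rec: ORRecord, expected: int = 7, tolerance: int = 2) -> bool:
    """Keep sequences predicted with 7 +/- 2 TM helices."""
    return abs(rec.tm_helices - expected) <= tolerance

def make_label(rec: ORRecord, prefix: str = "HS") -> str:
    """Build a display label: organism prefix + symbol, with a chromosome
    suffix for chromosomes 11 and 1 as described in the text."""
    label = f"{prefix}{rec.symbol}"
    if rec.chromosome in ("1", "11"):
        label += f"_chr{rec.chromosome}"
    return label

def select(records):
    """Apply both filters and return the labels of the retained records."""
    return [make_label(r) for r in records if passes_curation(r) and passes_topology(r)]
```

For example, `select([ORRecord("52I1", "olfactory receptor", 7, "11")])` returns `["HS52I1_chr11"]`, while a record whose description contains "putative" or whose helix count falls outside 5 to 9 is dropped.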

6.1.6.3 Alignment procedure

The MAFFT online alignment server (Katoh et al 2002) was used to align OR sequences with parameters such as JTT 200 for the scoring matrix (Jones et al 1992) and a gap opening penalty of 1.53. The initial alignment was exported to the MEGA 5.05 software (Tamura et al 2011) and, at this stage, careful editing was done to refine the quality of the alignment. This particular step is crucial while aligning ORs from different genomes, where there could be the problem of remote homology (see section 6.4 with Drosophila ORs and nematode ORs). The final alignment session can be saved (file.mas) in MEGA to construct the phylogeny.

Figure 6.1 Flow-chart for the sequence analysis on olfactory receptors

Note: (Figure 6.1) The given flow-chart depicts the stepwise procedure involved in generating the phylogenetic analysis on selected olfactory receptors. The pictorial representation denotes data collection and curation, prediction of membrane topology, the alignment procedure and the creation of the phylogeny using tools like CD-HIT, HMMTOP, MAFFT and MEGA 5.05 respectively.

6.1.6.4 Phylogeny on selected human olfactory receptors

The multiple sequence alignment (MSA) of selected human ORs was used to construct a phylogeny by fixing 1000 bootstrap replicates for
neighbor joining (Nj) method. The generated tree session files were saved with the extension .mts, and the radial and rectangular displays were used for analyzing the tree topology.

6.1.6.5 Analysis of phylogeny

The constructed tree topology was analyzed for cluster association, where association is based on clades. Sequences with a significant bootstrap value (Bs) (i.e., more than 50) in the phylogeny were considered as reliably associated. Phylogenetically grouped OR sequences were designated as “clusters”, as mentioned in the case study of mouse OR classification (Zhang and Firestein 2002).

[Figure 6.2: panels (a) and (b)]

Figure 6.2 (A and B) Phylogenetic display of selected human olfactory receptors

Figure 6.2 A) The rectangular display of the human OR phylogeny shows the distribution of ORs from chromosome 1 (blue colour), chromosome 11 (pink) and other chromosomes (green colour). B) The observed 10 subclusters are denoted in aqua (HSC1), violet (HSC2), indigo (HSC3), blue (HSC4), green (HSC5), yellow (HSC6), orange (HSC7), red (HSC8), fuchsia (HSC9) and lime (HSC10) colors, respectively, in the tree topology along with the average percentage identity (kindly read anti-clockwise). Notably, HSC1 stays distinct in the tree topology and all the ORs related to this subcluster (HSC1) are located on chromosome 11 (noted in black colour circles).

6.1.7 Results

6.1.7.1 Class I and II type receptors in human OR phylogeny

The intra-genomic phylogeny of the selected 371 human ORs exhibits 10 different subclusters. The cluster associations were labeled as HSC1 to HSC10 (referring to the organism name followed by the cluster number). Among them, the HSC1 cluster stays distinct and, interestingly, all sequences associated with this particular cluster originate from chromosome 11 (Figure 6.2A and B). Based on prior literature (Zozulya et al 2001), the distinct HSC1 cluster was considered to be related to class I type receptors (also known as fish-like ORs, which sense water-borne odors) in Homo sapiens, and ORs dispersed in other subclusters could be referred to as class II type receptors (also known as mammalian-like ORs, which sense air-borne odors). Attempts were made to confirm HSC1 as fish-like ORs by performing a cross-genome phylogenetic analysis of the established human ORs with a few selected OR sequences from various fish genomes (Section 6.2), and the results were as expected. The OR sequences (54 in number) observed in HSC1 showed an average sequence identity of 44% (Figure 6.2 B). Among the 10 human OR subclusters, HSC2 showed the highest average sequence identity (54%). The HSC9 and HSC10 clusters also exhibit a reasonable sequence identity of 52%.

6.1.7.2 Sequence features of 10 human OR-subclusters

Among the collected 371 human olfactory receptors, 87% of sequences were predicted to have N-out topology (predicted by the HMMTOP server).

The results showed that the number of residues predicted for TM helices range from 19-23 amino acids and notably the average number of residues were observed as 23. 34 . 15 for intracellular loops (ICL1-ICL3 loops) and 17.182 However. The observed long length of ECL2 could be due to ligand binding properties and long length of ICL2 could be due to the occurrence of conserved motif “MAYDRYVAIC” and its functional importance in structure stability could be probable reasons. Representative OR sequences Among the 10 human OR subclusters. average sequence identity. 23. for the current study.1). In general. average length and number of sequences observed in each cluster (from HSC1-HSC10) vary and influence the cluster-specific properties (like motifs) as are tabulated (Table 6. 21. around 50 OR sequences were selected to represent each cluster and atleast three representative sequences from each clade was selected with significant Bs value.1. 19 for TM1-TM7 helices. Notably. 22. the sequence predicted for N-out (356) and N-in (45) were taken into account for generating phylogeny but. essentially sequences predicted for N-out topology observed with 7±2 predicted TM helices were only considered for analyzing the average number of predicted residues in the helices and in the loop regions.7.11 for extracellular loops (ECL1-ECL3 loops) respectively. These representative sequences would be appropriate candidates ORs to perform modeling and to predict secondary structural features.3. 21. This may further help . 22. 12. TM7 exhibit relatively less number of average residues and among the predicted loop regions ICL2 and ECL2 are longer than the other loop regions. 21. 6.

to connect the structure and functional properties at the sequence level. The average sequence identity of the selected representative sequences with the associated OR sequences ranged from 40 to 53%, which provides a significant level of confidence.

Table 6.1 Analysis on sequence features of 10 human OR subclusters

Cluster No   No of sequences   Average alignment length   Average sequence identity
HSC1         54                320                        44%
HSC2         40                317                        54%
HSC3         61                305                        50%
HSC4         43                313                        53%
HSC5         35                315                        49%
HSC6         9                 315                        44%
HSC7         33                314                        46%
HSC8         24                307                        49%
HSC9         34                317                        52%
HSC10        38                316                        52%

Note: Table for the observed number of sequences, average alignment length and average sequence identity of the 10 human OR subclusters.

6.1.7.4 Motif analysis on human olfactory receptors

In the interest of identifying the conserved motifs and substituting amino acids (AAS) in the observed 10 human OR subclusters, the respective MSA (aligned by MAFFT) of each cluster has been used as an inbuilt dataset to the TM-MOTIF package (Chapter 4). The identified motifs (at 60% level of conservation), along with the respective membrane topology, were documented by using the TM-MOTIF package. Overall, 162 motifs were identified from HSC1-HSC10. The residue conservation is documented not
only to the “consecutive three AA residue conservation” but also with additionally conserved residues at 60% level of conservation. Motifs observed for one particular cluster (cluster-specific motifs) and for more than one cluster, with the respective topology, were also reported (Table 6.2). Apart from the conserved characteristic motifs such as the “MAYDRYVAIC” motif between TM3 and ICL2 and the “NPXXY” motif in TM7, the “PMY” motif (Table 6.2) is observed in the TM1, ICL1 topology and is the best example of sequence conservation retained in all the clusters. This particular “PMY” motif is evolutionarily important: because it occurs in all clusters from HSC1 to HSC10, it informs about the evolutionary trend from the aquatic (Class I type) to the terrestrial habitat (Class II type) (Freitag et al 1995).

Table 6.2 List of conserved motifs in 10 human OR subclusters (60% level of conservation)
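The conservation criterion described in Section 6.1.7.4 (at least three consecutive alignment columns, each conserved at the 60% level) can be sketched as follows. This is a simplified illustration and not the TM-MOTIF implementation: the function names are hypothetical, gaps are simply treated as non-conserved, and the threshold is assumed to be above 50% so that at most one residue per column can qualify.

```python
from collections import Counter

def conserved_columns(msa, threshold=0.6, gap="-"):
    """For each alignment column, return the dominant residue if its
    frequency reaches the threshold, otherwise None.
    Assumes threshold > 0.5, so at most one residue can qualify."""
    n = len(msa)
    result = []
    for column in zip(*msa):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / n >= threshold:
            result.append(residue)
        else:
            result.append(None)
    return result

def conserved_motifs(msa, threshold=0.6, min_len=3):
    """Return (start_column, motif) pairs for runs of at least `min_len`
    consecutive conserved columns (0-based column index)."""
    cols = conserved_columns(msa, threshold)
    motifs, start = [], None
    # A trailing None closes a run that reaches the end of the alignment.
    for i, residue in enumerate(cols + [None]):
        if residue is not None and start is None:
            start = i
        elif residue is None and start is not None:
            if i - start >= min_len:
                motifs.append((start, "".join(cols[start:i])))
            start = None
    return motifs

# Toy alignment (invented sequences, for illustration only).
toy_msa = ["APMYFL", "GPMYLL", "SPMYIV", "TPMWAL", "-PMYKC"]
print(conserved_motifs(toy_msa))
```

On the toy alignment, columns 1 to 3 are conserved as P, M and Y and form a reportable run, whereas the single conserved column at the end is too short to count as a motif.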

Table 6.2 (Continued)

6.1.7.5 SVM Analysis

For the preliminary analysis to predict putative olfactory receptors by using support vector machine (SVM) techniques, the collected (371) human olfactory receptors were kept as the positive dataset and GPCRs as the negative dataset. The collected 371 human OR sequences were used to train the SVM along with OR sequences from other genomes such as mouse (338 ORs), frog (15 ORs), fly (64 ORs), worm (odr-10 and homologues) and yeast (5 ORs). GPCRs from genomes such as human (351 GPCRs), mouse (331 GPCRs), rat (283 GPCRs), worm (735 GPCRs) and fly (100 GPCRs) were also used in the current study for the non-OR dataset. Here, sequence properties like predicted helices and loop regions, and the physico-chemical properties of residues, were highly helpful to define features for
the SVM. A pilot study was carried out with the dataset, and the features have been set to identify the putative olfactory receptors in humans (Kandaswamy et al 2010). The accuracy for the training set was 87.79% and the testing set accuracy was 86.63%, with a sensitivity of 85.00%, a specificity of 87.55% and an MCC of 0.7154. The study is useful for performing a survey of the human genome with the trained SVM. A human proteome database containing 89822 protein sequences was downloaded from the IPI database (http://www.ebi.ac.uk/IPI/). As a result, 592 proteins were predicted by the SVM to have the sequence properties of ORs. Out of these 592 gene products, 449 proteins were annotated as ORs (data unpublished). Among them, 58 sequences were predicted as putative ORs, for which the sequence identity with true positives is in the range of 60-90%. 45 sequences were verified for their UNIPROT IDs; of these, 33 were reported with the “reviewed” status and 12 with the “unreviewed” status in the UNIPROT database.

6.2 CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND FISH GENOMES

6.2.1 Objective

As the phylogenetic analysis of selected human olfactory receptors showed HSC1 as a distinct cluster that is assumed to be related to class I type receptors, the analysis was next aimed at aligning human olfactory receptors with selected olfactory receptors from various fish genomes and observing the influence of fish ORs on the previously established human OR phylogeny. Thus, performing a cross-genome phylogenetic analysis with selected fish ORs and the 371 human ORs will be helpful to identify fish-like ORs in the already established human OR phylogeny (Section 6.1).

6.2.2 Review of Literatures

The class I type receptors are generally associated with sensing water-borne odors (Ngai et al 1993, Freitag et al 1998). In particular, it has been studied that the class I type receptors may be specialized for the detection of water-soluble odorants, whereas class II type receptors recognize volatile compounds (Freitag et al 1998). Water-soluble odorants are recognized by fishes to suit their aquatic habitat (Friedrich and Korsching 1997, Kang and Caprio 1991), whereas terrestrial vertebrates possess class II type receptors to detect volatile compounds, apart from retaining a few of the class I type receptors (Firestein and Werblin 1989, Tareilus et al 1995, Kashiwayanagi and Kurihara 1995, Duchamp-Viret and Duchamp 1997, Bozza and Kauer 1998). This explains how the sense of olfaction evolved from the lower chordates to the higher chordates with respect to environmental requirements. The availability of repositories for GPCRs and ORs in fish genomes like pufferfish (Tetraodon nigroviridis), zebrafish (Danio rerio) (Barth et al 1996), Lampetra fluviatilis (Freitag et al 1999), and frog (Ji et al 2009) also facilitates sequence comparison studies across genomes. So, it will be interesting to discriminate the class I and II type receptors in the human OR phylogeny and study them in further detail.

6.2.3 Fish ORs

Twenty-five OR sequences were collected from the human olfactory receptor dataset and submitted to the online PSI-BLAST (http://blast.ncbi.nlm.nih.gov/) with default parameters against the fish genomes. The homologous sequences from diverse fish genomes such as Tetraodon nigroviridis, Danio rerio, Misgurnus anguillicaudatus, Ictalurus
punctatus, Takifugu rubripes, Osmerus mordax, Oncorhynchus tshawytscha, Carassius auratus and Oncorhynchus nerka were collected from the first hit and organized in FASTA format. Among the collected 32 ORs, 31 OR sequences were predicted to have the N-out topology. In total, 403 OR sequences (371 OR sequences from human and 32 from fishes) were aligned using the MAFFT alignment server with default parameters and were used to construct a bootstrap tree by the neighbor joining method (Nj) with 1000 replicates.

6.2.4 Results

The cross-genome phylogeny with human ORs and selected fish ORs showed coclusters (Figure 6.3A and B). Notably, the cocluster arrangements occurred only with the HSC1 cluster and not with other human OR subclusters, which clearly suggests that the HSC1 cluster from the human OR phylogeny belongs to class I type receptors (necessary to sense water-borne odors). Notably, among the 32 fish ORs, four (gi 83752816, gi 83752926, gi 83752750 and gi 13177509) are neighbor members in the HSC1 cluster of class I type receptors. The sequence identity (using the needleall program (Needleman and Wunsch 1970)) between these fish ORs and human ORs varies from 15% to 35%, and the sequence similarity ranges from 26% to 52% (Table 6.3). This association indicates that olfaction in higher-order organisms primarily originated in aquatic organisms for sensing water-borne odors (Class I type) and then evolved further to sense air-borne odors (Class II type) in adapting to the terrestrial habitat (Figure 6.3A and B).
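The distinction between percentage identity and percentage similarity reported above can be illustrated on an already aligned pair of sequences. The sketch below is an assumption-laden illustration, not the needleall implementation: it expects two gapped strings of equal length, divides by the alignment length, and decides "similar" from a small hand-written set of residue groups (hypothetical, chosen for illustration) instead of a substitution matrix.

```python
# Illustrative residue groups; a real analysis would use a substitution matrix.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("NQ"), set("ST"), set("AG")]

def are_similar(x, y):
    """True if two residues are identical or fall in the same illustrative group."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)

def identity_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) over the alignment length.
    Columns where both sequences have a gap are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0, 0.0
    # Only columns with a residue on both sides can count as identical or similar.
    residue_pairs = [(x, y) for x, y in columns if x != gap and y != gap]
    identical = sum(1 for x, y in residue_pairs if x == y)
    similar = sum(1 for x, y in residue_pairs if are_similar(x, y))
    length = len(columns)
    return 100.0 * identical / length, 100.0 * similar / length

# Toy pair (invented sequences): 5 identical and 2 similar columns out of 8.
print(identity_similarity("KAFSTC-A", "KAYSSCGA"))
```

Because every identical column is also counted as similar, the similarity is never lower than the identity, which matches the pattern of the ranges quoted in the text.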

[Figure 6.3: panels A and B; the HSC1 clade is labeled as class I type]

Figure 6.3 Phylogeny of selected olfactory receptors in Homo sapiens and fish genomes

Note: The phylogenetic display of human olfactory receptors (refer A) shows the HSC1 clade (in aqua blue) as distinct. The cross-genome phylogeny (refer B) of selected human olfactory receptors with fish ORs shows significant coclustering, particularly with the HSC1 cluster. Notably, fish ORs did not cocluster with any cluster except HSC1. This clearly indicates the characteristic feature of HSC1 as class I type receptors in sensing water-borne odors.

6.2.5 Sequence conservation: across fish and human ORs

So far, no convincing evidence for class-specific sequence motifs for fish-like receptors (class I type receptors) and mammalian-like receptors (class II type receptors) has been obtained. However, efforts were made to observe a few characteristic motifs from the human ORs and their conservation in the cross-genome alignment. Earlier studies support that the characteristic motif “MAYDRYVAIC” is present at TM3 and ICL2 and is common in human, mouse and zebrafish ORs (Zhang and Firestein 2002, Alioto and Ngai 2005). Among these residues, tyrosine (Y), methionine (M) and cysteine (C) are found to be related to OR-specific functions.

The “KAFSTC” motif is conserved in ICL3, extending into TM6, of human ORs, and this motif is also observed in fishes, especially in zebrafish (Figure 6.4). Also, lysine (K), alanine (A) and threonine (T) residues play a major role in OR functions, and the downstream histidine (H) is recommended for site-directed mutagenesis studies in earlier literature (Figure 6.4). However, the phenylalanine (F) and serine (S) residues are not as common in zebrafish ORs.

Figure 6.4 Snapshot of the alignment window for the motif “KAFSTC” in human ORs and in a few fish ORs in the cross-genome alignment

Kindly note the conserved “KAFSTC” motif observed in ICL3 of the cross-genome OR alignment of selected human and fish ORs. The fish ORs are denoted with the prefix “FOR_” and human ORs with “HS”.

To discriminate the class I and class II type receptors among human ORs, a study has to be conducted with adequate OR sequences from an amphibian genome, which is expected to have both class I and class II type receptors (Freitag et al 1998). Performing a cross-genome phylogeny of selected amphibian ORs with the already established human OR phylogeny will be helpful to understand the human OR clusters for class I type and class II type receptors (see Section 6.3 for more details).

Table 6.3 Sequence identity of neighboring fish ORs and human class I type receptors observed in cross-genome OR phylogeny

S. No.   Fish ORs       Human ORs      Sequence identity   Sequence similarity
1        FOR_13177509   HS56A1_Chr11   15%                 26%
2        FOR_83752926   HS56A1_Chr11   16%                 26%
3        FOR_83752816   HS56A1_Chr11   17%                 25%
4        FOR_83752750   HS56A1_Chr11   17%                 27%
5        FOR_13177509   HS56A5_C1      30%                 53%
6        FOR_13177509   HS52I2_C1      30%                 47%
7        FOR_83752816   HS56A3_Chr11   31%                 47%
8        FOR_13177509   HS56A3_Chr11   31%                 54%
9        FOR_83752926   HS56A3_Chr11   31%                 49%
10       FOR_83752816   HS56A5_C1      31%                 50%
11       FOR_83752926   HS56A5_C1      31%                 52%
12       FOR_83752750   HS56A5_C1      32%                 52%
13       FOR_83752750   HS52I2_C1      32%                 51%
14       FOR_83752816   HS52I2_C1      32%                 49%
15       FOR_13177509   HS52I1_C1      32%                 50%
16       FOR_83752926   HS52I2_C1      32%                 49%
17       FOR_83752750   HS56A3_Chr11   33%                 51%
18       FOR_83752750   HS56A3_Chr11   33%                 51%
19       FOR_83752750   HS52I1_C1      34%                 53%
20       FOR_83752816   HS52I1_C1      35%                 53%
21       FOR_83752926   HS52I1_C1      35%                 52%

6.3 CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND AMPHIBIAN GENOME

6.3.1 Objective

As the cross-genome phylogenetic analysis of selected human olfactory receptors with selected fish ORs showed that HSC1 in the human OR phylogeny corresponds to the class I type (fish-like ORs that sense water-borne odors, section 6.2), the current study is aimed at introducing a few ORs from the frog genome into the already established human OR phylogeny. Since amphibians have two classes of receptor types (class I and II), this study will be more helpful in discriminating both class I type (to sense water-borne odors) and class II type receptors (to sense air-borne odors) in the previously established human OR phylogeny.

6.3.2 Literature survey on class I and II type ORs

Xenopus laevis possesses gene repertoires for two distinct classes of olfactory receptors: class I is related to the receptors of fish and the other class is similar to the receptors of mammals (Freitag et al 1995). Amphibians provide "an unique opportunity to compare olfactory receptors of both classes in one animal species" (Mezler et al 2001). In frogs, fish-like receptor genes (class I type receptors) are exclusively expressed in the lateral diverticulum of the nasal cavities (Breer 2003), whereas mammalian-like receptors are expressed in the sensory neurons of the main diverticulum to sense air-borne (volatile) odors. Studies comparing the structural features of both receptor classes from various species revealed that they differ mainly in their extracellular loop 3, which may contribute to ligand specificity (Freitag et al 1998). Earlier studies reported that OR sequences such as XB107, 177, 180, 238, 239, 242 are class I receptors and XB178, 350, 352 and 154 are class II receptors in the frog genome (Mezler et al 2001).

6.3.3 Amphibian ORs

Twenty representative OR sequences from the human OR phylogeny were collected and submitted to online PSI-BLAST (http://blast.ncbi.nlm.nih.gov/) searches with default parameters to collect homologues from Xenopus laevis. Initial results gave rise to 28 homologues, which were reduced to 14 sequences after a redundancy filter (on description and membrane topology). Among them, three sequences were designated as class I type receptors and six are denoted as class II type receptors. The anomalous topology and the presence of a lesser number of TM helices were striking features in the frog OR dataset
and the cross-genome phylogeny was performed as a bootstrap tree using MEGA 5.0 with the neighbor joining method (Nj) for 1000 replicates.

6.3.4 Results

The cross-genome phylogeny of human ORs with frog ORs showed remarkable coclustering in selected human OR subclusters (Figure 6.5). Frog ORs were distributed particularly in three human OR subclusters, namely HSC1, HSC2 and HSC4. For the cross-genome phylogeny, the observed three coclusters of human and frog ORs are labeled as HXC1, HXC2 and HXC3 (Figure 6.5). Here, HX refers to Homo sapiens and Xenopus laevis in the cross-genome phylogeny; HXC1 is found to be related to class I type receptors, and HXC2 and HXC3 to class II type receptors, of both genomes.

[Figure 6.5: panels A and B; labels HSC1 (Class I type), HXC1 (Class I type) and HXC2 (Class II type)]

Figure 6.5 Snapshot depicts the coclustering of fish ORs and frog ORs in human OR phylogeny

Note: The phylogenetic display of human olfactory receptors (refer A) shows the HSC1 clade as distinct and coclustered with ORs from fishes (denoted with an arrow mark, with fish ORs in brown colour), for sensing water-borne odors. The cross-genome phylogeny (refer B) of selected human olfactory receptors with frog ORs (brown colour) exhibits coclustering at three human OR clusters. This clearly indicates the characteristic feature of HSC1 as class I type receptors in sensing water-borne odors (HXC1), while the other clusters (HXC2, HXC3) belong to class II type receptors.

[Figure 6.6: panel A shows HXC1 (frog ORs for class I type receptors); panel B shows HXC2 and HXC3]

Figure 6.6 Snapshot depicts the coclustering of fish ORs with class I type receptors of human ORs in HSC1 (given in A), also exhibiting the coclusters HXC1, HXC2 and HXC3 to indicate the class I and II type receptors from frog ORs with human ORs (given in B) (Table 6.4 and 6.6).

Note: The phylogenetic display of human olfactory receptors (refer A) shows the HSC1 clade as distinct and co-clustered with ORs from fishes (denoted with an arrow mark, with fish ORs in brown colour), for sensing water-borne odors. The cross-genome phylogeny (refer B) of selected human olfactory receptors with frog ORs (brown colour) exhibits coclustering at three human OR clusters. This clearly indicates the characteristic feature of HSC1 as class I type receptors in sensing water-borne odors (HXC1), while the other clusters (HXC2, HXC3) belong to class II type receptors.

6.3.5.1 Cocluster HXC1 - class I type receptors

Notably, a few sequences (such as gi 9650878, 9650880, 7530156, 1617249, 1617229, 1617227 and 1617231) co-cluster with HSC1 (noted as HXC1 in the cross-genome phylogeny) (Figure 6.6). As we know from previous experiments, HSC1 was identified as fish-like ORs in the human OR phylogeny (see Section 6.2, to sense water-borne odors). The given snapshot shows the cocluster of human and frog ORs, i.e., HXC1 (Figure 6.6) and, as mentioned previously, the seven frog ORs which are annotated as class I type receptors are co-clustered with human class I type ORs such as HS52l1, 52l2, 56A1, 56A3, 56A5, 56B1, 56B4, 51A2, 51A4, 51A7, 51G1, 51G2, 51L1, 51M1, 51S1 and 51V1 (referred to as HSC1 in section 6.1). Interestingly, the mentioned frog ORs which coclustered with this cluster also belong to the “Family A G protein-coupled receptor-like” group and are designated as olfactory receptor class I (Xenopus laevis) in the SCOP definition (URL: http://supfam2.cs.bris.ac.uk/SUPERFAMILY/cgi-in/genome.cgi?model=0037432.cgi_xl=yes.sf=81321). This further suggests that HSC1 retains class I type receptors to sense water-borne odors in human. The pairwise sequence identity between selected frog ORs and human ORs related to HXC1 ranges from 18-35% and the sequence similarity ranges from 32-57% (Table 6.4).

6.3.5.2 Cocluster HXC2 - class II type receptors

In the other cocluster, HXC2 (Figure 6.6), the established sequence association refers to the class II type receptors in both the human and the frog genome. Sequences belonging to the human subcluster HSC2, including human OR sequences like HS2D2 and HS10AD1, cocluster with frog ORs like gi 9650890, 9650888, 9650886, 9650884 and 9650892 and are
annotated as class II type receptors. Notably, the observed sequence identity between frog ORs and the associated human ORs in HXC2 ranges from 33% to 43% and the similarity ranges from 33-60% (Table 6.5).

6.3.5.3 Cocluster HXC3 - class II type receptors

The OR sequences labeled as olfactory receptor and class II type receptor (gi 1617247 and gi 9650882) from the frog genome coclustered with the human OR subcluster, namely HXC3 (Figure 6.6). Generally, human ORs are abundantly located on chromosome 11. The distribution of frog ORs in the human OR phylogeny denotes the distribution of class II type receptors in the clusters HSC2-HSC9, but not in HSC1. Thus, ORs in HSC1 are referred to as class I type receptors and stay distinct from the other subclusters. Due to the introduction of frog ORs, considerable cluster rearrangements have been observed in the human OR subclusters. The observed cross-genome phylogeny with human-fish ORs and human-frog ORs helps to discriminate class I and II type receptors in the human OR phylogeny (Figure 6.6). Cross-genome phylogenetic studies were helpful in identifying these kinds of cluster-specific features at the cross-genome level, especially in discriminating class I and class II type receptors in the human OR phylogeny.

Table 6.4 Sequence identity of neighboring frog ORs and human class I type receptors observed in cross-genome OR phylogeny

S. No   Frog ORs (class I)   Human ORs (class I)   Sequence identity   Sequence similarity
1       XLOR_1617231         HS52L1_Chr11          18%                 32%
2       XLOR_9650878         HS52L1_Chr11          18%                 49%
3       XLOR_7530156         HS52L1_Chr11          18%                 49%
4       XLOR_9650880         HS52L1_Chr11          19%                 51%
5       XLOR_1617229         HS52L1_Chr11          19%                 35%
6       XLOR_1617249         HS52L1_Chr11          19%                 33%
7       XLOR_1617227         HS52L1_Chr11          19%                 33%
8       XLOR_1617231         HS52I2_Chr11          19%                 32%
9       XLOR_9650878         HS52I2_Chr11          19%                 50%
10      XLOR_7530156         HS52I2_Chr11          19%                 50%
11      XLOR_9650880         HS52I2_Chr11          19%                 50%
12      XLOR_1617229         HS52I2_Chr11          19%                 35%
13      XLOR_1617249         HS52I2_Chr11          19%                 32%
14      XLOR_1617227         HS52I2_Chr11          19%                 34%
15      XLOR_1617231         HS56A3_Chr11          19%                 31%
16      XLOR_9650878         HS56A3_Chr11          20%                 52%
17      XLOR_7530156         HS56A3_Chr11          20%                 52%
18      XLOR_9650880         HS56A3_Chr11          20%                 52%
19      XLOR_1617229         HS56A3_Chr11          20%                 36%
20      XLOR_1617249         HS56A3_Chr11          20%                 35%
21      XLOR_1617227         HS56A3_Chr11          20%                 32%
22      XLOR_1617231         HS56A1_Chr11          20%                 32%
23      XLOR_9650878         HS56A1_Chr11          20%                 54%
24      XLOR_7530156         HS56A1_Chr11          22%                 53%
25      XLOR_9650880         HS56A1_Chr11          23%                 54%
26      XLOR_1617229         HS56A1_Chr11          23%                 37%
27      XLOR_1617249         HS56A1_Chr11          24%                 33%
28      XLOR_1617227         HS56A1_Chr11          24%                 32%
29      XLOR_1617231         HS56A5_Chr11          26%                 36%
30      XLOR_9650878         HS56A5_Chr11          26%                 55%
31      XLOR_7530156         HS56A5_Chr11          26%                 55%
32      XLOR_9650880         HS56A5_Chr11          29%                 57%
33      XLOR_1617229         HS56A5_Chr11          29%                 38%
34      XLOR_1617249         HS56A5_Chr11          30%                 36%
35      XLOR_1617227         HS56A5_Chr11          30%                 36%
36      XLOR_1617231         HS56B4_Chr11          30%                 32%
37      XLOR_9650878         HS56B4_Chr11          31%                 50%
38      XLOR_7530156         HS56B4_Chr11          32%                 50%
39      XLOR_9650880         HS56B4_Chr11          33%                 53%
40      XLOR_1617229         HS56B4_Chr11          34%                 36%
41      XLOR_1617249         HS56B4_Chr11          34%                 30%
42      XLOR_1617227         HS56B4_Chr11          34%                 30%
43      XLOR_1617231         HS56B1_Chr11          34%                 33%
44      XLOR_9650878         HS56B1_Chr11          34%                 52%
45      XLOR_7530156         HS56B1_Chr11          34%                 52%
46      XLOR_9650880         HS56B1_Chr11          34%                 54%
47      XLOR_1617229         HS56B1_Chr11          34%                 36%
48      XLOR_1617249         HS56B1_Chr11          35%                 33%
49      XLOR_1617227         HS56B1_Chr11          35%                 34%

Note: ORs from Xenopus laevis are labeled with XLOR as prefix instead of “gi” and human ORs are given in common name with HS as prefix.

Table 6.5 Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC2)

S. No   Human ORs (class II)   Frog ORs (class II)   Sequence identity   Sequence similarity
1       HS10AD1                XLOR_9650890          33%                 33%
2       HS10AD1                XLOR_9650890          33%                 50%
3       HS2D2                  XLOR_9650890          37%                 54%
4       HS2D2                  XLOR_9650890          37%                 54%
5       HS10AD1                XLOR_9650884          37%                 37%
6       HS10AD1                XLOR_9650886          38%                 55%
7       HS10AD1                XLOR_9650888          38%                 55%
8       HS2D2                  XLOR_9650888          39%                 57%
9       HS2D2                  XLOR_9650884          41%                 60%
10      HS2D2                  XLOR_9650886          43%                 60%

Table 6.6 Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC3)

S. No.   Frog ORs (class II)   Human ORs (class II)   Sequence identity   Sequence similarity
1        XLOR_1617247          HS11L1_Chr1            26%                 40%
2        XLOR_1617247          HS6Q1_Chr11            28%                 42%
3        XLOR_9650882          HS6Q1_Chr11            36%                 55%
4        XLOR_9650882          HS11L1_Chr1            41%                 58%

6.4 PHYLOGENETIC ANALYSIS ON DROSOPHILA OLFACTORY RECEPTORS

6.4.1 Background

As mentioned in Chapter 1, insect olfaction is one of the most fascinating areas, particularly that of Drosophila chemosensory receptors. Various earlier studies showed the importance of understanding insect olfaction (Siddiqi 1990, Clyne et al 1999). Drosophila is a favorite model organism; the availability of the complete genome and of databases for Drosophila olfactory receptors (Crosby et al 2007) has motivated the comparison of Drosophila ORs with olfactory receptors from various other eukaryotic genomes to understand evolution in olfaction and to make use of the conserved features.

6.4.2 Drosophila ORs

As referred to in Section 6.1 (Methodology), the same procedure has been followed for the current study with Drosophila ORs. Since many of the ORs have been referred to by many gene synonyms, care has been taken in designating OR sequences. For example, several “gene synonyms” such as 22A, AN11, CG12193, Dmel22a, DmelCG12193, Dmel Or22a, DOR22a, DOR22A, Or22A and OR22a are available to represent the candidate OR 22a, which is referred to as “DOR 22a” in the current study. In the same way, the other OR sequences were also labeled with DOR as prefix. The sixty collected sequences were predicted for membrane topology and, notably, 90% of the OR sequences were predicted to have the N-in topology with 7 ± 2 TM helices.
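The synonym handling described above amounts to mapping every known synonym of a receptor to one display label. A minimal sketch is given below, assuming a hand-curated synonym table; the table, the function names and the normalization rule (ignoring case and spaces) are illustrative assumptions, and only the OR 22a synonyms listed in the text are included.

```python
# Hand-curated synonym table (illustrative; only the entry quoted in the text).
SYNONYMS = {
    "DOR 22a": ["22A", "AN11", "CG12193", "Dmel22a", "DmelCG12193",
                "Dmel Or22a", "DOR22a", "DOR22A", "Or22A", "OR22a"],
}

def _normalize(name: str) -> str:
    """Ignore case and whitespace when comparing gene names (an assumption;
    it would need revisiting if two distinct genes differed only by case)."""
    return "".join(name.split()).lower()

# Reverse lookup: normalized synonym -> display label.
LOOKUP = {}
for label, names in SYNONYMS.items():
    LOOKUP[_normalize(label)] = label
    for name in names:
        LOOKUP[_normalize(name)] = label

def canonical_label(name: str):
    """Return the display label for a gene name, or None if it is unknown."""
    return LOOKUP.get(_normalize(name))

print(canonical_label("Dmel Or22a"))  # DOR 22a
print(canonical_label("CG12193"))     # DOR 22a
```

Returning `None` for unknown names, instead of guessing, makes unmapped synonyms visible so that they can be resolved by hand before tree construction.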

Sequences were submitted to the MAFFT alignment procedure with the default parameters (JTT 200 scoring matrix and gap opening penalty of 1.53). In the previous work (Robertson et al 2003), the long extracellular loop 2 between TM4 and TM5 of 83a, b and 85e was edited. But, in the current analysis, these long loop lengths were not excised. Inevitably, some inserts were retained in the alignment due to the presence of sequences like OR 83b, DOR 104, 67b and 45a. Thus, the obtained tree topology shows 10 different subclusters. The observed tree topology resembles that of earlier studies (Warr et al 2001). But the current study varies in the alignment procedure and lacks the gustatory receptors (GR) in the phylogeny.

6.4.3 Results on Drosophila OR Phylogeny Analysis

6.4.3.1 Cluster association: 10 subclusters

The generated phylogeny for 60 diverse Drosophila olfactory receptors showed 10 subclusters (Figure 6.7 and 6.8), and the observed cluster association indicates the specific sequence properties among different clusters. Notably, the known 24 antennal receptors (Dobritsa et al 2003) were distributed predominantly in eight subclusters, i.e., all clusters except DmC4 and DmC8 (Figure 6.7 and 6.8). Also, notably, the pheromone-like receptors DOR47b and 65a were observed in DmC5. Here, both ORs are antennal receptors. However, interestingly, considerable distance was observed from the other pheromone-like receptor, DOR 88a, which was observed in the neighboring DmC6.

As mentioned in the classical publication (Robertson et al 2003), the closely related receptors like OR 22a,b and 59b-c in DmC1, OR 33a-c in DmC2, OR 65a-c and OR 94a-b in DmC5, and OR 85b-d in DmC6 are observed close together in the tree topology, nearly the same as reported by the previous group. These patterns illustrate the highly conserved sequence association at family level in spite of the general sequence diversity (Figure 6.8). The current study clearly reports the diversity of Drosophila ORs, and the associations were grouped as 10 DOR subclusters.

Figure 6.7 Phylogeny of Drosophila olfactory receptors

Note: In the observed tree topology of Drosophila ORs, alternate clusters are denoted in blue and purple colour to differentiate the cluster associations (kindly read the tree topology in clockwise direction).

The current approach differs from the earlier studies in features such as employing the FFT-alignment procedure, the JTT matrix and the Nj method, without providing any outgroup(s) to structure the tree. The resulting phylogeny may differ from previous results in fine features, but it showed 10 subclusters, which are labeled as Dm (refers to Drosophila melanogaster) followed by the cluster number (referred to as DmC1-DmC10). In general, the subclusters observed in the current study show the diverse features at inter-genomic level needed to discriminate diverse odors as single or mixed odors. This cluster association can be used to illustrate the diversity of Drosophila ORs.

Notably, the antennal receptors Or22a and 35a from DmC1, Or85b from DmC6 and Or 35a from DmC9 are related as pentyl acetate-sensitive receptors (Hallem et al 2004). Though these receptors show the same functional properties and cellular localization, they are distributed in different clusters in the tree topology. This may be due to the specificity of the receptors, which is required by the shape and size of ligands of similar chemistry, re-emphasizing the fact that "the olfactory function has evolved separately several times within the superfamily of proteins" (Robertson et al 2003).

Separately, OR83b is observed in DmC8 and, notably, the functional antennal receptors were not present in this cluster. Earlier studies suggest (Dunipace et al 2001) that OR83b is closer to GRs in phylogeny and not so closely related to ORs. However, this particular association lacks high supporting Bs values, and alignment procedures also play a major role in placing OR83b in the tree topology. In this study, I could observe that OR83b, given adequate OR sequences, forms a cluster association with other ORs like 83a, 63a, 104 and 67b, which can be further examined later by sequence analysis for common motifs and ion-channel properties.

A trial phylogenetic study was performed with 60 selected GRs and ORs, and the resulting tree showed clear and distinct clusters of ORs and GRs. There is no coclustering observed between these two types of chemosensory receptors, which in turn shows their independent evolution. The observed average sequence identity (using the Alistat program, Eddy S. 2005) among the 60 olfactory receptors was only 18%; the most related pair, DOR 64 and 23a, showed 100% identity, and isoforms like 22a and 22b showed the next highest identity of 77%.

Figure 6.8 Observed 10 subclusters of Drosophila olfactory receptors

Note: The observed 10 subclusters of Drosophila olfactory receptors were labeled as DmC1 to DmC10 in clockwise direction; cluster association is indicated by the green filled circles, and the antennal receptors in particular are given in fuchsia color.

6.4.4 SUMMARY

• The Nj phylogeny generated on 60 selected Drosophila ORs exhibited 10 OR subclusters, referred to as DmC1 to DmC10.
• Notably, the known 24 antennal receptors (Dobritsa et al 2003) were distributed in eight subclusters, the exceptions being DmC4 and DmC8.
• The cluster associated with OR83b and associated OR

sequences such as 83a, 63a, 104 and 67b (DmC8) can be further examined by sequence analysis for common motifs, predicted for secondary structures and observed for ion-channel properties.
• The pheromone-like receptors DOR47b and 65a were observed at DmC5 and both are antennal receptors. This illustrates the relevance of localization to functional expression.
• The observed average sequence identity among the 60 olfactory receptors was only 18%.

6.5 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM DROSOPHILA, YEAST AND HOMO SAPIENS

The objective of the current study is to perform a cross-genome phylogeny on selected ORs from Drosophila, S. cerevisiae and Homo sapiens.

6.5.1 Background

The olfactory system of Drosophila is simple, whereas animal olfactory systems are interestingly complex in sensing diverse air-borne odors. As in earlier studies, insect olfactory sensory neurons (OSNs) and mammalian OSNs are anatomically similar, but insect OSNs differ in possessing the sensilla in the antenna and maxillary palp in their olfactory system (Stocker 1994).

6.5.2 Insect ORs and mammalian ORs: (Evolutionarily unrelated)

Insect ORs are seven transmembrane proteins and are evolutionarily distinct from mammalian ORs. Drosophila ORs retain reverse topology (Benton et al 2006, Wistrand et al 2006). They show reasonable sequence similarity and orthology with other insect species such as Anopheles gambiae, Heliothis virescens and other endopterygota (Carey et al 2010). However, a single member of the insect OR is strongly conserved

across insect genomes and is called OR83b (Krieger et al 2003, Pitts et al 2004, Jones et al 2005). OR83b does not interact directly with odors, but functions as a chaperoning co-receptor: it forms a heteromeric complex with ligand-binding ORs (Larsson et al 2004, Nakagawa et al 2005, Neuhaus et al 2005, Benton et al 2006). Though mammalian and insect OSNs are anatomically similar, insect ORs are evolutionarily unrelated to vertebrate ORs. Moreover, the number of olfactory receptors in insects is smaller than in mammalian genomes.

6.5.3 Membrane proteins in Yeast

Six membrane proteins (OR-like) were collected (NP_012743.1, NP_014078.1, NP_014081.1, NP_014094.1, NP_014105.1, NP_116627.1) and used for the current study.

6.5.4 Results

The collected 60 Drosophila ORs, 371 human ORs and 6 candidate receptors from yeast were aligned and observed for cluster associations at cross-genome level. There is no coclustering observed between the insect ORs and mammalian ORs. Notably, Drosophila ORs stay very distinct and away from the human ORs, but are relatively closer to fungal taxa: they establish a considerable co-cluster arrangement with the candidate receptors from yeast. Perhaps the separation from human ORs is due to the long lineage of evolution between insect and human ORs. Other possible reasons could be the independent evolution of fly ORs and the lifestyle of fruit flies in sensing specific odors, whereas mammals established a complex olfactory system to sense both air-borne and water-borne odors.

Probably, the observed reverse topology in the fly genome could be another strong reason for the lack of coclustering with human ORs.

Figure 6.9 Cross-genome phylogeny on selected ORs from human, Drosophila and yeast

Note: The selected human (pink) and Drosophila ORs (blue) do not show any significant coclustering. But Drosophila ORs (blue) show considerable coclustering with yeast ORs.

6.5.5 Summary

There is no significant coclustering observed between selected ORs of the human and Drosophila genomes.

6.6 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED OLFACTORY RECEPTORS FROM HUMAN AND C. ELEGANS GENOMES

"C. elegans OR physiology: A special occurrence" (Cori Bargmann)

Cori Bargmann and associates have proposed a genetic approach to investigate odor response in C. elegans, a nematode which possesses 14

types of chemosensory neurons for sensing various odors. A receptor gene called odr-10 is expressed in one of the sensory neurons and encodes a potential odorant receptor. Bargmann stated that among the "important" olfactory candidate genes, more than 40 highly divergent receptors have been found. Eleven of these are expressed in small subsets of chemosensory neurons, and a single neuron can express up to 4 different OR genes. They do not show sequence homology, but exhibit structural homology with vertebrate OR proteins. These genetic studies provided, for the first time, an "in vivo" model for the specific interaction between a receptor of the seven transmembrane protein family and an odor ligand.

6.6.1 Odr-10 and homologs

As discussed, odr-10 is the only olfactory receptor sequence reported in C. elegans. In the current study, amongst the collected ORs, the only sequence in the nematode genome annotated as "olfactory receptor" is odr-10. So, the intention of the study is to find out whether any coclustering is observed in the cross-genome phylogeny of selected human ORs with homologues of the olfactory receptor of C. elegans. A cross-genome phylogenetic analysis with the collected homologues of odr-10, along with the selected representative OR sequences from human (Section 6.1), might help to provide further annotation. Attempts were made to collect the homologous sequences for odr-10 by running a BLAST search with default parameters against the database of the already collected 1016 membrane proteins of C. elegans from the SEVENS database: odr-10 was given as the query to search against this database of C. elegans GPCRs. Hits with a significant E-value were considered for the current study and 82 homologues were collected for

odr-10; among them, seven were hypothetical proteins. Among the collected homologues, 78 sequences were predicted to have the N-out topology. Odr-10 was predicted to retain seven transmembrane helices and N-out topology (Colbert Ha and Bargmann 1997). Separately, 10 representative OR sequences (from HSC1-HSC10) were selected from the previously established human OR phylogeny (section 6.1), and these selected human OR representative sequences, along with the collected homologues of odr-10, were used to generate the cross-genome alignment.

6.6.2 Results and Discussion

The obtained cross-genome phylogeny exhibits seven distinct clusters in the tree topology, formed by the selected human ORs, the C. elegans OR and its related homologues (Figure 6.10).

Figure 6.10 Observed cluster association in the cross-genome phylogeny of selected ORs from human and C. elegans genomes

Note: The Nj method of phylogeny shows the cluster arrangements of serpentine receptors in C. elegans in the clusters CeC1 to CeC6. The nematode olfactory receptor odr-10 is highlighted with a star symbol at CeC3. The non-coclustering representative OR sequences from human stay distinct and are noted as Hum_C7 (read anti-clockwise).
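The homologue collection in Section 6.6.1 keeps BLAST hits with a significant E-value. A minimal sketch of such a filter is given below, assuming the search results are available in the standard 12-column tabular BLAST output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The E-value cutoff of 1e-5 and the identifiers in the example are assumptions for illustration; the thesis does not state the threshold it used.

```python
def filter_hits(tabular_text, max_evalue=1e-5):
    """Return (subject_id, best_evalue) pairs at or below the cutoff.

    Expects tab-separated, 12-column BLAST tabular output. Self-hits are
    skipped, and only the best E-value is kept for each subject.
    """
    best = {}
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if subject == query or evalue > max_evalue:
            continue
        if subject not in best or evalue < best[subject]:
            best[subject] = evalue
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical records, for illustration only (not thesis data).
sample = "\n".join([
    "query_1\tquery_1\t100.0\t339\t0\t0\t1\t339\t1\t339\t0.0\t700",
    "query_1\tsubject_1\t33.5\t320\t200\t5\t1\t320\t3\t322\t2e-30\t120",
    "query_1\tsubject_1\t30.0\t100\t70\t0\t1\t100\t5\t104\t1e-10\t60",
    "query_1\tsubject_2\t22.0\t90\t70\t1\t10\t99\t8\t97\t0.5\t25",
])
print(filter_hits(sample))
```

Collapsing multiple alignments to one entry per subject means the number of rows returned corresponds to the number of candidate homologues rather than the number of local alignments.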

Earlier studies (Chapter 2) have reported that human and C. elegans GPCRs show a long lineage in evolution, and thus no significant coclustering was observed in the cross-genome phylogeny; in the same way, there is no coclustering observed between these taxa with reference to olfactory receptors. Interestingly, all the representative human OR sequences were clustered together and stay as a separate clade (denoted as HumC1), showing a strong species-specific trend (Figure 6.10).

Notably, among the collected homologues, 58 ORs are from the Str superfamily. The annotated olfactory receptor odr-10 belongs to this largest Str superfamily and, notably, in the CeC3 cluster arrangement odr-10 is associated with candidate receptors purely from the Str superfamily. CeC3 retains 23 Str-type receptors, which share about 24% sequence identity. The remaining 35 candidate receptors from the Str family exhibit a separate association and form a separate cluster, denoted as CeC1. These associations can be explained by the sequence diversity existing even at the family level, although the receptors belong to the same superfamily. The most related pairs based on sequence identity were identified and, in particular, ten str candidate receptors associated with odr-10 have been reported with their sequence identities and similarities (Table 6.7 and 6.8). Odr-10 tends to be closely associated with a particular str-type receptor, namely NP_505861.3 (Str-115), with 33.5% sequence identity.

CeC2 is associated with six candidate receptors; among them, five receptors belong to the same family, i.e., the Sru family of the SRG superfamily, and an unannotated GPCR (NP_496399.1) is associated in this cluster, which could be further explored for its functional relevance. Two candidate receptors from the srsx family, a hypothetical protein (NP_494099.1), a candidate GPCR (fol-3) and a srt type

receptor cocluster in CeC4, representing the diverse sequence properties of this cluster (Figure 6.10). Five srab candidate receptors from the SRA superfamily are associated in CeC5, denoting its sequence specificity; the average identity for this cluster is 45%. Notably, CeC6 is associated with hypothetical proteins and a typical GPCR, gar-3.

Table 6.7 Significant cluster association for str type receptors in CeC3; for each receptor, the sequence pairs with the highest and lowest identity are given
S. No | Protein identifier | Length | High identity (%) | High identity to | Low identity (%) | Low identity to
----- | ------------------ | ------ | ----------------- | ---------------- | ---------------- | ---------------
1 | NP_507048.2(srj-29) | 379 | 20.6 | NP_505321.1(str-85) | 15.3 | NP_507018.1(str-233)
2 | NP_507018.1(str-233) | 346 | 28 | NP_507067.2(str-254) | 15.3 | NP_507048.2(srj-29)
3 | NP_506518.2(str-230) | 349 | 29.8 | NP_503223.1(str-256) | 17.7 | NP_507193.1(str-151)
4 | NP_507068.2(str-97) | 337 | 29.9 | NP_505861.3(str-115) | 17.8 | NP_507048.2(srj-29)
5 | NP_507162.2(str-15) | 340 | 32.1 | NP_503316.1(str-20) | 16.5 | NP_507048.2(srj-29)
6 | NP_503316.1(str-20) | 351 | 32.1 | NP_507162.2(str-15) | 17.9 | NP_507048.2(srj-29)
7 | NP_500472.1(str-122) | 340 | 32.9 | NP_503666.1(str-119) | 15.9 | NP_507048.2(srj-29)
8 | NP_509157.1(odr-10) | 339 | 33.5 | NP_505861.3(str-115) | 16.5 | NP_507048.2(srj-29)
9 | NP_505861.3(str-115) | 334 | 33.5 | NP_509157.1(odr-10) | 17.4 | NP_507048.2(srj-29)
10 | NP_001023592.1(str-169) | 341 | 33.6 | NP_503493.1(str-160) | 18.8 | NP_507048.2(srj-29)
11 | NP_503493.1(str-160) | 336 | 33.6 | NP_001023592.1(str-169) | 17.6 | NP_507048.2(srj-29)
12 | NP_503666.1(str-119) | 337 | 36.6 | NP_507192.3(str-149) | 14.7 | NP_506742.2(str-45)
13 | NP_509720.1(str-74) | 351 | 39.9 | NP_506742.2(str-45) | 16.2 | NP_507193.1(str-151)
14 | NP_506742.2(str-45) | 333 | 39.9 | NP_509720.1(str-74) | 14.7 | NP_503666.1(str-119)
15 | NP_507067.2(str-254) | 359 | 46.7 | NP_503223.1(str-256) | 16.7 | NP_507048.2(srj-29)
16 | NP_503223.1(str-256) | 353 | 46.7 | NP_507067.2(str-254) | 16.6 | NP_503666.1(str-119)
17 | NP_505321.1(str-85) | 340 | 56.3 | NP_506821.1(str-88) | 17.4 | NP_506742.2(str-45)
18 | NP_506821.1(str-88) | 339 | 56.3 | NP_505321.1(str-85) | 17.7 | NP_509720.1(str-74)
19 | NP_505322.1(str-87) | 686 | 57.8 | NP_506177.2(str-181) | 18.5 | NP_507048.2(srj-29)
20 | NP_506177.2(str-181) | 346 | 57.8 | NP_505322.1(str-87) | 17.1 | NP_507048.2(srj-29)
21 | NP_507193.1(str-151) | 334 | 86.2 | NP_507192.3(str-149) | 16.2 | NP_509720.1(str-74)
22 | NP_507192.3(str-149) | 325 | 86.2 | NP_507193.1(str-151) | 16.9 | NP_506742.2(str-45)

Table 6.8 Sequence identity and similarity between odr-10 and associated SR

S. No | Odr-10 | Associated OR | Sequence identity | Sequence similarity
----- | ------ | ------------- | ----------------- | -------------------
1 | NP_509157.1(odr-10) | NP_507048.2(srj-29) | 20% | 37%
2 | NP_509157.1(odr-10) | NP_507162.2(str-15) | 25% | 43%
3 | NP_509157.1(odr-10) | NP_507068.2(str-97) | 27% | 46%
4 | NP_509157.1(odr-10) | NP_505321.1(str-85) | 27% | 44%
5 | NP_509157.1(odr-10) | NP_507193.1(str-151) | 27% | 45%
6 | NP_509157.1(odr-10) | NP_500472.1(str-122) | 27% | 48%
7 | NP_509157.1(odr-10) | NP_505322.1(str-87) | 28% | 45%
8 | NP_509157.1(odr-10) | NP_507192.3(str-149) | 28% | 46%
9 | NP_509157.1(odr-10) | NP_506821.1(str-88) | 28% | 46%
10 | NP_509157.1(odr-10) | NP_503666.1(str-119) | 31% | 49%
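The identity values reported in Tables 6.7 and 6.8 are pairwise percent identities. As a minimal sketch of how such a value can be computed from two rows of an alignment, the function below counts identical residues over the columns in which neither sequence has a gap. This gap convention is an assumption: the thesis does not state which denominator its tools used, and other conventions (for example, dividing by the shorter sequence length) give different numbers. The sequences in the example are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 6 ungapped columns, 5 identical residues.
print(round(percent_identity("MKT-LLV", "MRTALLV"), 1))  # 83.3
```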

6.6.3 Summary

• In the cross-genome phylogeny of selected representative OR sequences from human together with odr-10 and 84 related homologues from C. elegans, there is no coclustering. This may be due to the long lineage of evolution between human and nematode membrane proteins, and also to their widely different olfactory behavior.
• The observed CeC1 and CeC3 clusters retain sequences from the largest Str superfamily; notably, the CeC3 cluster retains the characteristic olfactory receptor of C. elegans (odr-10) along with 22 candidate receptors exclusively from the Str superfamily. This cluster can be illustrative of nematode-specific ORs observed at the cross-genome level.

• As odr-10 belongs to the Str superfamily, the candidates from the Str superfamily establish the association in CeC3. Notably, Str-115 shows 33.5% sequence identity with odr-10, and this cluster can be further analyzed for sequence properties such as motifs and orthologs. Distantly related homologues are hard to identify; however, typical sequence search procedures could be accompanied by cross-talks. Such coclustering of sequences (like odr-10 and Str-115) could establish distant relationships between them. Among the seven hypothetical proteins, two are associated in the CeC2 and CeC4 clusters, which in turn can lead to inference of their functional properties from the associated serpentine receptors.

6.7 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM HUMAN AND MOUSE GENOMES

6.7.1 Introduction

In mouse, the olfactory epithelium is divided along the dorso-ventral axis into four zones, based on OR expression (Ressler et al 1993, Villeneuve et al 2000). The dorsal region, also referred to as zone I, expresses about 50% of all OR genes, including class I type as well as class II type receptors. The ventral region, consisting of endoturbinates II, III and IV, expresses only class II type receptors (Zhang et al 2004, Tsuboi et al 2006). Earlier studies also suggest that receptors for polar, hydrophilic and weakly volatile odorants are present in the dorsal region of the olfactory epithelium, while receptors for non-polar, more volatile odorants are distributed in the ventral region (Abaffy and Defazio 2011), exhibiting different odor codings.

Expression data are also available for some of the mouse and rat class I type ORs. Both classes of ORs (class I and class II) are expressed in the dorsal zone of the olfactory epithelium (Bulger et al 1999, Conzelmann et al 2000). Class II type receptors have been found in all four zones. In a previous phylogenetic analysis of mouse olfactory receptors using a consensus tree (Zhang and Firestein 2002, Zhang et al 2007), nearly 1000 OR genes were classified into several OR families. For the classification, the rule was set that family members must comprise a strong phylogenetic cluster, that is, a reliable clade generally possessing >50% bootstrap value, and must have more than 40% protein identity. By this definition, mouse ORs were classified into 228 families.

6.7.2 Objectives

Since OR sequence clusters are abundant in the mouse genome, the current study aims to perform a cross-genome phylogenetic analysis with a non-redundant set of 338 mouse olfactory receptors and the selected representative human OR sequences (around 50 in number). The current study will be helpful to identify reliable phylogenetic clades at cross-genome level and the conserved motifs across the two genomes.

6.7.3 Human-Mouse OR Orthology

Many earlier studies report a significant evolutionary relationship between human and mouse ORs. Orthology has been observed at about 60%, 70-80% and above 80% sequence identity across these genomes. It has been observed that mouse ORs from chromosome 11 show a synteny relationship with human chromosome 17p13.3. Indeed, OR clusters from these genomes share the highest sequence identity, and the closest pairs retain 74-88% identity at protein level (Sullivan et al 1996) across the two genomes. Mouse ORs are reported for orthology with human ORs even in

the sub family level. The mouse OR sub families like 3A, 1A, 1D, 1E and 1P are all present in human OR clusters. Apart from human counterparts, mouse ORs retain orthology with other vertebrate genomes also. For example, mOR11-2c shows 81.48% sequence identity to the olfactory receptor like protein DTMT in canine (Parmentier et al 1992), and mOR11-2e shows 89.81% identity to the rat OR sequence RATOLFPROQ (also known as M64391) (Buck and Axel 1991). Earlier studies have reported synteny relationships derived from the Mouse Genome Database linkage maps for specific cluster pairs (Lapidot et al 2001).

6.7.4 Complex Picture on Human-Mouse OR Orthology

Further inspection of human and mouse orthology shows a complex picture: in a few cases simple pair-wise orthology was seen, but in other cases multiple potentially orthologous mouse ORs are found for a single human OR sequence, and vice versa. Yet in some cases (Makalowski et al 1996), there is not much significant orthology between human and mouse ORs of the same sub-family. True orthologous OR genes are expected to share a function and therefore to display higher conservation at the residues related to the odorant binding site. To identify the conserved residues in orthologous and paralogous OR sequences, Pilpel and Lancet (1999) conducted a "variability diagnostic plane" analysis. They used six human-mouse orthologous genes to identify inter-orthologue variability and 197 OR genes to evaluate inter-paralogue variability, and presented a detailed study of the correlation between them. The results reported 17 CRS (complementarity-determining regions, CDR) in the lower right quadrant (Pilpel and Lancet 1999), which represent residues that have high variability among paralogous genes, but

relatively low variability among orthologous genes, and which reflect functional diversity such as odorant recognition.

6.7.5 Methodology

Among the collected 338 mouse olfactory receptors, 90% of the ORs were predicted to retain seven TM helices with N-out topology. Along with the selected 50 representative ORs from the previously established human OR phylogeny, the 338 mouse ORs were aligned by the MAFFT alignment program with the JTT 200 scoring matrix and a gap opening penalty of 1.53. The DUFF gene, a human chemokine receptor, was used as the outgroup for the current study. The obtained cross-genome OR alignment was used to generate a tree by the Nj method with bootstrap, and a circular display of the generated tree topology was preferred. Around 10 mouse OR subclusters were differentiated by the presence of the 50 representative human ORs.

6.7.6 Results

6.7.6.1 Cross-genome OR cluster association

The selected 50 representative ORs from the human OR phylogeny were helpful to associate the 338 mouse ORs into 10 mouse OR subclusters. While reading the tree topology, excluding the outgroup (DUFF_gene), 10 mouse OR subclusters were observed along with human ORs and are named MMC1 to MMC10 (Figure 6.11 and 6.12). Apart from the 6 human ORs from HSC1 (fish-like ORs), the remaining 44 human ORs were distributed along with the mouse ORs. Clusters in which human and mouse ORs occur together are referred to as "coclusters"; these show higher BS values, and their mouse ORs are closely related to the human ORs. From the observed coclusters, 25 human-mouse OR sequence pairs were selected, and their sequence identity ranges from 41% to 84% in the cross-genome phylogeny (Table 6.9).
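The alignment settings named in Section 6.7.5 (JTT 200 matrix, gap opening penalty 1.53) correspond to the MAFFT command-line options `--jtt` and `--op`. A minimal sketch of assembling such a call is shown below. The function names and the input file name are hypothetical, and the thesis does not report the exact command line that was used, so this should be read as one plausible invocation rather than the author's own.

```python
import shutil
import subprocess


def build_mafft_command(fasta_path, jtt_pam=200, gap_open=1.53):
    """Assemble a MAFFT call with a JTT PAM matrix and a gap opening penalty."""
    return ["mafft", "--jtt", str(jtt_pam), "--op", str(gap_open), fasta_path]


def run_alignment(fasta_path):
    """Run MAFFT and return the aligned FASTA text (MAFFT writes to stdout)."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    result = subprocess.run(
        build_mafft_command(fasta_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# "ors.fasta" is a placeholder file name used only for illustration.
print(" ".join(build_mafft_command("ors.fasta")))
```

Separating command construction from execution lets the parameters be inspected or logged without MAFFT being installed.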

Figure 6.11 Cross-genome phylogeny of selected olfactory receptors (ORs) from human and mouse genomes

Note: Phylogeny of the selected (50) representative OR sequences from 10 human OR subclusters (fuchsia) and mouse ORs (around 338) in green color, with a chemokine receptor (duff_hum) as an outgroup (red). ORs of H. sapiens are noted with the prefix HS and ORs of M. musculus are noted with the prefix MOR.

Figure 6.12 Phylogeny on selected human and mouse olfactory receptors with special emphasis on mouse class I type receptors

As seen in the human OR phylogeny, where HSC1 stays distinct, even in the cross-genome phylogeny the selected human ORs from HSC1 show a distinct clade, but cocluster with around 74 mouse OR sequences. This particular association is referred to as HMC1 in the phylogeny, and the mouse homologues were designated with the prefix MOR* followed by the gene id (Figure 6.11). This further emphasizes the clear discrimination of class I and class II type receptors in higher eukaryotes such as human and mouse. The chemokine receptor, DUFF_HUMAN, stays as the outgroup in the study.

6.7.6.2 Cross-genome phylogeny with Class-I type receptor homologues

In order to ascertain the mouse ORs belonging to class I type ORs, attempts were specifically made to collect mouse homologues for class I type receptors (from HSC1). 74 mouse class I type receptors were collected and the cross-genome phylogenetic analysis was done: the 74 mouse OR sequences were aligned with the collected 338 mouse olfactory receptors along with the 50 human representative ORs. The alignment was performed as mentioned earlier (Section 6.1). The chemokine receptor was selected as an outgroup for the cross-genome OR phylogeny. The obtained cross-genome phylogeny clearly exhibited the coclustering arrangements with human ORs (class I type); the added 74 mouse class I type receptors coclustered only with the given representative human class I type ORs, as expected. Notably, all the OR representative sequences from HSC1 were clustered only with the 74 mouse homologues and are noted as HMC1 (Figure 6.12). This exercise illustrates the use of representative sequences in the cross-genome phylogeny to collect homologues, particularly to represent class I type receptor properties. Though there are 52

human ORs present in HSC1 of the human OR phylogeny (section 6.1), only six representative ORs were selected for the current study, and these representative sequences were quite sufficient to represent the HSC1 cluster properties. These six human OR representatives produced the coclustering with the mouse homologues (Figure 6.11), which could be class I type receptors in the mouse genome. This strongly supports the presence of fish-like ORs in human and mouse OR clusters.

6.7.7 Common motifs in the Cross-genome phylogeny

By using the TM-MOTIF tool, the 10 mouse OR subclusters were observed for the conservation of amino acids at cross-genome level. Notably, the LHPMY motif in TM1/ICL1, the MAYDRYVAIC motif in TM3/ICL2, the SY motif in TM5, the FSTCSSH motif in TM6 and the PMLNPF motif in TM7 are conserved between human and mouse ORs.

6.7.8 Summary

• By performing the cross-genome OR phylogeny with the selected 338 mouse ORs and 50 human OR representative sequences, significant coclustering arrangements were observed in the phylogeny. The selected 25 representative human-mouse OR sequence pairs showed significant sequence identity, varying from 41% to 84%, reflecting the high sequence identity between certain human and mouse ORs.
• Though ample coclustering was observed, the labeled HSC1 of human ORs, meant for the class I type receptors, stays distinct. The collected mouse homologues for class I type receptors exhibited clear coclustering with class I type receptors in the mouse genome, suggesting that class I type mouse OR homologues show significant sequence identity with human ORs.

• The ORs from different clusters of human ORs (class II) tend to spread along with mouse ORs, forming the coclusters. The conserved motifs show the evolutionary relationships between human and mouse ORs and the preservation of sequence and structural properties for functional relevance.

Table 6.9 Percentage identity for selected human and mouse ORs for significant association from cross-genome OR phylogeny

S. No | Human OR cluster no | Human OR | Mouse OR | % identity | Mouse OR cluster no
----- | ------------------- | -------- | -------- | ---------- | -------------------
1 | HSC2 | HS1D2 | 18480460 | 52 | MMC2
2 | HSC2 | HSorl16 | 18480814 | 76 | MMC2
3 | HSC2 | HS1A1 | 18480066 | 84 | MMC2
4 | HSC2 | HS1E1 | 18479630 | 83 | MMC2
5 | HSC3 | HS10S1 | 18480630 | 54 | MMC3
6 | HSC3 | HS12D2 | 18479814 | 69 | MMC3
7 | HSC3 | HS4A16 | 18479942 | 54 | MMC2
8 | HSC3 | HS4K5 | 18480928 | 82 | MMC2
9 | HSC4 | HS8K5 | 18479442 | 76 | MMC8
10 | HSC4 | HS8D1 | 18480336 | 50 | MMC8
11 | HSC4 | HS9Q1 | 18480006 | 74 | MMC8
12 | HSC4 | HS5W2 | 18479812 | 79 | MMC10
13 | HSC4 | HS5T1 | 18480754 | 74 | MMC10
14 | HSC5 | HS5AC2 | 18479484 | 69 | MMC9
15 | HSC5 | HS8H1 | 18479794 | 72 | MMC8
16 | HSC5 | HS6B1 | 18480732 | 48 | MMC5
17 | HSC6 | HS14A16 | 18480640 | 52 | MMC4
18 | HSC7 | HS11H1 | 18480158 | 63 | MMC5
19 | HSC8 | HS0J3 | 18480958 | 61 | MMC5
20 | HSC8 | HS10A7 | 18480168 | 54 | MMC6
21 | HSC9 | HS2Y1 | 18480552 | 81 | MMC6
22 | HSC9 | HS2K2 | 18480320 | 47 | MMC6
23 | HSC9 | HS10AD1 | 18479756 | 41 | MMC6
24 | HSC10 | HS2AK2 | 18480490 | 76 | MMC4
25 | HSC10 | HS2M5 | 18480592 | 73 | MMC4
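The 41% to 84% identity range quoted for the 25 human-mouse pairs can be checked directly against the values of Table 6.9. The short sketch below does this; the identity values are copied from the table, while the variable and function names are illustrative only.

```python
# Percentage identities of the 25 human-mouse OR pairs, in Table 6.9 row order.
identities = [
    52, 76, 84, 83, 54, 69, 54, 82, 76, 50, 74, 79, 74,
    69, 72, 48, 52, 63, 61, 54, 81, 47, 41, 76, 73,
]


def summarize(values):
    """Return (count, minimum, maximum, mean) for a list of identities."""
    return len(values), min(values), max(values), sum(values) / len(values)


count, lowest, highest, mean = summarize(identities)
print(count, lowest, highest, round(mean, 2))  # 25 41 84 65.76
```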

6.8 PHYLOGENETIC ANALYSIS ON OLFACTORY RECEPTORS FROM SELECTED HUMAN AND NONHUMAN PRIMATES

6.8.1 Objectives

A cross-genome OR phylogeny was performed with OR sequences from non-human primates such as Ailuropoda melanoleuca (bear), Bos taurus (bovine), Callithrix jacchus (common marmoset), Canis lupus familiaris (domestic dog), Pan troglodytes (common chimpanzee), Pongo abelii (Sumatran orangutan), Rattus norvegicus (rat), and Gallus gallus (from class Aves), to identify the coclusters among various taxa.

6.8.2 Background

A cross-genome OR phylogeny with multiple organisms is highly significant for identifying cocluster associations with ORs of diverse taxa. Compared to mammalian ORs, avian olfaction is poorly understood (Steiger et al 2009), so preliminary attempts were made in this section to identify the coclusters of ORs from various taxa for the common function, olfaction.

6.8.3 Methodology

Since the established human OR phylogeny proposed 10 distinct human OR subclusters, homologues were collected for the 50 human OR sequences from the seven non-human primates of interest, Ailuropoda melanoleuca (bear), Bos taurus (bovine), Callithrix jacchus (common marmoset), Canis lupus familiaris (domestic dog), Pan troglodytes (common chimpanzee), Pongo abelii (Sumatran orangutan), Rattus norvegicus (rat), and an organism from class Aves, Gallus gallus. For the pilot study, sequences from non-human primates and aves were considered for the limited

12 ORs from Gallus gallus.3 47. Canine ORs were observed in human OR clusters such as HSC2. 371 were human ORs and a cross genome phylogeny was constructed for 1000 BS replicates. 505 sequences were coaligned by MAFFT alignment program and among them as mentioned 122 from are from non-human primates. Ailuro EFB18423 (Figure 6.8.8 39.1 Percentage Identity 82.2 Pan_XP_524919. particularly HSC1 (fish-like ORs) coclusters with Rat NP 00100126.2 canis_XP_545735.2 Ailuro_EFB18423.2. 6 and 7.1 Gallus_NP_001008754.1 85.9 C1 HS51M1_Chr11 C1 HS51M1_Chr11 C1 HS51Q1_Chr11 C3 HS10S1_Chr11 C7 HS6N1_Chr1 C7 HS6N1_Chr1 C9 HS13D1 C9 HS13D1 . Notably ORs from Gallus gallus (aves) is observed in human OR clusters such as HSC5.2 XP_002822168. 4. Care has been taken while selecting sequences to have 7 ± 2 predicted TM-helices. 6.5 81.5 26.221 representative sequences (ranges from 15 to 20 for each organism).10) in the interest of showing inter-genomic OR cluster association for the olfactory acuity.4 Results The generated cross-genome phylogeny exhibit significant coclustering with human ORs.13). The clusters with ORs from multiple organisms provide platform to study for the conserved motifs.1. Bos XP 875301.1 Calli_XP_002743272. HSC6 and HSC7. and to train SVM model to identify putative ORs across genomes.1 Bos_XP_591375. Totally. The significant coclusters for 25 sequence pairs were identified and reported (Table 6.10 Percentage Identity between selected human ORs and nonhuman ORs CLUSTER Hum ORs Non-human ORs Bos_XP_875301.2 98. Gallus NP 001008754.9 82. Table 6.


Figure 6.13 Cross-genome phylogeny on selected human ORs with ORs from non-human primates and aves
Note: Aqua (HSC1), violet (HSC2), indigo (HSC3), blue (HSC4), green (HSC5), yellow (HSC6), orange (HSC7), red (HSC8), olive (HSC9) and teal (HSC10) denote the 10 human OR subclusters distributed in the cross-genome OR phylogeny; all non-human ORs, including the ORs from Gallus gallus, are shown in maroon.

6.8.5 Summary
The pilot study with selected ORs from non-human primates, aves and humans shows clear coclustering and evolutionary trends across genomes, and provides a platform to observe conserved motifs and orthologs.

6.9 DATABASE OF OLFACTORY RECEPTORS (DOR)

6.9.1 Objectives
The availability of genome sequences for genomes of interest such as yeast, fly, worm, mouse and human facilitates the creation of a non-redundant data repository on olfactory receptors. The selected eukaryotic genomes are useful model organisms and hence in vivo application can be

suggested using the curated data repositories and related structural information in the near future. DOR is an integrated database that provides sequence and structural information on olfactory receptors (ORs) for selected eukaryotic organisms such as S. cerevisiae, D. melanogaster, C. elegans, M. musculus and H. sapiens. The versatile functions of ORs motivated the creation of a non-redundant data repository, which can be further used for practical applications in the pharmaceutical industry (aroma therapy), olfacto-sexual function, olfacto-neural communication, the cosmetic industry (perfume manufacturing), the food industry, agricultural pest management and so on. The ORs of each genome are peculiar to its sense of olfaction. For instance, amphibians retain both class I and class II types of olfactory receptors, whereas teleost fish, including the goldfish Carassius auratus, carry only class I type receptors (Freitag, et al., 1998; Speca, et al., 1999). This further emphasizes the role of class I type receptors in detecting water-borne odors and of class II type receptors in sensing air-borne odors. Since the amphibian lifestyle accommodates both terrestrial and aquatic habitats, both class I and class II type receptors were acquired for this dual lifestyle (Freitag, et al., 1995). Higher order organisms also retain both class I type (fish-like) ORs and class II type ORs (mammalian ORs) (Glusman, et al., 2001; Niimura and Nei, 2005). The two types of receptors observed in human (terrestrial vertebrates) reveal the phylogenetic distance between fish and mammals, and the occurrence of class I and class II types could be the result of an adaptive process during evolution. This permits fishes to sense water-soluble odors and mammals to recognize a large variety of hydrophobic and volatile compounds (Freitag, et al., 1998). Separately,

since no class-specific motifs were identified for these classes of ORs, the structural differences are helpful in discriminating these two types of receptors to some extent. Notably, the length of ECL3 in class I type receptors ranges from 10-15 amino acids, but ECL3 in class II type receptors in vertebrates ranges from 13-14 amino acid residues (Freitag, et al., 1998). This could be the best example to emphasize the need for integrated knowledge on sequence and structure to understand the properties of ORs in more detail. Thus, in the current study, attempts were made to incorporate information related to sequence analysis by documenting the non-redundant OR sequences, predicted membrane topology, possible cross-genome OR alignments and phylogeny, and structure analysis to provide information on predicted secondary structural details, conserved motifs and dimer-interfaces for the representative sequences selected from the OR phylogeny (all structural analyses were carried out by collaborators from NCBS and AIST).

6.9.2 Features on OR sequences in DOR
DOR provides a user-friendly platform to access features related to OR sequence and structure (Figure 6.14 and 6.15). The main menu provides 5 key features, namely “Sequence”, “Genomic combination”, “phylogeny”, “Structure” and “TM-MOTIF”, and the database can be accessed from http://caps.ncbs.res.in/DOR

Figure 6.14 Available main menu in the front page of DOR
Notes: Snapshot depicting the main menu of the Database Of Olfactory Receptors (DOR) with user-interactive features. Label 1 refers to the retrieval of OR sequences for the genomes of interest in FASTA format by using the option “Sequence”. Label 2 indicates the available intra- and inter-genomic OR cluster alignments, which are available in both .aln and .mas formats by using the option “Genomic combinations”. Label 3 guides the user to view and download the selected intra- and inter-genomic OR phylogeny (available in .meg and .mts formats). Label 4 provides structural details such as 3D structure, pairwise alignment with template, CONSURF and predicted dimer interface for the OR sequences of interest. Label 5 enables the user to download the TM-MOTIF package to visualize an MSA in the VIBGYOR colouring scheme and to identify conserved motifs with AAS. All these options have a related drop-down menu, namely “ORGANISM”, which provides the list of available organisms for the user to select. Label 6 refers to the DOR home page, to return to after navigation. Label 7 refers to the available help page for DOR.

DOR (Database of Olfactory Receptors) provides the following information on olfactory receptor sequences:

6.9.2.1 OR sequences of target genomes
In this option, the user can select an “organism of interest” and collect the respective OR sequences (in FASTA format) by using the download hyperlink given for every genome. The related drop-down menu called “SOURCE” provides the list of organisms such as S. cerevisiae, D. melanogaster, C. elegans, M. musculus and H. sapiens.
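As an illustration of how OR sequences downloaded in FASTA format might be read and reduced to a non-redundant set, a minimal sketch is given below. The redundancy criterion applied in DOR is not specified in this section; removal of exact duplicate sequences is assumed here, and the function names and toy records are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Read FASTA-formatted lines into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def non_redundant(records: List[Record]) -> List[Record]:
    """Keep the first record of every distinct sequence (assumed criterion)."""
    seen = set()
    unique: List[Record] = []
    for header, sequence in records:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, sequence))
    return unique


# Toy input with one exact duplicate (hypothetical identifiers)
toy_lines = [">or_1", "MSTNQ", "LLPV", ">or_2", "MSTNQLLPV", ">or_3", "MAAGK"]
toy_unique = non_redundant(parse_fasta(toy_lines))
```

A stricter definition of redundancy, for example a percentage identity cut-off, would require a clustering step instead of the exact-match filter used in this sketch.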

6.9.2.2 Predicted TM boundaries
By using this option, the user can collect information about the predicted transmembrane domain boundaries from TM1 to TM7 for the predicted seven helices (Figure 6.15 and 6.16). The predicted helix boundaries for the OR sequences are coloured violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) with respect to the seven predicted TM-domains; when a sequence is overpredicted with more than seven TM-domains, a pale cream colour is used. Sequences predicted with fewer than seven TM-domains can also be recognised through the incomplete representation of the VIBGYOR colouring scheme. This provides knowledge on membrane topology at first sight. The given hyperlink helps the user to download the corresponding OR sequence in FASTA format; the OR sequences recommended for 3-D modeling are marked with a * symbol, and the provided hyperlink helps the user to navigate to the webpage related to structural information.
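The colouring rule described above can be sketched as follows. This is only an illustration of the rule as stated in the text and not the DOR implementation; the function name, the tuple layout and the example boundaries are hypothetical, and it is assumed that an overpredicted sequence is shown entirely in pale cream.

```python
from typing import List, Tuple

VIBGYOR = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]
OVERPREDICTED_COLOUR = "pale cream"


def colour_helices(boundaries: List[Tuple[int, int]]) -> List[Tuple[int, int, str]]:
    """Assign a display colour to each predicted TM helix.

    boundaries: (start, end) positions of the predicted helices, ordered from
    the N- to the C-terminus. Up to seven helices receive VIBGYOR colours in
    order, so an underpredicted sequence simply shows an incomplete series.
    More than seven helices are flagged with the pale cream colour.
    """
    if len(boundaries) > len(VIBGYOR):
        return [(start, end, OVERPREDICTED_COLOUR) for start, end in boundaries]
    return [(start, end, VIBGYOR[i]) for i, (start, end) in enumerate(boundaries)]


# Hypothetical underpredicted sequence with only two helices
toy_colours = colour_helices([(25, 47), (58, 80)])
```

The same check on the number of predicted helices could also be used to apply the 7 ± 2 helix criterion mentioned for sequence selection in Section 6.8.3.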

Figure 6.15 A snapshot of the given option “sequence” and its application in DOR
Note: The given snapshot depicts the display for the given menu “Sequence” (given in A) and the respective drop-down options for “SOURCE” (given in B) and “SUPPORTS” (given in C). The DOR display for a selected sequence with predicted membrane topology in VIBGYOR colouring scheme (given in D) and the respective display to retrieve the FASTA sequence (given in E) are shown.

Figure 6.16 Display of predicted membrane boundaries in DOR
Note: Display for the given option “sequence”, wherein each olfactory receptor is given with protein ID, sequence length and NCBI protein identifier, followed by the membrane topology predicted by HMMTOP, namely the N-terminal orientation, the number of predicted helices and the predicted TM boundaries for the seven helices with start and stop positions. Notably, the seven helix boundaries are denoted in VIBGYOR colouring scheme and over/under predicted helices are also given.

6.9.2.3 Single/cross-genome OR alignments
Apart from the uni-genomic phylogeny, a few cross-genome phylogenetic analyses were performed and the user can select any of the following combinations to view the phylogeny: S. cerevisiae - D. melanogaster - H. sapiens, C. elegans - H. sapiens, and H. sapiens - M. musculus. The MAFFT alignment tool was used to generate the alignment for the selected intra- and inter-genomic organism(s). Here, the user can benefit from the cross-genome alignments to study more on comparative genomics. The MSA of the genome of interest can be downloaded both in CLUSTALW alignment

format (.aln format) and in the MEGA alignment session format (.mas format) (Figure 6.17).

Figure 6.17 Display of “Alignment” option in DOR
Note: Snapshot showing the display of the “Alignment” option for the selected “genomic combinations” (H. sapiens - M. musculus); the respective cross-genome OR alignment is given in MEGA (.mas) and CLUSTAL W (.aln) formats (given in A and B).

6.9.2.4 Cluster association and Phylogeny
The phylogenetic analysis at single-genome and cross-genome levels provides knowledge on cluster distribution. By observing the tree topology with significant BS values, related sequences were grouped as clusters and the cluster-wise distribution of sequences is given in the MSA. For example, around 371 ORs of H. sapiens were grouped into 10 OR subclusters, and these 10 OR clusters were added to the TM-MOTIF tool (Chapter 4) to observe the conserved amino acids, along with AAS, at each position of the alignment. The generated phylogeny, for all selected genomes, at intra-genomic and inter-genomic association is made available with legible display, and cladograms are made available to users in

downloadable image format, and the MEGA tree session file in the .mts format is also available for download (Figure 6.18).

Figure 6.18 Display of cross-genome OR phylogeny in DOR
Note: A snapshot for the option “PHYLOGENY” showing the cross-genome OR phylogeny (H. sapiens - M. musculus); the respective MEGA tree session file (.mts) can be downloaded.

6.9.2.5 Softwares and Tools - (TM-MOTIF) in DOR
TM-MOTIF is a downloadable software tool (Chapter 4) and an effective alignment viewer to map discovered motifs on predicted membrane topology in a set of aligned OR sequences in VIBGYOR colouring scheme. TM-MOTIF mainly helps in mapping the discovered motifs on the intra- and inter-genomic clusters of the user’s interest. The GPCR cluster dataset of human, D. melanogaster and C. elegans and the 10 human OR subclusters are available as an inbuilt cluster dataset; the user can view the detected motifs in the intra- and inter-genomic cluster alignments of the inbuilt dataset, or can submit sequences of interest as an MSA (.aln format) along with multiple FASTA sequences to run the various display options such as “Run-TM”, “Run-Motif” and “Run-TM-motif”. By using TM-MOTIF, the user can identify conserved

motifs at 60% level of conservation along with amino acid substitutions (AAS) and their physicochemical properties such as hydrophobic, aromatic, polar positive, polar negative and polar uncharged at each position in the MSA. The user can choose the option to run a BLAST search for the nearest homologues of a sequence of interest from the in-built cluster associations. Separately, the user can submit a sequence of interest to align with any of the selected reference sequences (whose structure is known) to obtain the pairwise alignment with TM-MOTIF display. Such an annotated alignment can be effectively used for modeling the sequences and can also guide template selection. The tool can be downloaded and used as a standalone package (also refer Figure 6.19 and 6.20).

Figure 6.19 Overview on pictorial representation of available features in DOR for sequence analysis
Note: Label 1 depicts the option “Sequences” for the retrieval of OR sequences in FASTA format. Label 2 refers to the available alignments for single and cross-genome display in CLUSTAL W format (.aln) and in MEGA format (.mas). Label 3 indicates the display of the predicted seven TM-helices with respective boundaries in VIBGYOR colouring scheme. Label 4 shows the display of the generated phylogenetic tree for the unigenome. Label 5 refers to the display of cross-genome phylogeny. Label 6 indicates the available DOR help page and Label 7 shows the TM-MOTIF display of the OR subclusters in VIBGYOR colouring scheme with the identified motifs.

Figure 6.20 Overview on DOR features for sequence and structural information for olfactory receptors in DOR
Note: The five available “SOURCE” options are indicated by pink arrows and are numbered from 1-5. The available “ORGANISM” and “SUPPORT” drop-down menu options are indicated by inverted blue triangles.
Label 1: The “Sequence” in the “SOURCE” option provides the list of “ORGANISM” such as C. elegans, D. melanogaster, H. sapiens, M. musculus, S. cerevisiae. The respective “SUPPORT” option provides the “Sequence” and “Alignment”. “Sequences” facilitates the retrieval of OR sequences in FASTA format. “Alignment” provides the alignments for single and cross-genome display in CLUSTALW format (.aln) and in MEGA format (.mas). The result files are downloadable.
Label 2: The “GENOMIC COMBINATION” in the “SOURCE” option provides the list of “ORGANISM” such as H. sapiens/C. elegans, H. sapiens

/M. musculus, H. sapiens/D. mel/S. cer, H. sapiens/D. mel/C. elegans for the display of cross-genome alignments in CLUSTAL W format (.aln) and in MEGA format (.mas). The result files are downloadable.
Label 3: The “Phylogeny” in the “SOURCE” option provides the list of “ORGANISM” such as D. melanogaster, H. sapiens, M. musculus, S. cerevisiae, H. sapiens / M. musculus, H. sapiens/C. elegans, D. mel/S. cer, D. mel/C. elegans for the display of unigenome and cross-genomic OR phylogeny, and the tree session files are downloadable.
Label 4: The “Structure” in the “SOURCE” option provides the list of “ORGANISM” such as C. elegans, D. melanogaster, H. sapiens, M. musculus, S. cerevisiae and representative 3D models with features like (a) alignment between OR sequence and bovine rhodopsin, (b) Pymol session file with seven TM domains coloured in VIBGYOR colour, (c) residue conservation mapped on OR sequence using Consurf, (d) residue conservation mapped on OR homology model using Consurf, (e) validation chart for every homology model and (f) dimer-interface prediction for OR model. The result files are downloadable.
Label 5: “MOTIF ANALYSIS TOOL” provides the option for “TM-MOTIF”, an alignment viewer to display the predicted seven TM-helices of ORs in VIBGYOR colouring scheme with the identified motifs mapped on the alignments along with AAS; the package is available for downloading.

6.9.3 Structural features (Application of sequence searches)
The performed sequence searches were highly useful in proposing representative sequences, related alignments, and intra- and inter-genomic OR clusters, which were further used to perform homology modeling (by K. Harini, NCBS, Bangalore) and dimer interface predictions (by Dr. Nemato, AIST, Japan). Structural features such as homology models of selected OR sequences, conserved residues, details on structure validation and predicted dimer interface residues have also been incorporated into DOR, in order to provide complete knowledge on the sequence-structure-function paradigm.

Figure 6.21 Display of 3D Structure and related features in DOR
Note: A snapshot for the option “STRUCTURE” showing related options about the generated model and predicted dimer-interfaces for certain representative OR sequences selected from the phylogeny.

6.9.4 Summary
DOR (Database of Olfactory Receptors) is a user-friendly database where the user can retrieve and download information on both OR sequence and structure for the five eukaryotic genomes. (The supportive

tables, alignments, and phylogeny are available at the URL: http://caps.ncbs.res.in/DOR) The given option “Sequence” provides non-redundant OR sequences for the targeted eukaryotic genomes. The list of non-redundant OR sequences can be further used to train an SVM to identify potential OR sequences and also to identify orthologs across genomes. The other option, namely “TM-boundaries”, provides the predicted TM-helices for each OR sequence with the start and end position of each predicted helix, and the predicted boundaries for the seven helices are given in seven different colours (VIBGYOR colouring scheme) for easy observation. The given option “Alignment” provides the MSA not only for the single genome but also for the cross-genome. These alignments can be further used to detect conserved motifs, and the cross-genome alignments in particular are very useful from the evolutionary perspective. The generated phylogenetic tree (single and cross-genome) further helps to understand the sequence properties at intra- and inter-genomic levels. The phylogenetic trees from uni-genome and cross-genome analyses help us to study the cluster associations and to select the representative sequences for further analysis. These sequence studies could effectively be used to detect cluster-specific motifs from the MSA, species-specific cluster association and cocluster association in the cross-genome phylogeny. The best representative sequences selected from the generated phylogeny can be suggested for homology modeling and for prediction of dimer-interfaces, to discover functionally important residues and ligand binding pockets. As an initiative in implementing the sequence knowledge, TM-MOTIF, a tool to detect motifs in a set of aligned OR sequences, was incorporated into the database. An inbuilt dataset of 10 human OR subclusters is available in the TM-MOTIF package and is downloadable. The user can also use

their sequence of interest to view the alignment in VIBGYOR colouring scheme with the identified conserved motifs along with AAS at each position of the alignment. The olfactory receptor structures provide a great opportunity for users to analyse the interaction between helices and the conservation of residues within helices, and to generate electrostatic contour maps. The dimer-interface prediction for every structure guides us further to study the oligomerization process of these receptors and the functional significance of such higher order entities. This would further help us in understanding the mechanism of function of olfactory receptors.

CHAPTER 7

CONCLUSION

7.1 COMPENDIUM
My Ph.D objective entitled “Genome-wide survey of certain mammalian GPCRs and olfactory receptors” has been carried out using effective bioinformatics approaches and resulted in insights on related GPCR/OR sequences at cross-genome level, conserved sequence features and the design of a computational package (TM-MOTIF) and a database (DOR, Database of Olfactory Receptors). In this chapter, I wish to compile the highlights of the results previously discussed in Chapters 2-6 and to highlight critical results, scopes, applications and future directions in brief. The abundant availability of non-olfactory G-protein-coupled receptors (GPCRs) (GRAFS system of classification) and olfactory GPCRs (OR repositories) of various model organisms facilitates intra- and inter-genomic phylogenetic clustering studies. The main purpose of the current study is to collect the biologically most significant GPCRs and ORs from selected eukaryotic genome(s) and to perform cross-genome GPCR/OR clustering to address the conserved evolutionary trends (motifs and orthologs) and co-clusters. The other mandate had been to create related tools and databases, for this study on membrane proteins, for public access.

Analysis on phylogenetic clustering of GPCRs/ORs helps to recommend the best representative sequences, the cluster-specific sequence motifs and structure-function studies for various practical applications.

7.2 CROSS-GENOME GPCR CLUSTERING
Chapter 2 is focused on cross-genome clustering of selected GPCRs from the human and C. elegans genomes to provide results like cluster-specific associations and motifs at intra- and inter-genomic levels. The previously established and biologically significant eight major types of human GPCR clusters (peptide receptors (PR), chemokine receptors (CMK), nucleotide and lipid receptors (N&L), biogenic amine receptors (BGA), class B (secretin) receptors (SEC), cell adhesion receptors (CAR), class C (glutamate) receptors (GLR) and frizzled/smoothened receptors (FRZ/SMT)) were used to associate more than 1000 C. elegans GPCRs and to assign functional relevance. Serpentine receptors of nearly 20 recognizable families, grouped as sra superfamily (sra, srab, srb, and sre), srg superfamily (srg, srt, sru, srv, srx, and srxa), str superfamily (srd, srh, sri, srj, str) and others or solo type (srbc, srsx, srw and srz), have also been associated with the 32 known human GPCR cluster dataset. A profile database of 32 well-known GPCR clusters and the RPS-BLAST technique were utilized to associate more than 1000 C. elegans GPCRs to the known groups of human GPCRs. Cross-genome GPCR alignments were prepared using an efficient alignment procedure, the PRALINETM server (Pirovano et al 2008), and the cross-genome GPCR phylogeny was generated by the quartet-based maximum-likelihood method for 10,000 BS replications using TREE-PUZZLE (Schmidt et al 2002). The resultant 32 cross-genome GPCR cluster associations of human and C. elegans GPCRs were analyzed for the type of cluster association using the alignment viewer in MEGA 4.0 (Tamura et al 2007). Terminologies such

as human GPCR clade [HC], co-clusters [CC], neighbor clades [NC], neighbor members [NM] and species-specific members [SS] have been used to describe the branching types in the dendrogram (Chapter 2) and to refer to the types of association: pure distribution (homogeneous occurrence) of human GPCRs, inter-mixing distribution (heterogeneous occurrence) of GPCRs, denoting highly related (co-clusters and neighbor clusters) and distantly related (neighbor members) nematode and human GPCRs, and the homogeneous distribution of nematode GPCRs in the tree topology. Notably, the current approach of profile-based clustering of nematode GPCRs with the functionally known human GPCRs was quite impressive in associating 84% of nematode GPCRs with the human GPCRs at significant E-value thresholds (ranging from 0.001 to 1). An additional 14% association was observed at E-value thresholds between 1 and 5, and a very small percentage, i.e., 2% of association, was made at E-value thresholds of more than 5. In parallel, the cross-genome GPCR cluster association shows average cluster identity ranging from 12% to 20% in many clusters, and this reflects the efficiency of RPS-BLAST in associating nematode GPCRs to the given human GPCR profiles even at low sequence identities. The designed protocol is quite effective in associating remote homologues. The cross-genome GPCR association exhibits 27 nematode GPCRs as “orthologs” to certain human GPCRs, and the observed orthologs occur predominantly in the co-clusters (results in Chapter 2), indicating close relationship. For instance, two dopamine receptors, namely dop-1 and dop-2, from C. elegans were associated with the human biogenic amine type receptors at significant E-values in Cluster 24. In another instance, GABA B receptor subunit (gbb-1) from C. elegans is identified as an ortholog to human

(GABA) B receptor 1 (GBR2_HUMAN) at the most significant E-value thresholds (Remm and Sonnhammer, 2000). This ortholog pair retains 37% sequence identity and 51% sequence similarity. Thus, the identified ortholog pairs emphasize the role of RPS-BLAST in associating closely related species across taxa. Interestingly, the identified putative ortholog pairs, such as Q96AM5/NP_509515.1, V2R_HUMAN/NP_493193.1, TRFR_HUMAN/NP_491990.1, NK1R_HUMAN/NP_500930.1 and NY4R_HUMAN/NP_508234.1, from clusters such as 5, 6, 8, and 11 of the peptide receptor type, can be further explored for functional relevance to human GPCR types, since the counterpart GPCRs from C. elegans were annotated as hypothetical proteins. Notably, 176 GPCRs annotated as hypothetical proteins (unannotated proteins) from C. elegans have been associated by RPS-BLAST to the known human GPCR types, which provides a platform to investigate the functional relevance with the associated human GPCR type(s) (examples from the Clusters 3-5, 11, 16-17, 23 and 32). Besides evolutionarily related GPCR sequences, certain candidate GPCRs showed species-specific tendency (referred to as SS and HC) in the cluster association, particularly str and srh type receptors. A few candidate receptors from the largest str superfamily show relaxed E-value thresholds, indicating the distant relationships of nematode GPCRs with human GPCRs in evolution. A trial study conducted with known associations (cross-genome human-Drosophila GPCR clusters (Metpally and Sowdhamini 2005)) showed 90% of correct association at significant E-value thresholds (Table A2.1 in Appendix). Studies verified/cross-checked with known associations (trial study), and the identified orthologs, clearly support the RPS-BLAST

clustering technique in associating sequences (remote homologues) to related PSSM profiles. In essence, the cross-genome GPCR association between diverse serpentine receptors from C. elegans and the eight major types of human GPCRs provides an opportunity to explore the secondary structural details and conserved motifs, and to confirm functional relevance in vivo across these genomes for practical applications.

7.3 PHYLOGENETIC ANALYSIS ON SERPENTINE RECEPTORS
Chapter 3 describes the phylogenetic analysis on selected serpentine receptors of C. elegans. Since a broad spectrum of serpentine receptor superfamily members, i.e., nearly 20 SR families, has been reported for the C. elegans chemoreceptors (Robertson 1998, Robertson and Thomas 2006), the current objective of performing phylogenetic analysis is helpful to identify the related serpentine receptors and conserved sequence features following cluster association at superfamily level. 683 serpentine receptors were collected from the SEVENS database (Ono et al 2005), and 97% of the sequences were found to retain N-out topology in the predicted membrane topology. The generated phylogenetic tree exhibited the cluster association in a family-specific manner and, in turn, the superfamily-specific cluster association. As odr-10 is the only annotated olfactory receptor in C. elegans (Sengupta et al 1996), sensing compounds like diacetyl, the subclusters related to odr-10 have been studied in detail. Through phylogenetic analysis, 43 SR sequences have been identified as homologues to odr-10 and are distributed in the subclusters Str_C1 to Str_C6 in the tree topology. Interestingly, str-112 is found to be the closest homologue to odr-10 and has been identified from the associated tree topology. Interestingly, all the

sequences associated to odr-10 belong to the str family of the Str superfamily, representing species-specific tendency at family and superfamily levels, and can be used to study ligand binding for odr-10 homologues. This cluster association can be taken as a best example to explain the effectiveness of the phylogenetic approach in associating closely related sequences at intra-genomic level, and it also guides the connection of structure-function relevance. Identified homologues of odr-10 can be further explored for secondary structural details, ligand-binding sites, oligomerisation, and for sensing diacetyl compounds. As a pilot test, a case study on odr-10 has been performed for the secondary structural details. A three-dimensional model was generated using bovine rhodopsin (known structure) as a template through the homology modelling technique using MODELLER (Sali and Blundell 1993). The generated three-dimensional model shows a final energy of -1020.23 kcal/mol after energy minimization, with 82% of the residues observed within strictly allowed regions and 14% observed within partially allowed regions of the Ramachandran plot. Also, the model generated only with TM-helices shows structure validation for allowed regions as 93.3% and additionally allowed regions as 5.2%. Such a model can be further studied for ligand-binding sites and active sites/hot spot residues in the three-dimensional structure embedded in the lipid environment in silico. This case study can be an example for the usage of sequence studies and for extending structure prediction further to functions. In order to analyze the conserved sequence features, a few representative SR sequences were collected and aligned by the MAFFT alignment procedure (Katoh et al 2002). These were used to detect the amino acid conservation by using the TM-MOTIF package (Chapter 4). 92 family-

specific motifs have been identified from the selected serpentine receptors, and the observed sequence features can be used to train SVM techniques and further to detect SR-like sequences from other nematode species and other organism(s). Since odr-10 is also reported to have N-out topology like human olfactory receptors, a phylogenetic study was conducted with selected human ORs (371 ORs) and odr-10. However, the generated phylogeny does not show any significant co-clusters and odr-10 stays as an outgroup. This may be due to the long lineage in evolution, the nematode life style and the ability to recognize only limited and simple odors. The lack of coclustering also suggests that agreement in topology does not necessarily cause olfactory receptors to cluster together. In this way, cross-genome clustering and phylogeny provide preliminary guidelines on types of sequence association as related or distant within and across genomes.

7.4 TM-MOTIF PACKAGE
The characteristic feature of TM proteins in retaining seven helices with three intra- and extracellular connecting loops provides an opportunity to compare the conserved sequence features (motifs) within and across genome(s) in a set of aligned homologous sequences. The main objective of the TM-MOTIF package is to identify and display the conserved motifs and amino acid substitutions (AAS) in a set of aligned transmembrane proteins. The key feature of the TM-MOTIF package (Figure 4.3) is primarily to aid the user to visualize identified motifs on the predicted seven transmembrane helices and loop regions of the MSA (Tusnady and Simon 2001), where the predicted seven TM-helices are displayed in violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) colors

(VIBGYOR colouring scheme), and the conserved residues along with substituting amino acids (AAS) (at a default of 60% conservation) are documented at each position in the multiple sequence alignment (Figure 4.5). Amino acid substitutions were denoted according to their physico-chemical properties, such as hydrophobic (@), aromatic (*), polar positive (+), polar negative (-) and polar uncharged ($), by the given symbolic representation. A mouse-over option provides the details about the type (physicochemical property) of AAS at each position. Inevitably, a considerable amount of mis-prediction occurs due to “false merge” and “false split” of TM-boundaries, which causes underprediction and overprediction of TM helices. For such cases, the full length of the sequence is displayed in pale cream colour. An in-house program for the identification of motifs (MotifS program, written by R. Sowdhamini) was used effectively to identify residue conservation and substitutions at each position of the alignment. An inbuilt dataset of previously established phylogenetic clusters (Metpally and Sowdhamini, 2005) of the selected human-Drosophila GPCR cluster dataset, a profile-based clustering of the selected human-C. elegans GPCR cluster dataset of eight major groups of 32 clusters (Chapter 2) and the clearly distinguishable 10 human-mouse OR clusters (Chapter 6) from cross-genome clustering studies were incorporated in the TM-MOTIF package. The user-friendly TM-MOTIF package provides options for the user to submit their sequence of interest (which should be a membrane protein) in FASTA format, along with its respective MSA. The user can select any one of the given display options such as “Run-TM”, “Run-MOTIF” and “Run-TMMOTIF” from the TM-MOTIF package (Figure 4.5-4.7 in Chapter 4). For all the display options, the conserved residue at each position of the alignment is displayed as “consensus” along with the MSA.
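The conservation criterion described above can be illustrated with a minimal sketch. It is not the MotifS program; the symbols follow the text, but the grouping of residues into physico-chemical classes, the use of the total number of sequences as the denominator, the function name and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter

# Symbols as given in the text
PROPERTY_SYMBOL = {
    "hydrophobic": "@",
    "aromatic": "*",
    "polar positive": "+",
    "polar negative": "-",
    "polar uncharged": "$",
}
# Assumed residue grouping (not specified in the text)
ASSUMED_GROUPS = {
    "hydrophobic": "AVLIMCGP",
    "aromatic": "FWY",
    "polar positive": "KRH",
    "polar negative": "DE",
    "polar uncharged": "STNQ",
}
RESIDUE_CLASS = {res: cls for cls, residues in ASSUMED_GROUPS.items() for res in residues}


def conserved_positions(alignment, threshold_percent=60):
    """Return (position, consensus residue, AAS symbols) for conserved columns.

    A column is reported when its most frequent residue occurs in at least
    threshold_percent of all sequences; positions are 1-based.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    results = []
    for col in range(width):
        counts = Counter(seq[col].upper() for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        if count * 100 >= threshold_percent * n_seqs:
            symbols = {
                PROPERTY_SYMBOL[RESIDUE_CLASS[r]]
                for r in counts
                if r != residue and r in RESIDUE_CLASS
            }
            results.append((col + 1, residue, "".join(sorted(symbols))))
    return results


# Toy alignment (hypothetical); the last column is too variable to be reported
toy_alignment = ["DRYA", "DRYG", "DRFS", "ERYT", "DKW-"]
toy_conserved = conserved_positions(toy_alignment)
```

Consecutive conserved positions reported in this way would then be read together as a motif and mapped onto the predicted TM-helices.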

TM-MOTIF is a user-interactive tool, in which the user can use the “Run-BLAST” option to collect the nearest homologue of their sequence of interest from the in-built dataset. The user can also select any one of the reference sequences whose structure is solved, such as bovine rhodopsin, Japanese flying squid rhodopsin, common turkey β-1 AR, human β-2 adrenergic receptor, human adenosine receptor A2A, human dopamine D3 receptor and human CXCR4 chemokine receptor, to get a pairwise alignment (by CLUSTAL W) in the preferred TM-MOTIF display option, which can be further used for homology modelling. TM-MOTIF provides useful output files such as Zconsensus.txt, Zpattern.txt, Zmotif.txt (output for the three display options), Zuser.aln, Zuser.pir (output for the alignment option “compare with reference Sequence”) and Zblast_sorted.txt (output for the “RUNBLAST” option).

TM-MOTIF is suited for the Linux OS. It requires the following prerequisites to be installed on the user’s machine: PerlTk, BioPerl, a FORTRAN compiler and standalone versions of CLUSTAL W and BLAST2. The package is integrated with DOR (Database of Olfactory Receptors) and is downloadable from the URL http://caps.ncbs.res.in/DOR (see also Chapter 6).

The TM-MOTIF package has been effectively used for the cross-genome GPCR/OR cluster datasets and is highly suitable for comparative genomics, to identify cluster/receptor-specific and common motifs observed at various percentages of conservation within and across the genome(s) of interest. Also, TM-MOTIF alignment displays could be supported with graphical representations (as structures in 2D cartoons). Also, the TM-MOTIF package could be enriched with other genomes for the in-built cluster dataset and extended to membrane-bound helical proteins like ion channels and transporters in future.
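Because TM-MOTIF places motifs on predicted helices and loop regions, each motif can be given a location label such as "TM1" or "TM3,ICL2" (a junction). The following is a minimal sketch of such labelling, assuming an N-out topology and seven predicted helix boundaries expressed as alignment positions. The boundary values and function names are hypothetical and do not come from the package.

```python
# Region order for an N-out, seven-helix topology: N-terminus outside,
# loops alternating intracellular (ICL) and extracellular (ECL).
REGIONS = ["N'", "TM1", "ICL1", "TM2", "ECL1", "TM3", "ICL2", "TM4",
           "ECL2", "TM5", "ICL3", "TM6", "ECL3", "TM7", "C'"]


def region_of(position, helices):
    """Return the region label for one 1-based alignment position."""
    for i, (start, end) in enumerate(helices):
        if position < start:
            return REGIONS[2 * i]      # terminus or loop preceding helix i+1
        if position <= end:
            return REGIONS[2 * i + 1]  # inside helix i+1
    return REGIONS[-1]                 # after the last helix


def motif_location(start, end, helices):
    """Label a motif; a motif spanning two regions is reported as a junction."""
    labels = []
    for position in range(start, end + 1):
        label = region_of(position, helices)
        if label not in labels:
            labels.append(label)
    return ",".join(labels)


# Hypothetical helix boundaries, for illustration only.
helices = [(30, 52), (65, 87), (100, 122), (140, 162),
           (190, 212), (240, 262), (275, 297)]
print(motif_location(32, 34, helices))
print(motif_location(120, 124, helices))
```

With these made-up boundaries the first motif lies wholly in TM1, while the second starts in TM3 and ends in ICL2, so it is labelled as a helix-loop junction.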

7.5 STUDY ON CONSERVED MOTIFS AND AAS IN CROSS-GENOME GPCR CLUSTERS

Conserved motifs and AAS play a crucial role in functional aspects, structural stability and mutations causing diseases and abnormalities. In membrane proteins, conserved amino acids play an important role in the GPCR mechanism. So, the current study (Chapter 5) aims to identify the conserved motifs along with the substituting amino acids (AAS) in sets of aligned homologous sequences, particularly cross-genome GPCR cluster datasets. The previously established 32 clusters of eight major types of receptors from cross-genome GPCR clusters, such as human-Drosophila GPCR clusters and human-C. elegans GPCRs, and the human-only GPCR cluster dataset were considered to identify conserved motifs. As mentioned in Chapter 4, the TM-MOTIF package has been used to recognize the membrane topology of the observed motifs. Motifs observed in a single receptor type (also known as cluster/receptor-specific motifs), in two receptor types and in multiple receptor types were also studied.

A total of 33 conserved motifs have been identified from the cross-genome (human-Drosophila) GPCR cluster dataset, and 76% of them were observed in TM helices (predominantly in TM2 and TM7). Motifs such as the GNL motif in TM1, TASI motif in TM3, PFF motif in TM6 and WLGY motif in TM7 are observed exclusively in BGA type receptors. These can be referred to as receptor-specific motifs and are very interesting since they are observed at the cross-genome level. Interestingly, the YLLNLA motif in TM2 and HCC motif in TM7 are observed in chemokine type receptors. The VGL motif in TM1, VMP motif in TM2, LGF motif in TM5 and NSC motif in TM7 are observed exclusively in peptide receptors.
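The identification of conserved motifs with their AAS can be illustrated with a minimal sketch. It assumes that a motif is a run of at least three consecutive alignment columns whose most frequent residue reaches the conservation threshold (60% by default), and that gaps count against conservation. This is an illustration of the idea only, not the MotifS program; the function names and the toy alignment are invented.

```python
from collections import Counter


def column_summary(column):
    """Return (consensus residue, conserved fraction, sorted substitutions)."""
    counts = Counter(res for res in column if res != "-")
    if not counts:
        return None, 0.0, []
    consensus, n = counts.most_common(1)[0]
    substitutions = sorted(res for res in counts if res != consensus)
    # Assumption: gaps stay in the denominator, so they lower conservation.
    return consensus, n / len(column), substitutions


def find_motifs(alignment, threshold=0.6, min_len=3):
    """Return (start, end, motif, AAS pattern) for each conserved run (1-based)."""
    length = len(alignment[0])
    cols = [column_summary([seq[i] for seq in alignment]) for i in range(length)]
    motifs, run = [], []
    for i in range(length + 1):
        if i < length and cols[i][1] >= threshold:
            run.append(i)
            continue
        if len(run) >= min_len:
            motif = "".join(cols[j][0] for j in run)
            pattern = "".join(
                cols[j][0] if not cols[j][2]
                else "[" + "/".join([cols[j][0]] + cols[j][2]) + "]"
                for j in run
            )
            motifs.append((run[0] + 1, run[-1] + 1, motif, pattern))
        run = []
    return motifs


# Toy alignment (invented) in which columns 2-4 are conserved at >= 60%.
toy = ["ACLPG", "TCLPA", "GCFPS", "APLCK", "SCLSE"]
print(find_motifs(toy))
```

Each pattern lists the consensus residue first, followed by the substituting residues in alphabetical order, and a column without substitutions is written as a bare letter, following the notation of Appendix 1.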

Motifs such as SLA in TM2 are identified in two receptor types, namely peptide and biogenic amine receptors. Motifs such as the LFL, TLP and LPF motifs in TM2, the AIA motif in TM3, the LPL motif in TM5 and LYA in TM7 are observed in both peptide and chemokine type receptors. Also, the IYL motif in TM2 and CIS motif in TM3 are observed not only in chemokine type receptors, but also in nucleotide and lipid type receptors. The other motif pattern, DLL (also as ADL, ADLL) in TM2, is observed in multi-receptor types.

Generally, AA conservation is high at TM2 for BGAR, GLUR and FRZ type receptors. Significant conservation of 55%, 80% and 61% occurs in TM1, TM2 and TM3 within CMK receptors. Although the occurrence of motifs (three consecutively preserved residues) is high in PR, the maximum amino acid conservation occurs as 42% and 46% in TM2 and TM3, respectively. However, it retains only 30.50% of conservation at TM2, TM6 and TM7.

Interestingly, several motifs were identified exclusively in TM-helices, and 133 such motifs have been documented along with AAS. For example, the CLP motif from PR (Cluster 7) has AAS in the pattern [C/P][L/F][P/C/S]. Also, 59 cluster-specific motifs observed in the loop regions were documented. Motifs preserved in the loop regions were also identified because of their functional importance, such as structural stability, signaling (intracellular loops) and ligand binding (extracellular loops). Eight different motifs were observed in loop regions, and the well-known E/DRY motif in ICL2 is also found as DRYLA, RYL and LDR at the 60% level of conservation. Also, SEC, MRTVTN and ASG motifs were observed in both glutamate and peptide type receptors,

whereas WPFG and LCK motifs were found exclusively in ECL2 of peptide type receptors. In most of the clusters, percentage residue conservation in ICL2 is higher than in the other loop regions, as expected. The list of identified motifs from this study illustrates the conserved sequence properties (motifs) across two (or more) different receptor types and provides clues to connect common sequence properties observed at the cross-genome level. In essence, the identified motifs emphasize the importance of conserved residues in terms of functional relevance across receptor types, and the study is more useful since it is pursued at the cross-genome level. The study on identifying conserved motifs and AAS in cross-genome GPCR clusters depends on the number of sequences, sequence length, membership/participation of sequences from a particular genome, alignment procedure, sequence identity and evolutionary relationship. A preliminary analysis to identify conserved “motifs” at the 30% level of conservation (due to the evolutionary distance) has been performed for the human-C. elegans GPCR cluster dataset, and the identified motifs (295 motifs) for the cross-genome human-C. elegans GPCR clusters have been documented.

7.6 PHYLOGENETIC ANALYSIS ON ORS IN SELECTED EUKARYOTIC GENOMES

Olfactory receptors (ORs) belong to the largest group of class A type GPCRs (Gaillard et al 2004) and are fascinating for their vast practical applications. The current study (Chapter 6) aims to perform phylogenetic analysis on certain olfactory receptors in selected eukaryotic genomes. Primarily, 371 OR sequences were collected from various data resources and

unrooted NJ method of phylogenetic analysis was conducted for the 1000 BS replicates. From the generated human OR phylogeny, the selected human OR sequences were distributed in 10 subclusters (namely HSC1-HSC10) and showed remarkable differentiation in tree topology. Human OR subclusters exhibit percentage identity ranging from 44% to 54%, showing the sequence diversity at the intra-genomic level needed for recognizing complex and diverse odors (Hayden et al., 2010). Among the 10 OR subclusters, HSC1 remains distinct and retains the class I type receptors, which are found to be responsible for sensing water-borne odours and could be fish-like ORs (Freitag et al 1998). The other subclusters, HSC2-HSC10, were referred to as class II-type receptors (mammalian-like ORs). Interestingly, almost all receptors in HSC1 are from chromosome 11 and are related to class I type receptors. Interestingly, a cross-genome OR phylogeny with human and selected fish ORs suggests that HSC1 is related to fish-like ORs (class I-type) (Section 6.2 in Chapter 6) in the human OR phylogeny, and a cross-genome OR phylogeny of frog ORs (pertaining to a dual lifestyle, sensing both air-borne and water-borne odors) with human ORs helped to discriminate the class I (water-borne) and class II (air-borne) receptors in the human OR phylogeny. Separately, 50 representative sequences have been recommended further for three-dimensional modelling.

Motifs exhibiting 60% conservation were identified from the 10 OR subclusters, and 163 motifs were identified. These include both common and cluster-specific motifs for the 10 human OR subclusters, mapped to various topologies such as TM-helices, loops, helix-loop junctions, loop-helix junctions, N- and C-termini, the N`-TM1 junction and the TM7-C` junction (Table section 6.1 in Chapter 6). Notably, the “KAFSTC” motif, related to class I type receptors, is conserved both in HSC1 and in the ORs of fishes like zebra fish. This illustration further confirms the effectiveness of phylogenetic clustering in

associating related sequences across genomes.

Drosophila olfaction is an interesting field of study, and notably fly ORs exhibit reverse topology (Benton et al 2006, Wistrand et al 2006). The phylogenetic study on 60 selected Drosophila ORs established the cluster association as 10 subclusters, namely DmC1-DmC10. Interestingly, candidate OR sequences in the phylogeny showed only 18% average sequence identity, reflecting the diverse requirements of fly olfaction. It is also found that the 24 known antennal receptors (Hallem et al 2004, Dobritsa et al 2003) were distributed in eight subclusters, that is, in all except DmC4 and DmC8. In parallel, OR83b and associated sequences such as 83a, 63a and 67b (DmC8) can be further examined for common motifs (at various levels of conservation, such as 30-60%), predicted for secondary structures and observed for ion-channel properties.

An attempt has been made to perform a cross-genome phylogenetic analysis on selected Drosophila ORs, human ORs and OR-like sequences from yeast. The resultant phylogeny clearly depicts the distant cluster of Drosophila ORs, and there is no significant co-clustering between the selected ORs of human and Drosophila. This indicates that insect ORs are evolutionarily distinct from mammalian ORs. This could be due to the independent evolution of fly ORs and the lifestyle of fruit flies in sensing specific odours, whereas higher-order organisms established a complex olfactory system. Also, the observed reverse topology in the fly genome could probably be another reason for the observed lack of co-clustering with human ORs. Overall, this emphasizes the necessity of identifying motifs to understand the sequence features at cross-genome levels.

As we know, chemosensory receptors in nematodes are highly diverse and abundant. Nearly 20 families of serpentine receptors participate in

chemosensation (Robertson 1998, Robertson and Thomas 2006), and in particular odr-10 is the only annotated olfactory receptor in C. elegans, which is capable of sensing di-acetyl compounds. A cross-genome phylogeny on selected human ORs with odr-10 and 82 homologues of odr-10 (Chapter 3) showed the lack of co-clustering. Interestingly, the obtained subcluster CeC3 retains odr-10 and 22 serpentine receptors, all belonging to the Str superfamily. In particular, Str-115 is found to be the nearest homologue of odr-10 through phylogenetic cluster associations.

As found in human olfactory receptors, mouse ORs also possess two broad classes of ORs, and comparatively mouse ORs are more abundant and more diverse than human ORs (Zhang and Firestein 2002). Human ORs are predominantly distributed on chromosomes 11 and 1, whereas mouse ORs are scattered over all the chromosomes except chromosomes 12 and Y. In the interest of performing the cross-genome phylogeny, a preliminary study was conducted with 50 representative human OR sequences and 410 mouse ORs, which were co-aligned. A chemokine receptor, namely Duff_human (a recently evolved GPCR), was also included along with the human and mouse ORs, and it stays as an outgroup. The resulting NJ phylogeny exhibited significant co-clustering and, notably, 72 mouse ORs were co-clustered with the given class I type of human ORs, and the remaining 45 human OR sequences were distributed along with other mouse ORs to represent class II type receptors. This further helps to discriminate the occurrence of class I and class II type ORs in the mouse. This is also an appropriate example to emphasize the effective use of representative sequences in cross-genome phylogeny. Since only a limited number of human ORs were considered for the cross-genome survey, the possibility of critical analysis of mouse OR subclusters is limited, and thus the phylogeny can be further improved by adding additional human ORs.
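The sequence-identity figures quoted for the OR subclusters (44% to 54% for human ORs, about 18% on average for fly ORs) are summaries over pairwise comparisons of aligned sequences. A minimal sketch of such a calculation is given below. The convention of ignoring columns where either sequence has a gap is an assumption, as the identity definition used for the reported values is not stated here, and the toy sequences are invented.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def average_identity(alignment):
    """Mean percent identity over all sequence pairs of an alignment."""
    values = [percent_identity(a, b) for a, b in combinations(alignment, 2)]
    return sum(values) / len(values)


# Toy aligned sequences (invented), equal length, '-' marks a gap.
toy = ["MKTLL", "MKSLL", "MR-LV"]
print(average_identity(toy))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give different percentages, so the denominator should be fixed before comparing subclusters.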

Also, in a pilot study, ORs from human and non-human primates have been studied, predicted for membrane topology and analyzed for their cluster arrangements at intra- and inter-genomic levels. In essence, the ORs of each genome are peculiar to its sense of olfaction, and the cross-genome phylogenetic analysis performed addresses the issues of conserved evolutionary trends, clustering and orthologs at intra- and inter-genomic levels for olfactory receptors (OR) in selected eukaryotic genomes.

So, in the interest of creating a non-redundant data repository for the ORs in selected eukaryotic genomes, such as Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens, the related sequence information, 3D models of representative ORs (K. Harini, NCBS, Bangalore) and dimer-interface predictions (Dr. Nemato, AIST, Japan) have been compiled and deposited to construct a Database of Olfactory Receptors, namely DOR. DOR provides sequence and structural information on olfactory receptors (OR) of selected organisms. In particular, information such as OR sequences, predicted membrane topology, cross-genome OR alignments and phylogeny, and a tool for motif identification (TM-MOTIF) are the attractive features available to access. The option “Sequence” provides non-redundant OR sequences for the targeted eukaryotic genomes. The option “TM-boundaries” provides the predicted TM-helices for each OR sequence with the start and end position of each predicted helix, and the predicted boundaries of the seven helices are given in seven different colours (VIBGYOR colouring scheme) for easy observation. The option “Alignment” provides not only the MSA for a single genome but also the cross-genome alignments. These alignments can be further used to detect conserved motifs

and particularly cross-genome alignments are very useful from the evolutionary perspective. The phylogenetic trees for single and cross-genome datasets help to study the cluster associations, species-specific cluster associations and co-cluster associations in cross-genome phylogeny. The generated phylogeny (single and cross-genome) further helps to understand the sequence properties at intra- and inter-genomic levels. These sequence studies could effectively be used to detect cluster-specific motifs from the MSA and to select the representative sequences for further analysis. The list of non-redundant OR sequences can be further used to train an SVM to identify potential OR sequences, which can also be implemented to identify orthologs across genomes.

As an initiative in implementing the sequence knowledge, TM-MOTIF (a tool to detect motifs in a set of aligned OR sequences) was incorporated in the database. An inbuilt dataset of 10 human OR subclusters is available in the TM-MOTIF package and is downloadable. Users can also use their sequence of interest to view the alignment in the VIBGYOR colouring scheme, with identified conserved motifs along with AAS at each position of the alignment.

The best representative sequences selected from the clusters can be gathered for homology modelling and to predict dimer interfaces, in order to discover functionally important residues and ligand-binding pockets. The olfactory receptor models provide a great opportunity for users to analyse the interaction between helices and the conservation of residues within helices, and to generate electrostatic contour maps. This would further help us in understanding the mechanism of function of olfactory receptors. The dimer-interface prediction for every structure guides us further to study

the oligomerization process of these receptors and the functional significance of such higher order entities. The list of non-redundant OR sequences can be further used to train machine learning algorithms to identify potential OR sequences, and can also be implemented to identify orthologs across genomes. DOR (Database of Olfactory Receptors) is a user-friendly and composite resource, with sequence and structural information on several ORs. Users can retrieve and download information on both OR sequences and structures for five eukaryotic genomes. The database can be accessed from http://caps.ncbs.res.in/DOR.

7.7 SUMMARY

To conclude, my research interest on “Genome-wide survey on certain mammalian GPCR and ORs” provides useful insights for the scientific community, particularly scholars interested in olfaction and membrane proteins, and can be applied to the fields of molecular modelling and drug design. As we know, membrane proteins are of utmost significance: they are vital proteins for cellular activities, of pharmaceutical importance, and related to human healthcare. In short, sequence analysis of these proteins across genomes provides an excellent opportunity and responsibility to convey knowledge on sequence properties and further to connect structure and function. The realization of related sequences across genomes paves the way for comparative genomics. The study on cross-genome clustering of biologically significant GPCRs of human and C. elegans is useful in introducing a profile-based clustering technique such as “RPS-BLAST”, the usage of

viable model organism, and indeed a valid starting point to conduct experimental studies for functional implications. Sequence information on known protein sequences guides the prediction of the structural and functional relevance of an unknown sequence, when associated by clustering technique/phylogeny. This fundamental understanding leads to assigning functional relevance to the unknown sequence from the reference sequence, which is reliable when the degree of sequence identity/BS value/E-value is favourable. Besides, evolutionary pressure also plays a major role in relating sequence features across genomes. Separately, phylogeny-guided sequence analyses across genomes can be explored for conserved sequence features like domain architecture, motifs, amino acid substitutions and orthology. Receptor-specific sequence properties, in turn, can be used with the support vector machine (a machine learning approach) to predict putative candidate receptors across genomes. The exclusive study on serpentine receptors (in Chapter 3) could inspire the use of the illustrated sequence properties (such as motifs), odr-10 and its homologues to train an SVM and also to identify putative chemoreceptors in other nematode species. Such studies inspire applying bioinformatic approaches in handling biological data effectively.

The designed TM-MOTIF package has completed alpha testing and is a user-friendly tool to visualize motifs in TM-proteins in the VIBGYOR colouring scheme in a large alignment window. The package can be used effectively to identify not only the conserved motifs but also the substituting amino acids. It is helpful for generating a pairwise alignment between a query and a reference sequence (whose structure is known), the prerequisite of homology modelling. It can be used as an academic tool-kit to identify sequence motifs in membrane proteins.
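As a much simpler stand-in for the SVM-based prediction mentioned above, a documented AAS pattern can be turned into a regular expression and used to flag candidate sequences that carry a receptor-specific motif. This sketch only illustrates how the bracket notation can be reused; it is not the classifier proposed in the thesis, and the sequence names, sequences and function names are hypothetical.

```python
import re


def aas_to_regex(pattern):
    """Turn '[C/P][L/F][P/C/S]' into a compiled regex equivalent to '[CP][LF][PCS]'."""
    return re.compile(pattern.replace("/", ""))


def scan(sequences, pattern):
    """Return {name: [1-based start positions]} for sequences matching the pattern."""
    regex = aas_to_regex(pattern)
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() + 1 for m in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


# Invented candidate sequences, for illustration only.
candidates = {
    "cand1": "MKTCLPAV",
    "cand2": "MKTAAAAV",
    "cand3": "GPFSLL",
}
print(scan(candidates, "[C/P][L/F][P/C/S]"))
```

A three-residue pattern will match many unrelated proteins by chance, so in practice such hits would only be one feature among several (for example, predicted topology and location of the match) before a sequence is considered a putative receptor.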

Insights from the analysis of conserved motifs and permitted amino acid exchanges in the human, the fly and the worm GPCR clusters provide knowledge on conserved sequence features across taxa, which helps to establish structure-function relationships and to apply them in a wide range of practical applications. Information on ORs, organized as a database of olfactory receptors (DOR), should assist the study of structural details, ligand-binding properties, associated mechanisms/proteins (OBP) and signaling, for practical application in the fields of pest control, the food industry, the pharmaceutical industry (aroma therapy), the cosmetic industry (scent/perfume manufacturing), forensics and defense studies, olfactory disorders and olfactosexual function, and to study olfacto-neural communication. Thus, performing a genome-wide survey on GPCRs and ORs from selected eukaryotic organisms will improve scientific credibility and ultimately serve human benefit.

ICL2 Srh 3 530-532 PYR C' Srh [*/@/@/@/$/@/*/*][-/$//*/@/+/@/@/$/$/@/*][$/$/-//*/$/+/@/$/+/$/$/*][+/$/$] [$/@/$//$/+/@/$/$/$/@][*/*/+/@/@/@][+/+ /+/@/$/$/*/*] [@/*/@/@/$/@][*/*/@/@][@/*/@/ @/@] [+/+][+/$/$/*][$/-/+/+/$/+] [@/$/@/$][@/@/@][@/*/@/@/@] [@/*/@/@]$[$/$] [*/$][@/@][*/$/*][@/*/@][@/@/@ ][*/+/@/@/*] [*/@/@/@]$[$/$]*[+/+] [@/@][@/@/$][@/@/@/*] [*/*][$/-][@/*/@/@] [*/*/+/$][+/$/+/$][*/+] +[$/@/@/$][$/@/$] [*/*]+[*/*][@/*/@/@] [+/+/@][$/@/$][*/@/+] [+/+/@/@][@/$]@[@/$/@][@/@]$ [$/@/$] [@/$/@]$[@/$/$] [$/@/@][@/@/@/@][*/@/@/$][$//$][@/*/@/@] 1 2 3 1 35-37 143-145 357-359 22-24 IYL KHQ VLI VNP FIYLI F FGNY R LLL FNL YRY RCS YRYL RSW RALIV QT IPI TM1 ICL2 TM7 TM1 Sri Sri Sri Srj 2 3 4 5 6 7 8 9 10 11 26-31 38-42 44-46 51-53 69-71 99-101 116-119 200-202 252-258 260-262 TM1 ICL1 TM2 TM2 ECL1 ECL1.1 List of observed motifs in Serpentine receptor families (60 % level of conservation) S.TM5 TM6 TM6 Srj Srj Srj Srj Srj Srj Srj Srj Srj Srj 12 276-280 PIFGI ECL3 Srj .ICL2 ECL2.TM3 TM3.256 APPENDIX 1 THE LIST OF IDENTIFIED FAMILY-SPECIFIC MODIFS IN SR Table A1.ICL1 Srh 2 246-249 FENR TM3.No Alignment Position Motif Location Super family AAS [K/E/H/I/L/N/Q/R/S/ T][T/C/K/S][P/D/E/ K/L/S/T] [F/I/L/M/T/V/W/Y][ E/C/D/F/I/K/L/M/Q/ T/V/Y][N/C/D/E/F/ G/H/I/Q/R/S/T/Y][R /Q/S] [P/A/C/D/G/H/L/N/ S/T/V][Y/F/H/I/L/V] [R/H/K/L/Q/T/W/Y] [I/F/L/M/S/V][Y/F/ L/M][L/F/I/M/V] [K/R][H/N/Q/Y][Q/ E/H/K/N/R] [V/C/I/S][L/M/V][I/ F/L/M/V] [V/F/I/L]N[P/Q] [F/T][I/V][Y/C/F][L/ F/I][I/A/V][F/H/L/V /W] [F/L/M/V]G[N/S]Y[ R/K] [L/I][L/M/S][L/I/V/ Y] [F/Y][N/D][L/F/I/M] [Y/F/H/N][R/G/K/S] [Y/H] R[C/A/I/T][S/A/G] [Y/F]R[Y/F][L/F/I/ M] [R/K/L][S/A/T][W/I/ R] [R/K/L/V][A/T]L[I/ T/V][V/I]Q[T/A/S] [I/T/V]P[I/S/T] [P/A/L][I/A/M/V][F/ I/L/S][G/D/N][I/F/L/ V] Symbols for AAS [+/-/+/@/@/$/$/+/$/$][$/$/+/$][$/-//+/@/$/$] 1 148-150 KTP TM1.

ICL1 Srw [@/$/@][$/$/*/$/@/@/$/$][+/+/+/$/ $][+/-/$/+/$/$] [$/@/$/-/-/+/$/$][$/@/$//+/@/@/+/$/@][$/@//+/+/@/@/$/+/@][*/$][+/$//+/$/$/$] $[@/*/@][$/*] [+/@/$][@/@/@][$/@/$][$/*] [@/*/$][$/@][*/@/*] [$/$][$/$][+/$]@@@ [$/$][@/@][*/@/@] [@/$]$[@/@] $[$/@]@ [*/*][$/$][$/$]$$@@ [$/*]@[$/@][*/*] [@/$/@][$/@][@/@] [$/@/$][$/$/*][$/@] *--[$/$][@/@/+] 2 1 2 3 4 5 6 7 8 9 10 11 12 660-664 28-30 32-35 44-46 65-70 73-75 77-79 124-126 128-134 143-146 165-167 177-179 184-188 SSQY R SLN KISQ LTF STKIL L NLF ANL SGM YGQT GLL CATF ISI STG WDDP L TM7.ICL1 ECL1.C` N` TM1 TM1 ICL1.TM3 ICL2 TM4 TM6 Srbc Srbc Srbc Srbc Srbc Srbc [Y/M/N/S][L/I][L/I/ V] [S/I/K/T][I/T][F/L/V ] [K/R][N/S][L/F/I/V] [F/S/V][P/S][I/L/Q/ T/V] [L/I/M][F/Y][G/C/E/ V] [I/F/L/V]A[L/M][L/ F/I/V]D [I/L/M/V][D/I][R/I/ K/L/V][L/F/V/Y][I/ L/R/V/Y] [L/C/I][T/C/F/G/I/L/ N/S][R/H/K/N/Q][K /E/P/R/S/T] [S/A/C/D/E/H/N/P][ S/A/C/D/H/I/L/R/T/ V][Q/A/E/H/K/L/M/ N/R/V][Y/C][R/C/E/ K/N/Q/S] S[L/F/I][N/W] [K/I/Q][I/A/L][S/A/ N][Q/F] [L/F/T][T/A][F/L/Y] [S/G][T/S][K/Q]ILL [N/T][L/I][F/L/V] [A/T]N[L/I] S[G/V]M [Y/F][G/C][Q/S]TG LL [C/F]A[T/L][F/Y] [I/T/V][S/A][I/L] [S/I/T][T/S/Y][G/A] WDD[P/S][L/I/R] [*/@/$/$][@/@][@/@/@] [$/@/+/$][@/$][*/@/@] [+/+][$/$][@/*/@/@] [*/$/@][$/$][@/@/$/$/@] [@/@/@][*/*][$/$/-/@] [@/*/@/@]@[@/@][@/*/@/@][@/@/@/@][/@][+/@/+/@/@][@/*/@/*][@/@/+ /@/*] 1 172-176 IDRLI TM3.No 13 Alignment Position 304-307 Motif AIIL Location TM7 Super family Srj AAS A[I/L/V][I/V][L/F/I/ V/Y] [Y/C/F][R/L][Y/A/C /F/H/L/S/T/V/W] [Q/D/E/N][L/F/I/Y][ F/H/L/M/T/V/Y] G[P/F/L/S/V/Y][C/ G/I/L] [Y/F/I/L/T/V][F/H/S /T/Y][V/F/I/L] [P/H/I][Y/F][R/F/K/ Q] Symbols for AAS @[@/@/@][@/@][@/*/@/@/*] 1 2 268-270 493-495 YRY QLF TM3.TM2 TM2 TM2 TM4 TM4 ECL2 TM5 TM5 ICL3 Srw Sra Sra Sra Sra Sra Sra Sra Sra Sra Sra Sra Sra .TM6 Str Str [*/$/*][+/@][*/@/$/*/+/@/$/$/@/*] [$/-/-/$][@/*/@/*][*/+/@/@/$/@/*] 1 2 3 124-126 409-411 413-415 GPC YFV PYR TM2.ICL2 Srsx 1 150-153 LTRK TM1.ICL2 ICL3.257 Table A1.ECL1 TM7 TM7.1 (Continued) S.C` Srd Srd Srd 
$[$/*/@/$/@/*][$/$/@/@] [*/*/@/@/$/@][*/+/$/$/*][@/*/@/ @] [$/+/@][*/*][+/*/+/$] 1 2 3 4 5 6 25-27 29-31 93-95 127-129 161-163 237-241 YLL SIF KNL FPI LFG IALLD TM1 TM1.

TM1 TM1 ICL1.ICL3 TM7 Super family Sra Sra Sra Sra Sra Sra Sra Sra Sra Sre Sre Sre Sre Sre Sre Sre Sre Srv Srv AAS [F/A/I]N[L/C/F] [Y/H][N/K][K/D/E] IC[F/S][L/V][T/A/N ] [F/A/W][M/L/V][F/ S] [Y/N/S][S/T][F/A/S] [G/A] [V/I][V/A/Q][W/Y] [P/V][F/I/Y][I/G/V][ A/N/V][L/A] [K/T][Q/G]T[Q/V][ D/E] H[I/M][K/N/S][Q/H/ S] MI[F/I] P[I/T/V][Y/F/T] WT[D/K/S][D/I] [F/L/T][F/Y][N/H/Q ] [R/Q][F/Y]Q[A/V][ K/M/R]EN [F/V][E/D/Q][N/A/S ] [L/V][N/G]P[L/S/V] ETD [L/I][R/H/K][K/E] INP [N/E/I/K/V]R[F/T/V /W/Y] [Y/C/F][G/M/V][S/F/I/ L] [I/F/T][P/H/Q/S/T][L/F /M] [Y/A][N/D/G/K/S]C[S/ P] [R/H/Q/Y][P/Q/T/Y][I/ F/L/P/V] [L/F/I/V][Y/T][I/F/L/T/ V][P/I/L] [K/E/Q/R][I/L/M/T/V] M [N/H/S][S/C][I/F/L/V] [Q/E/F/I/K/M/Y]G[A/I] [V/A/S][F/Y]C [L/F/P][I/F][Y/F][I/C/F /L/V] [W/Y][F/L][F/Y][D/N] P [I/A/L/V][Y/S][V/E/I/T ] [M/E/I/S][N/E/F/M/Q/S ][F/L/Y] [I/A/S/V]Y[L/F/I] [T/A/L/M/Q][I/M]R[N/ K/Q/S] Symbols for AAS [*/@/@]$[@/$/*] [*/+][$/+][+/-/-] @$[*/$][@/@][$/@/$] [*/@/*][@/@/@][*/$] [*/$/$][$/$][*/@/$][$/@] [@/@][@/@/$][*/*] [$/@][*/@/*][@/$/@][@/$/@][@/ @] [+/$][$/$]$[$/@][-/-] +[@/@][+/$/$][$/+/$] @@[*/@] $[@/$/@][*/*/$] *$[-/+/$][-/@] [*/@/$][*/*][$/+/$] [+/$][*/*]$[@/@][+/@/+]-$ [*/@][-/-/$][$/@/$] [@/@][$/$]$[@/$/@] -$[@/@][+/+/+][+/-] @$$ 1 182-184 NRF TM3.TM2 TM2 TM2.ECL2 ICL3 ECL3 TM7 C` TM5.ECL1 ECL1.ICL2 Srx [$/-/@/+/@]+[*/$/@/*/*] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20-22 26-28 33-36 97-99 181-184 202-204 218-220 228-233 238-241 313-317 393-395 397-399 424-426 431-434 YGS IPL YNCS RPI LYIP KIM NSI QGAVF C LIYI WFFDP IYV MNF IYL TIRN N` N` N` N`.1 (Continued) S.258 Table A1.No 13 14 15 16 17 18 19 20 21 1 2 3 4 5 6 7 8 1 2 Alignment Position 219-221 233-235 261-265 271-273 275-278 298-300 305-309 337-341 343-346 1-3 18-20 114-117 198-200 255-261 305-307 311-314 376-378 260-262 348-350 Motif FNL YNK ICFLT FMF YSFG VVW PFIAL KQTQ D HIKQ MIF PIY WTDD FFN RFQA KEN FEN LNPL ETD LRK INP Location TM6 ICL3 TM6 TM6 TM6 TM7 TM7 C` C` N` N` ECL1 TM4.TM3 ECL2 TM6 TM6.ECL3 TM7 C` Srt Srt Srt Srt Srt Srt Srt 
Srt Srt Srt Srt Srt Srt Srt [*/$/*][$/@/@][$/*/@/@] [@/*/$][$/+/$/$/$][@/*/@] [*/@][$/-/$/+/$]$[$/$] [+/+/$/*][$/$/$/*][@/*/@/$/@] [@/*/@/@][*/$][@/*/@/$/@][$/@/@] [+/-/$/+][@/@/@/$/@]@ [$/+/$][$/$][@/*/@/@] [$/-/*/@/+/@/*]$[@/@][@/@/$][*/*]$ [@/*/$][@/*][*/*][@/$/*/@/@] [*/*][*/@][*/*][-/$]$ [@/@/@/@][*/$][@/-/@/$] [@/-/@/$][$/-/*/@/$/$][*/@/*] [@/@/$/@]*[@/*/@] [$/@/@/@/$][@/@]+[$/+/$/$] .

Table A1.1 (Continued)

S.No  Alignment Position  Motif   Location   Super family  AAS                  Symbols for AAS
1     32-34               AYL     TM1        Srg           AYL                  @*@
2     43-47               RILYV   TM1        Srg           RILYV                +@@*@
3     90-93               PQLC    ICL1       Srg           PQLC                 $$@$
4     131-134             NRMS    TM3, ICL2  Srg           NRMS                 $+@$
5     160-162             APF     TM4        Srg           APF                  @$*
6     165-167             IWN     TM4        Srg           IWN                  @*$
7     180-182             GGF     ECL2       Srg           GGF                  $$*
8     192-194             WAS     ECL2       Srg           WAS                  *@$
9     212-214             VTT     TM6        Srg           VTT                  @$$
10    284-289             VGSPLV  TM7        Srg           VGSPLV               @$$$@@
1     72-74               ILL     ICL1       Sra           [I/C/S/T/V]L[L/I/S]  [@/$/$/$/@]@[@/@/$]
2     282-284             RFQ     ICL3       Sra           [R/Q][F/Y][Q/H/N/R]  [+/$][*/*][$/+/$/+]

Vol. and DeFazio. Z. Vol. 2000. 33.. 3. 4. Altschul. M. Noll. Alfarano. M. Andrade. pp. Vol. V. 23. 417-30.. "The Biomolecular Interaction Network Database and related tools 2005 update". B. 5. K.L.260 REFERENCE 1. Buzadzija. N. J. A. 2005. Bantoft. a putative receptor for the hedgehog signal" Cell. Vol. "The SWISSMODEL workspace: a web-based environment for protein structure homology modelling".R. Ache. Adams.W. 8. K.J. M. K. Betel.... 2005. "Model of amino acid substitution in proteins encoded by mitochondrial DNA". 3389-402. D. S. J. M. 1997. Kopp.A. 2185-95. Cavero. Zhang.. BMC Genomics.... Dumontier. Ayzenzon. PubMed PMID: 21548958. T. pp. Vol. and Hooper. D418-24. Bobechko. 86. 6.. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". 2011. and Lipman. "The genome sequence of Drosophila melanogaster". I.. Donaldson. Burgess.J. 37. Nucleic Acids Res. "The location of olfactory receptors within olfactory epithelium is independent of odorant volatility and solubility" BMC Res Notes. Vol. pp. D. Alcedo. T.M.S. pp. 9. Vol...E. No.. 2005. L. pp.. C. C.. .. R. pp. 221-32. "Olfaction: diverse species. M. pp. C. J.. J. Vol... K. I. and Earles. 1996.. Schaffer. Anthony. Miller. 173. 25. Von Ohlen. 195-201. 7. and Young. T.. 42. and Ngai. W. D'Abreo. "The Drosophila smoothened gene encodes a seven-pass membrane protein.. J Mol Evol. "The odorant receptor repertoire of teleost fish". Arnold.. J. M. 2006. 48. Madden. T. 2. T. Vol. Nucleic Acids Res. Bioinformatics. Adachi. K. Science. pp. 287. E. Boutilier. 1996.E. 6. D.. Bordoli. Alioto. A. conserved principles" Neuron. 22. and Schwede. Zhang. Dorairajoo. pp.. and Hasegawa. PubMed Central PMCID: PMC3118157. Dumontier.F. 459-68. M..D. Abaffy. Bahroos. B.R. Bajec. J. 6.

20. "Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse" Genomics. S. R.. "Comparative chemosensation from receptors to ecology". P. 2006. 1999. 88. 1723-9. pp. 4. Andrew. and Srinivasan. K. Sachse. 30. R. "Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo".P. Michnick. 2006. "Molecular tinkering of G protein-coupled receptors: an evolutionary success".. R. S. H. Howe. S. 2002.. "Odorant response properties of convergent olfactory receptor neurons" J Neurosci. E. "A method to identify protein sequences that fold into a known three-dimensional structure". Y. pp.261 10. 225-30. and Schioth.. C. pp.U. and Sonnhammer. Neuron. Bateman. 2006.R. "Cascade PSI-BLAST web server: a remote homology search tool for relating protein domains". 15. Vol. Vol. Marshall. J. pp. and Ngai... and Kauer. 1996. Etwiller. Birney.. Bjarnadottir.L. 253. Vol. A. 1991. Samos. pp. "Asynchronous onset of odorant receptor expression in the developing zebrafish olfactory system". L. R. PLoS Biol. and Pin. J. 164-70.. Abhinandan. 12. 17. D. 18.E. 11. Barth... 16. Bhadra. Vol. T.C..L.. 444. Luthy. 276-80.. Berry. Brink. N. L. T. M. 34. S. pp. D. R. J.. 1996.. Chakrabarti. M. 295-301. J. Nathans. Hellstrand. Justice.. Durbin.. 18..R. Bockaert. 4560-9. Kristiansson.C. C. N. H. A. Sandhya. Eddy. Vol. Cerruti. and Vosshall. pp. R. 18.D. and Nusse. Vol. J. Vol.K. S.. Gloriam.P. 1998.. Bargmann. 23-34.W.J. Nucleic Acids Res. Bozza.. 16. R.. pp. W143-6. pp. E.H. S. Griffiths-Jones. Vol.S. 2007. 19. 2006. Bhanot. pp.. Embo J.. L. 13. "A new member of the frizzled family from Drosophila functions as a Wingless receptor".B. Vol. K. Hsieh. Benton. Vol.H. Bowie. J. 382. and Eisenberg. "The Pfam protein families database" Nucleic Acids Res. D.. Science. Nature. Wang.I. 263-73. "The potential of trace amines and their receptors for treating neurological and psychiatric diseases" Rev Recent Clin Trials. pp. Nature. 
2731-9. pp.. M. R. S. "Calcium signals in olfactory neurons" Biochim Biophys Acta. Dudley. pp. Teng... Y..W. 203. Chen.C. Chinwalla. Clarke. Nei.. C. Hallberg. Adamson. Blumenthal.. K.0" Mol Biol Evol.. .279 198.. 275. Z. Sohrmann. T. Plumb. A. Mardis.. and Kumar. E. 206.E. 207... 93. Bao.. N.

1471-2. 214. Bagos. pp.H. 2008. J. effectors and their interactions". and Simon. N. Vol. 6. Vessella.C. 549. 193. and Simon. Roudier. 83. PMID: 18975117. 24(12).. 6. G-proteins.D. Tsuboi. elegans". "Olfactory sensory neurons expressing class I odorant receptors converge their axons on an antero-dorsal domain of the olfactory bulb in the mouse" Eur J Neurosci. "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting.A. Dwyer. Thomas. Thompson. Chen.R. 2008.. 15.. "Divergent seven transmembrane receptors are candidate chemosensory receptors in C. L. 2008. H. pp. Tusnady. Troemel. L. PubMed PMID: 7585938. PubMed PMID: 7984417. Vol. Ulrich.. Tripathi.. Chou. No.. 2009. No.D. H. G. 36. Colbert. Miyazaki.. Biochem Biophys Res Commun. and Sowdhamini. 26. Lantry. E.D. 4673-80. 213. C. C. and Nunn. H.D. Kalmar.I." Nucleic Acids Res. L. 23. and H. Vol. Vol. I. and Bargmann.. "In vitro binding evaluation of 177 Lu-AMBA. pp. R. 211. Vol. Vol.E. D234-9. pp... P.. SJ. L. Vol. Theodoropoulou. A. 209.. Buell... "Genome-wide survey of prokaryotic serine proteases: analysis of distribution and domain architectures of five serine protease families in prokaryotes" BMC Genomics. R. No.P. 210. BMC Biol. "Molecular cloning and functional expression of the human gallbladder cholecystokinin A receptor". 2006.M. "TOPDB: topology data bank of transmembrane proteins". No. R.D. Vol. pp.L. and Gibson. J. 17. 1995. M. Vol.J. 1994. Vol. I. 22. Hadac. 204-11. I "The HMMTOP transmembrane topology prediction server".. 1993. "The Caenorhabditis chemoreceptor gene families". No..C. pp. pp. Holicky. "gpDB: a database of GPCRs. Thomas. 105–9. T. and Robertson.. and Miller.280 208. A. 217. J.E. 2008. 2.. 1436-44. 11. 849-50. . Bioinformatics. T.E. G.H. Cell. 42.G. pp. 9. 212. pp. pp. G. J. 216. 2001. Ferber. M. Tusnady. H.M. 215. Epub Oct 31. I. PubMed Central PMCID: PMC308517. E. 2008. and Sakano. E. Imai. Clin Exp Metastasis. Nucleic Acids Res. 20. 
Spyropoulos. position-specific gap penalties and weight matrix choice. a novel 177 Lu-labeled GRP-R agonist for systemic radiotherapy in human tissues".. Bioinformatics. 207-18.

P. 226. N. pp. Edwards. 13 pp.H. "Glucocorticoids decrease endothelin-and -B-receptor expression in the kidney". P. Vol. M. Vol. "dOr83b--receptor or ion channel?". and Provencher.F. Wicher. and Hansson. 85. J. 2008. Leslie. M. Vol. 164-7.. Villeneuve. A. S. 549-51.. 2027-43. 51-63. 2007. Vol. 227. Henderson. R. and Stocker. Gignac. 26. 2009.J. 2005.J. and Singh. "Estimating the frequency of events that cause multiple-nucleotide changes". D. 2001. Stensmyr. J Cardiovasc Pharmacol. Vinson. Vol. Kim. Thomson. P. Genetics. Nature. Ann N Y Acad Sci. and Adler. Venkatesh.. 454. 314. Vol. 228.. pp. R. pp.281 218.. J. C. Vadakkadath Meethal..N. R.. pp. N. C. 1159-204. C. 6.. Heinemann.. pp. Vol. G. S.Morphol. 2004.S. S. Vol. Tate. Embryol.F. J. T.G. Heller.. R. 223.C. and Brenner. S. White.R. Philos Trans R Soc Lond B Biol Sci. "Directional non-cell autonomy and the transmission of polarity information by the frizzled gene of Drosophila". pp. 1984. M. 219. pp. Vol. S.J. 225.Y. Petras. Schafer. Nature. Int.N. "Structure of a beta1-adrenergic G-protein-coupled receptor".G. Sgro. Haasl. 36 (5 Suppl 1). Annu Rev Neurosci.. S. and Schertler.. 222. J. J. Vol. 505-33. 1987. "Identification of a gonadotropin-releasing hormone receptor orthologue in Caenorhabditis elegans". 1-340. Serrano-Vega... Vosshall. R. M. C.H. "Olfaction in Drosophila: coding. Warr. and Atwood.S. Wettschureck. "Sensilla on the third antennal segment of Drosophila melanogaster meigen". 30.. S. and Carlson. Baker. BMC Evol Biol. 486-91. 167. 1986. G.B..R. 224. L.. R. R.C. pp.G. 2000... Warne. "Mammalian G proteins and their cell type specific functions" Physiol Rev. "The structure of the nervous system of the nematode Caenorhabditis elegans".. R.J. Moukhametzianov. Gallego. 221. P. 220. 1170. Whelan. Bauernfeind. E. 201-6.J. Southgate. 2006. A. Vol. S. 103. genetics and e-genetics" Chem Senses.. and Goldman. Clyne. Insect. pp.. "Molecular architecture of smell and taste in Drosophila". de Bruyne. 329. B. . 
and Offermanns. J..

. 310. Y. Yamano. Vol. D.. pp. S.. Fang. Proc Natl Acad Sci U S A. Li. 239. Vol. I. P. 1066-71. Y. and Firestein. 80. 41. T.133.M. L. Liu. Vol. M. Wilbur. Vol..Y... R. 2006. J. pp. Zou... Bolund. V. and Firestein. Y.. elegans" WormBook. 2005. Biosci Biotechnol Biochem. pp. X. A. and Greenwald. Tian. F. 89.. L. Liu. "The olfactory receptor gene superfamily of the mouse". .J.. Kall. Ye. 2002. X. Zhang..282 229.. W. 2006. Zhang. 5. 34.. 101. "WEGO: a web tool for plotting GO annotations" Nucleic Acids Res. Kuhn. Vol.. "Comparative genomics of odorant and pheromone receptor genes in rodents" Genomics.. pp. Katritch.. Vol. Yoshimizu.D. R.. "Structures of the CXCR4 chemokine GPCR with small-molecule and cyclic peptide antagonists". pp. 237. X. Protein Sci. 2002.. Vol. 15. W. pp..J.. V. 441-50. I. E. 234. "Gene duplications and genetic redundancy in C. pp. Abagyan.. 14168-73. 230. and Sonnhammer. pp. 330. J. Zhang.. C. Li. Wells. Chien.. M. A. and Firestein. T. 1369-71. S. Proc Natl Acad Sci U S A. H. 233. Chaki. "A general model of G protein-coupled receptor sequences and its application to detect remote homologs". 1983. Handel. 231. Yoshioka. 2004. J. Zhang.C. Zhang..S. J. A. 509-21. Vol. Yoo. Toda. "The olfactory receptor gene superfamily of the mouse". Fenalti. 235. G. "Rapid similarity searches of nucleic acid and protein data banks". W293-7. Zhang.J. 238. Zhang. and Firestein. M. S. L.. 124 . pp.C. elegans" Science. H.L. 2010. "LIN-12/Notch activation leads to microRNA-mediated down-regulation of Vav in C. 68. Vol. M.. and Wang. S. 236.J. and LipmanD. pp. S. X. Bi. Zheng. Nature Neuroscience.. Cherezov. Kamon. Woollard. Wang. Z.. Vol... Chen..J. B. Y.. 1330-3.. Brooun. E.. 2007. 5. X. pp. Shepher. P. and Morishima. Mol. "The role of the DRY motif of human MC4R for receptor activation". 2004. M.. Rogers. and Stevens. Vol. Science. Wistrand.. R. "High-throughput microarray detection of olfactory receptor gene expression in the mouse". D. Oshida. Hamel. Wu. S. J. 
Nat Neurosci. Ma. 1-6. 2005. 726-30. 124-33. 232. Zhang.

2001. X.. Vol. and Nguyen. 18. 2007. "The human olfactory receptor repertoire" Genome Biol. and Qin.. Y. Echeverri. Yang. S. T. pp 395. Vol. Liang. 8. S. Zozulya. BMC Genomics. F. Zhang. Guan. F. "Genome-wide survey of putative serine/threonine protein kinases in cyanobacteria".. X. Zhao. 241. p.. ..283 240. C. 2.

LIST OF PUBLICATIONS

1. Balasubramanian Nagarathnam, Singaravelu Kalaimathy, Veluchamy Balakrishnan, and Ramanathan Sowdhamini, "Cross-genome clustering of human and C. elegans G-protein coupled receptors", Evolutionary Bioinformatics, Vol. 8, pp. 229-259, 2012.

2. Balasubramanian Nagarathnam, Sankar Kannan, Varadhan Dharnidharka, Veluchamy Balakrishnan, Govindaraju Archunan, and Ramanathan Sowdhamini, "Insights from the analysis of conserved motifs and permitted amino acid exchanges in the human, the fly and the worm GPCR clusters", Bioinformation, Vol. 7, No. 1, pp. 15-20, 2011. Published online 2011 August 20. PMCID: PMC3163927.

3. Balasubramanian Nagarathnam, Kannan Sankar, Varadhan Dharnidharka, Veluchamy Balakrishnan, Govindaraju Archunan, and Ramanathan Sowdhamini, "TM-MOTIF: an alignment viewer to annotate predicted transmembrane helices and conserved motifs in aligned set of sequences", Bioinformation, Vol. 7, No. 5, pp. 214-221, 2011. Published online 2011 October 31. PMCID: PMC3218415.

CURRICULUM VITAE

NAGARATHNAM B., Assistant Professor, has worked towards her PhD under the guidance of Dr. V. Balakrishnan, Professor, KSR College of Technology, Tirchengode, Tamil Nadu, India and Dr. R. Sowdhamini, National Centre for Biological Sciences, Bangalore, India. She carried out her complete research work as a full-time Research Scholar at Lab-25, C/o Prof. R. Sowdhamini, National Centre for Biological Sciences, Bangalore. Her project on "Genome-wide survey of certain mammalian GPCRs and olfactory receptors" was funded under the India-Japan Collaborative Research Project by the National Institute of Advanced Industrial Science and Technology (AIST), Japan and the Department of Biotechnology (DBT), India.

She is particularly interested in bioinformatic methods in sequence analysis of membrane proteins, to identify motifs and to disentangle their phylogeny. She is also into developing bioinformatic tools that may be used by the wider scientific community to address these questions. She has attended several international conferences such as the 3rd Japan-India Bilateral Workshop on Bioinformatics at AIST, Tsukuba, Japan. She served as a student volunteer and also presented a poster at the 8th Asia-Pacific Bioinformatics Conference, Bangalore, India.

Nagarathnam started her academic career with a distinction in BSc Zoology at Bharathiar University and earned her Masters in Applied Biology from Gandhigram Rural Institute in which she was the gold-medalist. She also holds an MPhil in Biotechnology from Periyar University.

She has been a lecturer of biological sciences in several colleges in India. At various times, Nagarathnam has taught as a Lecturer in the Department of Zoology, Vellalar College of Arts and Science and in the Department of Microbiology, NS College of Arts and Science. She has also served as the Head, Department of Bioinformatics, KSR College of Arts and Science. At the science-industry interface, she has been a Senior Research Associate at Bio Informatics Research, GIC online (Pvt), Chennai, India.