1 author:
Ansary P Y
Govt Ayurveda College, Tripunithura
All content following this page was uploaded by Ansary P Y on 26 August 2020.
Hand Book
on
Bioinformatics & its application in
Ayurveda
Edited by
Dr. P. Y. Ansary, MD (Ay), PhD
Published by
Dept. of Dravyagunavijnanam
Govt. Ayurveda College
Tripunithura, Ernakulam (Dt)
In association with
KUHS School of Fundamental Research in Ayurveda
Tripunithura
2020
Preface
The School of Fundamental Research in Ayurveda regularly conducts
faculty improvement programs. As part of this, a program to study the
basics of bioinformatics was conducted in February 2020 in association
with the Department of Dravyaguna, Govt. Ayurveda College,
Tripunithura.
Bioinformatics is an interdisciplinary field that develops
methods and software tools for understanding biological data,
which is an essential part of research. It includes the collection,
storage, retrieval, manipulation and modeling of data for analysis.
Bioinformatics combines biology, computer science and information
technology to analyse and interpret biological data. A knowledge
of the basics of bioinformatics is essential for a researcher in
Ayurveda, as a few research works in Ayurveda have already been
conducted using bioinformatics.
Two important areas of bioinformatics are genomics and
proteomics. Knowledge of these will be very useful to Ayurveda
research. How a signaling pathway works in a cell can be addressed
through systems biology. The genes involved in the pathway, their
interactions and their modifications can be modeled using systems
biology. Ayurveda can make use of this in future research to explain
what happens in a cell during doshadushya samurchana, and the
cell-level changes during the pathological stage can be identified.
Understanding the complete “parts list” in a genome will give a
better understanding of a complex biological system. This will help
Ayurveda to better understand Prakrithi and the manifestation of
disease, and knowledge of proteomics will help in drug discovery.
This book starts with an article titled ‘Research works in
Ayurveda using Bioinformatics tools – A Review’, which gives
the current scenario of research in Ayurveda that makes use of
bioinformatics technology. The introductory remarks in the
subsequent section explain bioinformatics in a way that a beginner
can understand. The chapters that follow are Biological databases,
Multiple Sequence Alignment using Clustal X 2.1, Alignment Editing,
Phylogenetic Analysis using MEGA X, Primer designing, Gene prediction
using GeneMark, Molecular visualization, Secondary structure
prediction, and Database and Online Tool Website. All the chapters
are narrated so that a reader can follow them properly.
We hope this book will be useful for teachers, PhD scholars and
postgraduate students in understanding bioinformatics and its
application in research, and we put it forward to improve the quality
of future research in Ayurveda.
Dr. Sudhikumar K B
Professor in Charge
School of Fundamental Research in Ayurveda
Editor’s Note
The Department of Dravyagunavijnanam, Govt. Ayurveda College,
Tripunithura organized a fundamental training course in
‘Bioinformatics’ for Ayurveda faculty of affiliated colleges on
25th & 26th February 2020 in association with the KUHS School of
Fundamental Research in Ayurveda. The aim of the two-day program
was to give a basic outlook on bioinformatics and its application in
Ayurveda.
In this era of transdisciplinary approaches in research, it is
high time to renovate Ayurveda by making use of the advances in
science and technology. Bioinformatics helps to create a better
understanding of concepts, theories, methodologies, etc. and
thereby gives way to the development of Ayurveda. Bioinformatics
tools offer applications in different areas such as ‘Prakriti’
assessment, pharmacokinetic/pharmacodynamic analysis, medicinal plant
based drug development, personalized approaches in medicine,
identification of disease-susceptible prakritis, preventive medicine,
development of new treatment methods, etc. In the last few decades
some researchers have made efforts to conduct studies in Ayurveda
making use of bioinformatics technology.
We are very happy to publish a hand book on ‘Bioinformatics
and its application in Ayurveda’ in connection with this training
course. The contributors are Dr. Abhilash. M, Associate Professor,
Dept. of Kriya Sharira, Govt. Ayurveda College, Tripunithura; Dr.
K. S. Rishad, PhD, Research Director, UniBiosys Biotech Research
Labs (Managed by UniBiosys Foundation for Education and
Research), Cochin University Road, South Kalamassery, Cochin;
Dr. Bivya Gopalan, UniBiosys Biotech Research Labs; and Arjun S.
R, ZyGene Biotechnologies (P) Ltd, Kochi.
We express our sincere gratitude to Dr. Mohanan
Kunnummal, Hon. Vice Chancellor, Kerala University of Health
Sciences, Trissur for the constant support and encouragement.
We extend our heartfelt thanks to Dr. A. Nalinakshan, Pro Vice
Chancellor and Dr. A. K. Manojkumar, Registrar, Kerala University
of Health Sciences, Trissur for their valuable help. Respectful
thanks to Dr. V. S. Syamaladevi, Principal, Govt. Ayurveda College,
Tripunithura for providing facilities and support. Dr. Sudhikumar.
K. B, Professor, School of Fundamental Research in Ayurveda is the
mentor of this program and we are very thankful for his inspiration
and guidance.
Dr. P. Y. Ansary
Research works in Ayurveda
using Bioinformatics tools –
A Review
Dr. Abhilash. M, MD (Ay), Assistant Professor,
Dept. of Kriya Sharira, Govt. Ayurveda College, Tripunithura
Dr. P. Y. Ansary, MD (Ay), PhD, Professor & HOD,
Dept. of Dravyaguna, Govt. Ayurveda College, Tripunithura.
Introduction
The holistic concepts of Ayurveda have often been regarded as
incompatible with modern biological principles. This scenario is
changing with the introduction and advancement of bioinformatics,
especially systems biology approaches. Science is no longer a
sequential exploration by reductionist techniques. Rather,
integration is now the key factor, provided there are enough data
and measures to look into the matter scientifically.
Recent developments in computational biology and
bioinformatics have provided biologists with systematic
methods to analyze molecular networks in a cellular context.
Collectively referred to as systems biology, these methods aim to
analyze relationships among the elements (nodes) of a given system or
the emergent properties of the system. Cellular networks that model
the cellular response to a given perturbation include protein-protein
interaction networks (PPI: encode information on proteins and their
physical interactions); signal transduction and gene regulatory
networks (STN and GRN: show regulatory relationships between
transcription factors and/or regulatory RNAs, as well as the
signalling pathways that confer these responses); and metabolic
networks (MN: illustrate the biochemical reactions between metabolic
substrates and products). Molecular networks that occur in a cell can
be represented as either directed or undirected graphs. For example,
PPI networks use undirected graphs in which nodes represent proteins
and links show the physical interactions between the proteins. An
exhaustive description of these networks is available.
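As a minimal illustration of the graph representation described above,
the following Python sketch stores a small undirected PPI network and a
small directed regulatory network as adjacency sets. The node names
(P1, TF1, geneA, etc.) and the helper build_graph are hypothetical and
serve only to show the idea; they are not taken from any cited study.

```python
def build_graph(edges, directed=False):
    """Build an adjacency-set graph from (source, target) pairs."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # make sure the target is a node
        if not directed:
            # An undirected link is stored in both directions.
            adjacency[target].add(source)
    return adjacency


# Hypothetical PPI network: physical interactions have no direction.
ppi = build_graph([("P1", "P2"), ("P2", "P3")])

# Hypothetical regulatory network: a transcription factor acts on genes.
grn = build_graph([("TF1", "geneA"), ("TF1", "geneB")], directed=True)

# Node degree (number of interaction partners) in the PPI network.
degree = {node: len(partners) for node, partners in ppi.items()}
```

In the undirected case P2 lists both P1 and P3 as partners, whereas in
the directed case geneA has no outgoing links because regulation runs
only from TF1 to the gene.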
Ayurveda is one of the ancient systems of health care of
Indian origin. Roughly translated as “Knowledge of life”, it is
based on the use of natural herbs and herb products as therapeutic
measures to boost physical, mental, social and spiritual harmony
and improve quality of life. Although backed by a long history
and high trust, Ayurveda principles have rarely entered laboratories,
and only a handful of studies have identified pure components
and molecular pathways for its life-enhancing effects. In the
post-genomic era, genome-wide functional screening for disease
targets is the most recent and practical approach¹. The current
situation demands the merger of Ayurveda and functional genomics
in a systems biology scenario that reveals the pathway analysis of
crude and active components and inspires Ayurveda practice for
health benefits, disease prevention and therapeutics.
Drug oriented studies
Ashwagandha is an important herb used in Ayurveda. An alcoholic
extract (i-Extract) from its leaves and its component, withanone,
were previously shown to possess anticancer activity. In one study, a
combination of withanone and withaferin A, the major withanolides
in the i-Extract, retained the selective cancer cell killing activity
and was also found to have significant antimigratory, anti-invasive
and anti-angiogenic activities in both in vitro and in vivo assays.
Using bioinformatics and biochemical approaches, it was demonstrated
that these phytochemicals caused downregulation of the
migration-promoting proteins hnRNP-K, VEGF, and metalloproteases and
hence are candidate natural drugs for metastatic cancer therapy².
Some of the major bioactive compounds of Withania somnifera
have been discussed in terms of protein-protein, protein-DNA and
genetic interactions with respect to gene and protein expression
data, protein domains, metabolic profiling, root organ culture,
genetic transformation and phenotypic screening profiles. The
use of the latest bioinformatics tools in combination with
biotechnological techniques for breeding platforms is important
in the conservation of endangered medicinal plant species³.
Asparagus racemosus (Shatavari) has been used as a
food supplement to enhance the immune system and is regarded as a
highly valued medicinal plant in the Ayurvedic system of medicine for
the treatment of various ailments such as gastric ulcers, dyspepsia,
cardiovascular diseases, neurodegenerative diseases and cancer, as a
galactagogue, and against several other diseases. In-depth metabolic
fingerprinting of various parts of the plant led to the identification
of 13 monoterpenoids exclusively present in roots. LC-MS profiling
led to the identification of a significant number of steroidal saponins.
To understand the molecular basis of the biosynthesis of the major
components, transcriptome sequencing of three different tissues
(root, leaf and fruit) was carried out. Functional annotation of the
A. racemosus transcriptome resulted in the identification of 153
transcripts involved in steroidal saponin biosynthesis, 45 transcripts
in triterpene saponin biosynthesis, 44 transcripts in monoterpenoid
biosynthesis and 79 transcripts in flavonoid biosynthesis⁴.
Clitoria ternatea is an essential constituent of medhya
rasayana for treating neurological disorders. The phytochemicals
of the root extract were identified using a gas chromatography–mass
spectrometry assay, and molecular docking against the protein
monoamine oxidase was performed with four potential compounds
along with four reference compounds of the plant. The results
supported the prospect of C. ternatea as a remedy for
neurodegenerative diseases and depression. The in silico assay showed
that a major compound, (Z)-9,17-octadecadienal, obtained from the
chromatogram at a high retention time of 32.99, gave a minimum
binding energy of -6.5 kcal/mol against monoamine oxidase A (MAO-A).
Its interactions with the amino acid residues ALA 68, TYR 60 and
TYR 69 were analogous to those of the reference compound
kaempferol-3-monoglucoside, which had the lowest scores of
-13.90/-12.95 kcal/mol against the MAO isoforms A and B⁵. This study
supported the phytocompounds of C. ternatea as MAO inhibitors and
a pharmaceutical approach to rejuvenating Ayurvedic medicine.
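A binding energy in kcal/mol can be related, as a rough guide, to an
estimated dissociation constant through the relation
ΔG = RT ln(Kd). The sketch below assumes that a docking score may be
treated as a binding free energy at 298.15 K, which is an approximation
that docking programs do not guarantee. The function name and the
temperature are illustrative choices, not part of the cited study.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def estimated_kd(delta_g_kcal_per_mol, temperature_k=298.15):
    """Estimate Kd (in mol/L) from a binding free energy via dG = RT ln(Kd).

    Assumption: the docking score is used as if it were a true free energy.
    """
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# Scores quoted in the text for MAO-A.
kd_octadecadienal = estimated_kd(-6.5)
kd_reference = estimated_kd(-13.90)

print(f"-6.5 kcal/mol  -> Kd ~ {kd_octadecadienal:.2e} M")
print(f"-13.90 kcal/mol -> Kd ~ {kd_reference:.2e} M")
```

Under these assumptions a score of -6.5 kcal/mol corresponds to a Kd on
the order of 10⁻⁵ M, and a more negative score always maps to a smaller
(tighter) estimated Kd.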
A crucial virulence factor for intracellular Mycobacterium
tuberculosis survival is protein kinase G (PknG), a eukaryotic-like
serine/threonine protein kinase expressed by pathogenic mycobacteria
that blocks the intracellular degradation of mycobacteria in
lysosomes. Inhibition of PknG results in mycobacterial transfer
to lysosomes. Withania somnifera, a reputed herb in Ayurvedic
medicine, contains a large number of steroidal lactones known
as withanolides, which show various pharmacological activities. The
docking of 26 withaferins and 14 withanolides from Withania
somnifera into the three-dimensional structure of PknG of M.
tuberculosis using GLIDE has been described. The inhibitor binding
positions and affinities were evaluated using the GlideScore scoring
function. Withanolides E, F and D and Withaferin - diacetate
2 phenoxy ethyl carbonate were identified as potential inhibitors
of PknG⁶. The available drug molecules and the ligand AX20017
showed hydrogen bond interactions with the amino acid residues
Glu233 and Val235.
Several bioactive compounds have been isolated from
medicinal plants such as Ficus benghalensis, Ficus racemosa, Ficus
religiosa, Thespesia populnea and Ficus lacor Buch. In a study that
aimed to evaluate the molecular interactions of selected diabetes
mellitus (DM) targets with bioactive compounds isolated from these
plants, the best compounds were screened by molecular docking
analysis against the 3 best selected DM target proteins, i.e.,
aldose reductase (AR), insulin receptor (IR) and mono-ADP
ribosyltransferase-sirtuin-6 (SIRT6). In this analysis, six potential
bioactive compounds (gossypetin, herbacetin, kaempferol,
leucopelargonidin, leucodelphinidin and sorbifolin) were
identified on the basis of binding energy (>8.0 kcal/mol)
and dissociation constant using YASARA. Of these six compounds,
herbacetin and sorbifolin were found to be the most suitable ligands
for the management of diabetes mellitus⁷.
Guggul gum resin from Commiphora wightii (syn.
Commiphora mukul) has been used for centuries in Ayurveda to treat
a variety of ailments. NMR and GC–MS based non-targeted
metabolite profiling identified 118 chemically diverse metabolites,
including amino acids, fatty acids, organic acids, phenolic acids,
pregnane derivatives, steroids, sterols, sugars, sugar alcohols,
terpenoids, and tocopherol, from aqueous and non-aqueous extracts
of the leaves, stem, roots, latex and fruits of C. wightii. Out of
the 118, 51 structurally diverse aqueous metabolites were
characterized by NMR spectroscopy. Quinic acid and myo-inositol were
identified as the major metabolites in C. wightii. Very high
concentrations of quinic acid were found in fruits (553.5 ± 39.38
mg g⁻¹ dry wt.) and leaves (212.9 ± 10.37 mg g⁻¹ dry wt.). Similarly,
a high concentration of myo-inositol (168.8 ± 13.84 mg g⁻¹ dry wt.)
was observed in fruits. Other metabolites of cosmeceutical,
medicinal, nutraceutical and industrial significance, such as
α-tocopherol, n-methyl pyrrolidone (NMP), trans-farnesol,
prostaglandin F2, and protocatechuic, gallic and cinnamic acids, were
identified from non-aqueous extracts using GC–MS. These important
metabolites had not previously been reported from this plant. The
isolation of a fungal endophyte (Nigrospora sp.) from this plant was
also reported for the first time. The fungal endophyte produced a
substantial quantity of bostrycin and deoxybostrycin, known for
their antitumor properties. Very high concentrations of quinic acid
and myo-inositol in leaves and fruits, a substantial quantity of
α-tocopherol and NMP in leaves, trans-farnesol in fruits, and
bostrycin and deoxybostrycin from its endophyte make the taxon
distinct, since these metabolites with medicinal properties find
immense applications as dietary supplements and nutraceuticals⁸.
Centella asiatica is a plant used in Ayurvedic
medicine, traditional African medicine and traditional Chinese
medicine. The unavailability of genomic resources significantly
impedes its genetic improvement. No attempt had been made to develop
Expressed Sequence Tag (EST) derived Simple Sequence Repeat (SSR)
markers (eSSRs) from the Centella genome. A study was therefore
initiated to develop SSRs, validate them experimentally and test the
cross-transferability of these markers in different genera of the
Apiaceae family, to which Centella belongs. An in-house pipeline
combining bioinformatics tools and Perl scripts was developed for the
entire analysis. A total of 4443 C. asiatica EST sequences from dbEST
were processed, which generated 2617 non-redundant high-quality EST
sequences consisting of 441 contigs and 2176 singletons. Out of
1776.5 kb of examined sequences, 417 (15.9%) ESTs containing 686 SSRs
were detected, with a density of one SSR per 2.59 kb. The gene
ontology study revealed 282 functional domains involved in various
processes, components, and functions, and 64 ESTs were found to have
both SSRs and functional domains. Out of 603 designed EST-SSR
primers, 18 primer pairs were selected for validation based on
optimum parameter values. Reproducible amplification was obtained for
six primer pairs in C. asiatica, which were further tested for
cross-transferability in nine other important genera/species of the
Apiaceae family. Centella javanica showed the highest transferability
(83.3%). The study revealed six highly polymorphic EST-SSR primers
with an average PIC value of 0.95. In conclusion, these EST-SSR
markers hold great promise for the genomic analysis of Centella
asiatica, for facilitating comparative map-based analyses across
related species within the Apiaceae family, and for future
marker-assisted breeding programs⁹.
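The SSR mining step described above can be sketched in a few lines of
Python. This is only an illustration and not the in-house pipeline of
the study: the minimum repeat thresholds, the toy sequence and the
function names are assumptions chosen for the example. The last lines
recompute the SSR density and the proportion of SSR-containing ESTs
from the figures quoted in the text.

```python
import re

# Minimum repeat counts per motif length. These thresholds are assumed
# for illustration and are not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (AA, ATAT)."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_repeats in MIN_REPEATS.items():
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
        )
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_periodic(motif):
                continue  # already covered by a shorter motif, or a homopolymer
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Hypothetical EST fragment containing (AT)7 and (CAG)5.
toy_est = "GGC" + "AT" * 7 + "CCGTA" + "CAG" * 5 + "TT"
ssrs = find_ssrs(toy_est)

# Figures quoted in the text: 686 SSRs in 1776.5 kb; 417 of 2617 ESTs.
kb_per_ssr = 1776.5 / 686
percent_ests_with_ssr = 100 * 417 / 2617
```

Rounded, the two derived values agree with the reported density of one
SSR per 2.59 kb and the reported 15.9% of ESTs containing SSRs.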
In a research project conducted during 2018 by the Dept. of
Dravyaguna Vijnana, Govt. Ayurveda College, Thiruvananthapuram
(Dr. Jollykutty Eapen, Dr. P. Y. Ansary, Dr. A. Shahul Hameed,
Dr. Indulekha. V. C, Dr. Resny. A. R), DNA sequencing and BLAST
analysis of 5 plants from Hortus Malabaricus were carried out.
The plants were Velutha mandaram (Bauhinia acuminata Linn.),
Payyani (Pajanelia longifolia (Willd.) K. Schum.), Venkurinji
(Justicia betonica Linn.), Kasavu (Memecylon edule Roxb.) and Alpam
(Thottea siliquosa Lam.). A Post Graduate research study in the Dept.
of Dravyaguna Vijnana, Govt. Ayurveda College, Thiruvananthapuram
(Dr. Vandana Venugopalan, Dr. M. A. Shajahan, Dr. Indulekha V. C.),
completed in 2018, examined the in vitro and in silico antifungal
activity of Allium sativum Linn., Curcuma longa Linn., Emblica
officinalis Gaertn., and Acacia catechu (Linn.F.) Willd.
be improved, research and its translation into druggable targets are
crucial. Ancient systems of healthcare (Ayurveda, Siddha, Unani
and Sowa-Rigpa) have been used for centuries for the treatment of
vascular diseases and dementia. This traditional knowledge can be
transformed into novel targets through a robust interplay of network
pharmacology (NetP) with reverse pharmacology (RevP), without
ignoring cutting-edge biomedical data. One work demonstrated the
interaction between recent and traditional data and aimed at
selecting the most promising targets to guide wet lab validations.
The PROTEOME, DisGeNET, DISEASES and DrugBank databases
were used to select genes associated with the pathogenesis
and treatment of vascular dementia (VaD). New potential drug targets
were selected by NetP methods (the DIAMOnD algorithm, enrichment
analysis of KEGG pathways and biological processes of Gene Ontology)
and manual expert analysis. The structures of 1976 phytomolecules
from the 573 Indian medicinal plants traditionally used for the
treatment of dementia and vascular diseases were used for
computational estimation of their interactions with the newly
predicted VaD-related drug targets by a RevP approach based on PASS
(Prediction of Activity Spectra for Substances) software. It was
found that 147 known genes were associated with vascular dementia
based on analysis of the databases of gene-disease associations. Six
hundred novel targets were selected by NetP methods based on these
147 gene associations. Analysis of the predicted interactions between
the 1976 phytomolecules and the 600 NetP-predicted targets led to the
selection of 10 potential drug targets for the treatment of VaD¹¹.
Twenty-four drugs interacting with the 10 selected targets were
identified from DrugBank. The relation between inhibition of two
selected targets (GSK-3, PTP1B) and the treatment of VaD was
confirmed by experimental studies on animals, reported separately by
the authors of that work.
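Pathway enrichment analysis of the kind mentioned above is commonly
based on a hypergeometric (one-sided Fisher) test: given a gene list,
how surprising is its overlap with a pathway? The sketch below is a
generic illustration under that assumption and is not the procedure of
the cited work. Apart from the list size of 147 genes, which is quoted
in the text, all numbers (genome size, pathway size, overlap) are
hypothetical.

```python
from math import comb


def enrichment_p_value(universe, pathway, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    universe: total number of annotated genes
    pathway:  genes in the universe that belong to the pathway
    selected: size of the gene list being tested
    overlap:  genes of the list that fall in the pathway
    """
    upper = min(pathway, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, selected)


# Small case that can be checked by hand: only one way to pick all 5.
p_small = enrichment_p_value(universe=20, pathway=5, selected=5, overlap=5)

# Hypothetical case: 147 disease genes, 10 of them in a 100-gene pathway,
# against an assumed background of 20000 genes.
p_example = enrichment_p_value(universe=20000, pathway=100,
                               selected=147, overlap=10)
```

When many pathways are tested in this way, the resulting p-values
normally need a multiple-testing correction before interpretation.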
A number of plants have been described in Ayurveda
and other traditional medicine systems for the management of diabetes.
However, information about them is not easily available. The active
constituents of a medicinal plant define the efficacy and safety of
treatment to control hyperglycemia. A database was developed to
maintain a record of medicinal plants having anti-hyperglycemic
or anti-diabetic activity. The database contains information such as
plant name, geographical distribution, useful plant part, known
dosage, active constituents, mechanism of action and clinical/
experimental data. The database also includes information about
plant raw material suppliers and manufacturers in India. The current
database includes 238 plant species and 123 Indian industries
using them¹².
Psoriasis is a chronic relapsing immune-mediated disorder
of the skin. Current systemic therapies aim to eliminate
the symptoms of the disease rather than offering a complete cure.
Parangichakkai chooranam (PC), a Siddha oral herbal formulation,
has been widely prescribed for the treatment of psoriasis. Though the
medication is frequently prescribed by Siddha healers, the mechanism
of PC in the treatment of psoriasis remains to be elucidated. A
study utilized an integrated systems pharmacology approach to
decipher the mechanism of action of PC. The comprehensive
network pharmacological approach resulted in the construction
of a Compound-Target network comprising 155 compounds
and 583 protein targets. A Disease-Target network was constructed
by assembling disease proteins and their partners. When the
compound targets were mapped to this network, their involvement
as controllers of the disease and triggers of disease-associated
co-morbidities was identified. A Target-Pathway network derived from
the pathway enrichment analysis identified not only disease-specific
pathways but also the pathways mediating secondary complications
such as skin hemostasis, wound healing, desquamation and itch.
This work sheds light on the mechanism of action of PC in treating
psoriasis¹³.
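The mapping of compound targets onto a disease network can be pictured
with a small Python sketch. The compounds, targets and disease proteins
below are placeholders chosen for illustration and are not data from
the study, and the function names are hypothetical.

```python
# Hypothetical Compound-Target network: compound -> set of protein targets.
compound_targets = {
    "compound_A": {"TNF", "IL17A", "PTGS2"},
    "compound_B": {"TNF", "STAT3"},
    "compound_C": {"EGFR"},
}

# Hypothetical set of disease-associated proteins.
disease_proteins = {"TNF", "IL17A", "STAT3", "IL23A"}


def invert_network(network):
    """Turn compound -> targets into target -> compounds."""
    inverted = {}
    for compound, targets in network.items():
        for target in targets:
            inverted.setdefault(target, set()).add(compound)
    return inverted


def disease_relevant_targets(network, disease_set):
    """Targets hit by at least one compound that are also disease proteins."""
    target_to_compounds = invert_network(network)
    return {
        target: sorted(compounds)
        for target, compounds in target_to_compounds.items()
        if target in disease_set
    }


hits = disease_relevant_targets(compound_targets, disease_proteins)
# Disease proteins not reached by any compound in this toy network.
uncovered = disease_proteins - set(hits)
```

A target that is reached by several compounds, such as TNF in this toy
example, is the kind of node that a network analysis would highlight as
a possible point of multi-compound action.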
In a Post Graduate thesis work titled “Study on the
effect of Siravedha in varicose vein – A predictive systems biology
model”, done in the Dept. of Kriya Shareera, Govt. Ayurveda College,
Pariyaram, Kannur in 2018 (Dr. Jayasree R Kartha, Dr. Ajitha K,
Dr. Abhilash M, Dr. Umesh P), multiscale modelling was done
using systems biology to understand the effects of siravedha on
complex changes happening at subjective and objective levels. A
predictive model was developed that considers the impact of blood
parameters, PO2 and PCO2 levels, area affected, diastolic BP and
raktadushti score on the outcome of siravedha.
Other infrastructure, such as telemedicine and hospital information
systems, is either focused on implementation in modern medicine or
is not implemented and strategized at a national level to support
Traditional Medicine. Informatics may not be able to address all
the emerging areas of Traditional Medicine because the concepts
of Traditional Medicine systems differ from those of the modern
system, though the aim may be the same, i.e., to give relief to the
patient. Thus, there is a need to synthesize Traditional Medicine
systems and informatics with involvement from the modern system of
medicine. Future research may include filling the gaps in informatics
areas and integrating national informatics infrastructure with
established Traditional Medicine systems¹⁴.
The practice of medicine is ever evolving. Diagnosing
disease, which is often the first step in a cure, has seen a sea change
from the discerning hands of the neighborhood physician to the
use of sophisticated machines and to the use of information gleaned
from biomarkers obtained by the most minimally invasive of means. The
last 100 or so years have borne witness to the enormous success
story of modern medicine. Nevertheless, failures of this approach,
coupled with the omics and bioinformatics revolution, spurred
precision medicine, a platform wherein the molecular profile of an
individual patient drives the selection of therapy. Indeed, precision
medicine-based therapies that first found their place in oncology are
rapidly finding uses in autoimmune, renal and other diseases. More
recently, a new renaissance that is shaping everyday life is making its
way into healthcare. Drug discovery and medicine that started with
Ayurveda in India are now benefiting from an altogether different
artificial intelligence (AI), one which is automating the invention of
new chemical entities and the mining of large databases in
health-privacy-protected vaults. Indeed, disciplines as diverse as
language, neurophysiology, chemistry, toxicology, biostatistics,
medicine and computing have come together to harness algorithms based
on transfer learning and recurrent neural networks to design novel
drug candidates, inform a priori on their safety, metabolism and
clearance, and engineer their delivery but only on demand, all the
while cataloguing and comparing omics signatures across traditionally
classified diseases to enable basket treatment strategies¹⁵.
A randomized ribozyme library was introduced into cancer
cells prior to treatment with i-Extract. Ribozymes were
recovered from cells that survived the i-Extract treatment. The gene
targets of the selected ribozymes (as predicted by database search)
were analyzed by bioinformatics and pathway analyses. The targets
were validated for their role in i-Extract-induced selective killing
of cancer cells by biochemical and molecular assays. Fifteen gene
targets were identified and investigated for their role in the
specific cancer cell killing activity of i-Extract and its two major
components (Withaferin A and Withanone) using an shRNA-mediated
gene silencing approach. Bioinformatics analysis of the selected gene
targets revealed the involvement of p53, apoptosis and insulin/IGF
signaling pathways linked to ROS signaling¹⁶.
In a study employing bioinformatics tools on four genes,
i.e., mortalin, p53, p21 and Nrf2, identified by loss-of-function
screenings, the docking efficacy of withanone (Wi-N) and withaferin A
(Wi-A) to each of the four targets was examined. The two closely
related phytochemicals were found to have differential binding
properties to the selected cellular targets that can potentially
instigate differential molecular effects. These findings were
validated by parallel experiments on specific gene responses to
either Wi-N or Wi-A in normal human cells and cancer cells. Wi-A,
which binds strongly to the selected targets, acted as a strong
cytotoxic agent for both normal and cancer cells. Wi-N, on the other
hand, bound weakly to the targets; it showed milder cytotoxicity
towards cancer cells and was safe for normal cells. These molecular
docking analyses and experimental evidence revealed important
insights into the use of Wi-A and Wi-N for cancer treatment and the
development of new anti-cancer phytochemical cocktails¹⁷.
In the Ayurveda system of medicine, individuals are classified
into seven constitution types, “Prakriti”, for assessing disease
susceptibility and drug responsiveness. Prakriti evaluation involves
clinical examination, including questions about physiological and
behavioral traits. A need was felt to develop models for accurately
predicting Prakriti classes, which have been shown to exhibit
molecular differences. A study was carried out on data of phenotypic
attributes in 147 healthy individuals of three extreme Prakriti types
from a genetically homogeneous population of Western India.
Unsupervised and supervised machine learning approaches were
used, respectively, to infer the inherent structure of the data and
to perform feature selection and build classification models for
Prakriti. These models were validated in a North Indian population.
Unsupervised clustering led to the emergence of three natural
clusters corresponding to the three extreme Prakriti classes. The
supervised modeling approaches could classify individuals with
distinct Prakriti types in the training and validation sets. This
study was the first to demonstrate that Prakriti types are distinct,
verifiable clusters within a multidimensional space of multiple
interrelated phenotypic traits. It also provided a computational
framework for predicting Prakriti classes from phenotypic
attributes¹⁸.
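To show what unsupervised clustering of phenotypic scores means in
practice, the sketch below runs a plain k-means procedure on a toy data
set. The text does not state which clustering algorithm the study used,
so k-means is an assumption made for illustration only, and the six
individuals with two trait scores each are invented values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical individuals, each described by two phenotypic trait scores.
individuals = [
    (1.0, 1.2), (0.8, 1.0),   # group resembling one constitution type
    (5.0, 5.2), (5.1, 4.9),   # a second group
    (9.0, 1.0), (9.2, 1.1),   # a third group
]
# Deterministic start: one seed point taken from each apparent group.
seeds = [individuals[0], individuals[2], individuals[4]]
labels, centroids = kmeans(individuals, seeds)
```

With such well-separated toy data the three groups are recovered
without using any class labels, which is the sense in which clusters
are said to emerge naturally from the data.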
Piper longum (P. longum, also called long pepper) is
one of the common culinary herbs that has been extensively
used as a crucial constituent of various indigenous medicines,
specifically in Ayurveda. To explore the comprehensive effect
of its constituents in humans at the proteomic and metabolic levels,
all of its known phytochemicals were reviewed and examined for
their regulatory potential against various protein targets
by developing high-confidence tripartite networks consisting of
phytochemical-protein target-disease associations. This study also (i)
explored the immunomodulatory potency of this herb; (ii) developed
a subnetwork of the human PPI regulated by its phytochemicals and
successfully associated its specific modules with important
roles in diseases; and (iii) reported several novel drug targets.
P10636 (microtubule-associated protein tau, which is involved in
diseases like dementia) was found to be the target most commonly
screened, by about seventy percent of these phytochemicals. Twenty
drug-like phytochemicals were reported in this herb, of which 7 were
found to be potential regulators of 5 FDA-approved drug
targets. The multi-targeting capacity of 3 phytochemicals involved
in the neuroactive ligand-receptor interaction pathway was further
explored via molecular docking experiments. To investigate the
molecular mechanism of P. longum’s action against neurological
disorders, a computational framework was developed that can be
easily extended to explore its healing potential against other
diseases and can also be applied to scrutinize other indigenous herbs
for drug-design studies¹⁹.
The effects of integrative medicine practices such as
meditation and Ayurveda on human physiology are not fully
understood. A study was conducted to identify altered metabolomic
profiles following an Ayurveda-based intervention. In the
experimental group 65 healthy male and female subjects participated
in a 6-day Panchakarma-based Ayurvedic intervention which
included herbs, vegetarian diet, meditation, yoga, and massage. A
set of 12 plasma phosphatidylcholines decreased (adjusted p < 0.01)
post-intervention in the experimental (n = 65) compared to control
group (n = 54) after Bonferroni correction for multiple testing;
within these compounds, the phosphatidylcholine with the greatest
decrease in abundance was PC ae C36:4 (delta = −0.34). Application
of a 10% FDR revealed an additional 57 metabolites that were
differentially abundant between groups. Pathway analysis suggests
that the intervention results in changes in metabolites across many
pathways such as phospholipid biosynthesis, choline metabolism,
and lipoprotein metabolism. The observed plasma metabolomic
alterations may reflect a Panchakarma-induced modulation of
metabotypes. Panchakarma promoted statistically significant
changes in plasma levels of phosphatidylcholines, sphingomyelins
and others in just 6 days20. Forthcoming studies that integrate
metabolomics with genomic, microbiome and physiological
~ 23 ~
parameters may facilitate a broader systems-level understanding
and mechanistic insights into these integrative practices that are
employed to promote health and well-being.
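The two multiple-testing adjustments mentioned above can be illustrated with a short Python sketch. The p-values are hypothetical and are not data from the study, the function names are chosen only for illustration, and the FDR step assumes the Benjamini-Hochberg procedure, since the text does not name the FDR method that was used.

```python
def bonferroni(p_values, alpha=0.01):
    """Mark tests whose p-value is below alpha divided by the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.10):
    """Mark discoveries at the given false discovery rate (assumed BH procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for five metabolites (illustration only).
p_values = [0.0005, 0.004, 0.03, 0.06, 0.5]
bonf = bonferroni(p_values, alpha=0.01)
bh = benjamini_hochberg(p_values, fdr=0.10)
print(bonf)
print(bh)
```

With these made-up values the Bonferroni rule keeps fewer metabolites than the 10% FDR rule, which mirrors why the FDR analysis in the study reported additional differentially abundant metabolites.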
similarities in androgen (An) nuclear receptor behavior, whereas
thymus constitutions are mainly regulated by T-cells (Tc) nuclear
receptor behavior. Moreover, it suggests that thyrus constitutions
share similarities in thyroxine (Th) nuclear receptor behavior. These
proposed nuclear receptors are expected to regulate the expression
of specific genes, thereby controlling the embryonic development,
adult homeostasis, and metabolism of the human organism in a
very profound way. Finally, the method predicts that small differences
in the measured properties (An, Tc, and Th nuclear receptor behaviour)
within a birth constitution across different races are to be expected from
modulation effects in melanocyte-stimulating hormone receptor
behavior [21].
It also seems from our observations that analyzing extreme
constitution types that have phenotype-phenotype linkages within
them might allow us to identify important axes such as hypoxia,
apoptosis, inflammation, etc. that could contribute to system wide
changes. For instance, differences in hypoxia-inducible factors
(HIF) through expression differences in EGLN1 could not only
contribute to differential prognosis in various diseases such as cancer,
asthma, chronic obstructive pulmonary disease, ischemia, stroke,
etc. where hypoxia is implicated but could also lead to variability
in processes such as inflammation, metabolism, erythrocytosis,
oxidative stress, and other downstream targets of HIF. The outcome
of these differences could be assessed through physiological and
biochemical measurements. Some of these parameters could
connect to features that are described for Prakriti assessment and
thereby help objectivise them for global applicability. Enrichment
for genes belonging to the key cellular pathways in a single data
set strengthened our belief that categorizing into Ayurvedic
phenotypes captures the differential regulation of these processes
and hence must be testable at the level of multi organ physiology.
Therefore, the next challenge is in threading the intraindividual
physiological and molecular attributes through Prakriti phenotypes.
This approach fits well with “systems theory” which implies that
the “whole is greater than the sum of its parts”. Identification of
functional axes in healthy individuals would be a key to uncovering
intra-individual cryptic phenotype-phenotype links. For example,
these axes, based upon the salient features, could be
1. highly connected to many organs; just as in gene networks,
hubs in physiological functioning would be key in organizing
the system
2. easily quantifiable in a relatively noninvasive manner
3. well-defined and characterized in the modern system of
medicine and physiology
4. known to have diverse disease associations
More than one axis in the same individual can be measured,
each one measuring a slightly different yet physiologically connected
function. These measures could then be collapsed into latent
variables by using dimensionality reduction techniques that could
be overlaid with Prakriti information for supervised classification
tools to develop objective classifiers of V, P, and K. A blind approach
such as hierarchical clustering can also be applied to assess the
clusters formed on the basis of modern anatomical-physiological
measurements and the concordance of these with Prakriti driven
clusters formed through a supervised machine learning approach [22].
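The "blind" clustering step described above can be sketched in Python as follows. This is a minimal illustration only: the measurements and the Prakriti labels are hypothetical, single linkage is an assumed choice of linkage rule, and the purity check at the end is a simple stand-in for the concordance assessment, not the method of the cited study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical standardized measurements (two features per subject).
measurements = [(0.1, 0.2), (0.2, 0.1), (2.0, 2.1),
                (2.1, 1.9), (4.0, 0.1), (4.1, 0.2)]
# Hypothetical Prakriti labels for the same subjects, used only for comparison.
labels = ["V", "V", "P", "P", "K", "K"]

clusters = single_linkage(measurements, n_clusters=3)
# Fraction of clusters whose members all carry the same Prakriti label.
pure = sum(1 for c in clusters if len({labels[i] for i in c}) == 1)
concordance = pure / len(clusters)
print(clusters, concordance)
```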
Genetic differences in the target proteins, metabolizing
enzymes and transporters that contribute to inter-individual
differences in drug response are not integrated in contemporary drug
development programs. Ayurveda, which has propelled many drug
discovery programs, albeit in the search for new chemical entities,
incorporates inter-individual variability (“Prakriti”) in the development
and administration of drugs in an individualized manner. The Prakriti
of an individual largely determines responsiveness to the external
environment, including drugs, as well as susceptibility to diseases.
Prakriti has also been shown to have molecular and genomic
correlates. To highlight how integration of Prakriti concepts can
augment the efficiency of drug discovery and development programs,
a unique initiative of the Ayurgenomics TRISUTRA consortium was
designed. The five aspects that have been carried out are:
(1) Analysis of variability in FDA approved
pharmacogenomics genes/SNPs in exomes of 72 healthy individuals
including predominant Prakriti types and matched controls from a
North Indian Indo-European cohort
(2) Establishment of a consortium network and development
of five genetically homogeneous cohorts from diverse ethnic and
geo-climatic backgrounds
(3) Identification of parameters and development of uniform
standard protocols for objective assessment of Prakriti types
(4) Development of protocols for Prakriti evaluation and its
application in more than 7500 individuals in the five cohorts
(5) Development of data and sample repository and
integrative omics pipelines for identification of genomic correlates.
Highlights of the study are:
(1) Exome sequencing revealed significant differences
between Prakriti types in 28 SNPs of 11 FDA approved genes of
pharmacogenomics relevance, viz. CYP2C19, CYP2B6, ESR1, F2,
PGR, HLA-B, HLA-DQA1, HLA-DRB1, LDLR, CFTR, CPS1.
These variations are polymorphic in diverse Indian and world
populations included in the 1000 Genomes Project.
(2) Based on the phenotypic attributes of Prakriti, the
study identified anthropometry for anatomical features, biophysical
parameters for skin types, HRV for autonomic function tests,
spirometry for vital capacity and gustometry for taste thresholds as
objective parameters.
(3) Comparison of Prakriti phenotypes across different
ethnic, age and gender groups led to identification of invariant
features as well as some that require weighted considerations across
the cohorts.
Considering the molecular and genomic differences
underlying Prakriti and their relevance in disease pharmacogenomics
studies, this novel integrative platform would help in the identification of
differently susceptible and drug-responsive populations. Additionally,
integrated analysis of phenomic and genomic variations would not
only allow identification of clinical and genomic markers of Prakriti
for application in personalized medicine, but also its integration in
drug discovery and development programs [23].
To conclude, not only can the concepts of Ayurveda be
safeguarded using the platform of bioinformatics, but the
clinical efficacy of Ayurvedic measures can also be better represented
using its tools, provided the researchers have insight into both
informatics and Ayurveda.
References
1. Deocaris, C.C., Widodo, N., Wadhwa, R. et al. Merger of Ayurveda and
Tissue Culture-Based Functional Genomics: Inspirations from Systems
Biology. J Transl Med 6, 14 (2008). https://doi.org/10.1186/1479-5876-6-14
2. Gao, R., Shah, N., Lee, J.-S., Katiyar, S. P., Li, L., Oh, E., Kaul, S. C.
Withanone-Rich Combination of Ashwagandha Withanolides Restricts
Metastasis and Angiogenesis through hnRNP-K. Molecular Cancer
Therapeutics, 13(12), 2930–2940. (2014). doi:10.1158/1535-7163.mct-14-0324
5. Margret, A. A., Begum, T. N., Parthasarathy, S., & Suvaithenamudhan,
S. A Strategy to Employ Clitoria ternatea as a Prospective Brain Drug
Confronting Monoamine Oxidase (MAO) Against Neurodegenerative
Diseases and Depression. Natural Products and Bioprospecting, 5(6),
293–306. (2015). doi:10.1007/s13659-015-0079-x
9. Sahu, J., Das Talukdar, A., Devi, K., Choudhury, M. D., Barooah, M.,
Modi, M. K., & Sen, P. E-Microsatellite Markers for Centella asiatica
(Gotu Kola) Genome: Validation and Cross-Transferability in Apiaceae
Family for Plant Omics Research and Development. OMICS: A Journal
of Integrative Biology, 19(1), 52–65. (2015). doi:10.1089/omi.2014.0113
10. Choudhary, N., & Singh, V. Insights about multi-targeting and synergistic
neuromodulators in Ayurvedic herbs against epilepsy: integrated
computational studies on drug-target and protein-protein interaction
networks. Scientific Reports, 9(1). (2019). doi:10.1038/s41598-019-46715-6
11. Lagunin, A. A., Ivanov, S. M., Gloriozova, T. A., Pogodin, P. V., Filimonov, D.
A., Kumar, S., & Goel, R. K. Combined network pharmacology and virtual
reverse pharmacology approaches for identification of potential targets
to treat vascular dementia. Scientific Reports, 10(1). (2020).
doi:10.1038/s41598-019-57199-9
12. Singh, S., Gupta, S. K., Sabir, G., Gupta, M. K., & Seth, P. K. A database for
anti-diabetic plants with clinical/experimental trials. Bioinformation, 4(6),
263–268. (2009). https://doi.org/10.6026/97320630004263
15. Dana, D., Gadhiya, S. V., St Surin, L. G., Li, D., Naaz, F., Ali, Q., Paka, L.,
Yamin, M. A., Narayan, M., Goldberg, I. D., & Narayan, P. Deep Learning
in Drug Discovery and Medicine; Scratching the Surface. Molecules
(Basel, Switzerland), 23(9), 2384. (2018).
https://doi.org/10.3390/molecules23092384
Activities of the Two Closely Related Withanolides, Withaferin A and
Withanone: Bioinformatics and Experimental Evidences. PLoS ONE 7(9):
e44419. (2012). doi:10.1371/journal.pone.0044419
20. Peterson, C. T., Lucas, J., John-Williams, L. S., Thompson, J. W., Moseley,
M. A., Patel, S., … Chopra, D. Identification of Altered Metabolomic
Profiles Following a Panchakarma-based Ayurvedic Intervention in
Healthy Subjects: The Self-Directed Biological Transformation Initiative
(SBTI). Scientific Reports, 6(1). (2016). doi:10.1038/srep32609
22. Tav Pritesh Sethi, Bhavana Prasher, and Mitali Mukerji. Ayurgenomics: A
New Way of Threading Molecular Variability for Stratified Medicine. ACS
Chemical Biology 2011 6 (9), 875-880. DOI: 10.1021/cb2003016
23. Bhavana Prasher, Binuja Varma, Arvind Kumar, Bharat Krushna Khuntia,
Rajesh Pandey, Ankita Narang, Pradeep Tiwari, Rintu Kutum, Debleena Guin,
Ritushree Kukreti, Debasis Dash and Mitali Mukerji, Ayurgenomics for
stratified medicine: TRISUTRA consortium initiative across ethnically and
geographically diverse Indian populations, Journal of Ethnopharmacology,
http://dx.doi.org/10.1016/j.jep.2016.07.063
Introduction to Bioinformatics
Bioinformatics involves the integration of computers, software
tools, and databases in an effort to address biological questions.
Bioinformatics approaches are often used for major initiatives
that generate large data sets. Two important large-scale activities
that use bioinformatics are genomics and proteomics. Genomics
refers to the analysis of genomes. A genome can be thought of as
the complete set of DNA sequences that codes for the hereditary
material that is passed on from generation to generation. These
DNA sequences include all of the genes (the functional and physical
unit of heredity passed from parent to offspring) and transcripts
(the RNA copies that are the initial step in decoding the genetic
information) included within the genome. Thus, genomics refers
to the sequencing and analysis of all of these genomic entities,
including genes and transcripts, in an organism. Proteomics, on the
other hand, refers to the analysis of the complete set of proteins or
proteome. In addition to genomics and proteomics, there are many
more areas of biology where bioinformatics is being applied (e.g.,
metabolomics, transcriptomics). Each of these important areas in
bioinformatics aims to understand complex biological systems.
Many scientists today refer to the next wave in bioinformatics as
systems biology, an approach to tackle new and complex biological
questions. Systems biology involves the integration of genomics,
proteomics, and bioinformatics information to create a whole
system view of a biological entity.
For instance, how a signaling pathway works in a cell can be
addressed through systems biology. The genes involved in the
pathway, how they interact, and how modifications change the
outcomes downstream, can all be modeled using systems biology.
Any system where the information can be represented digitally offers
a potential application for bioinformatics. Thus bioinformatics can
be applied from single cells to whole ecosystems. By understanding
the complete “parts lists” in a genome, scientists are gaining a better
understanding of complex biological systems. Understanding the
interactions that occur between all of these parts in a genome or
proteome represents the next level of complexity in the system.
Through these approaches, bioinformatics has the potential to offer
key insights into our understanding and modeling of how specific
human diseases or healthy states manifest themselves.
The beginning of bioinformatics can be traced back to Margaret
Dayhoff in 1968 and her collection of protein sequences known
as the Atlas of Protein Sequence and Structure. One of the early
significant experiments in bioinformatics was the application
of a sequence similarity searching program to the identification
of the origins of a viral gene. In this study, scientists used one of
the first sequence similarity searching computer programs (called
FASTP), to determine that the contents of v-sis, a cancer-causing
viral sequence, were most similar to the well-characterized cellular
PDGF gene. This surprising result provided important mechanistic
insights for biologists working on how this viral sequence causes
cancer. From this early application of computers to biology, the
field of bioinformatics has exploded. The growth of bioinformatics
has paralleled the development of DNA sequencing technology. In
the same way that the development of the microscope in the late
1600s revolutionized biological sciences by allowing Antonie van
Leeuwenhoek to look at cells for the first time, DNA sequencing
technology has revolutionized the field of bioinformatics. The
rapid growth of bioinformatics can be illustrated by the growth of
DNA sequences contained in the public repository of nucleotide
sequences called GenBank.
Genome sequencing projects have become the flagships of many
bioinformatics initiatives. The human genome sequencing project
is an example of a successful genome sequencing project but many
other genomes have also been sequenced and are being sequenced.
In fact, the first genomes to be sequenced were of viruses (e.g.,
the phage MS2) and bacteria, with the genome of Haemophilus
influenzae Rd being the first genome of a free-living organism to be
deposited into the public sequence databanks. This accomplishment
was received with less fanfare than the completion of the human
genome but it is becoming clear that the sequencing of other
genomes is an important step for bioinformatics today. However,
a genome sequence by itself carries limited information. To interpret
genomic information, comparative analysis of sequences needs to
be done, and an important reagent for these analyses is the publicly
accessible sequence databases. Without the databases of sequences
(such as GenBank), in which biologists have captured information
about their sequence of interest, much of the rich information
obtained from genome sequencing projects would not be available.
In the same way that developments in microscopy foreshadowed discoveries
in cell biology, new discoveries in information technology and
molecular biology are foreshadowing discoveries in bioinformatics.
In fact, an important part of the field of bioinformatics is the
development of new technology that enables the science of
bioinformatics to proceed at a very fast pace. On the computer
side, the Internet, new software developments, new algorithms,
and the development of computer cluster technology have enabled
bioinformatics to make great leaps in terms of the amount of
data which can be efficiently analyzed. On the laboratory side,
new technologies and methods such as DNA sequencing, serial
analysis of gene expression (SAGE), microarrays, and new mass
spectrometry chemistries have developed at an equally blistering
pace enabling scientists to produce data for analyses at an incredible
rate. Bioinformatics provides both the platform technologies that
enable scientists to deal with the large amounts of data produced
through genomics and proteomics initiatives as well as the approach
to interpret these data. In many ways, bioinformatics provides the
tools for applying the scientific method to large-scale data and should
be seen as a scientific approach for asking many new and different
types of biological questions.
The word bioinformatics has become a very popular “buzz” word in
science. Many scientists find bioinformatics exciting because it holds
the potential to dive into a whole new world of uncharted territory.
Bioinformatics is a new science and a new way of thinking that could
potentially lead to many relevant biological discoveries. Although
technology enables bioinformatics, bioinformatics is still very
much about biology. Biological questions drive all bioinformatics
experiments. Important biological questions can be addressed by
bioinformatics and include understanding the genotype-phenotype
connection for human disease, understanding structure to function
relationships for proteins, and understanding biological networks.
Bioinformaticians often find that the reagents necessary to answer
these interesting biological questions do not exist. Thus, a large part
of a bioinformatician’s job is building tools and technologies as part
of the process of asking the question. For many, bioinformatics is
very popular because scientists can apply both their biology and
computer skills to developing reagents for bioinformatics research.
Many scientists are finding that bioinformatics is an exciting new
territory of scientific questioning with great potential to benefit
human health and society.
The future of bioinformatics is integration. For example, integration
of a wide variety of data sources such as clinical and genomic data
will allow us to use disease symptoms to predict genetic mutations
and vice versa. The integration of GIS data, such as maps and weather
systems, with crop health and genotype data will allow us to
predict successful outcomes of agriculture experiments. Another
future area of research in bioinformatics is large-scale comparative
genomics. For example, the development of tools that can do 10-way
comparisons of genomes will push forward the discovery rate
in this field of bioinformatics. Along these lines, the modeling and
visualization of full networks of complex systems could be used
in the future to predict how the system (or cell) reacts to a drug,
for example. A technical set of challenges faces bioinformatics and
is being addressed by faster computers, technological advances in
disk storage space, and increased bandwidth, but by far one of the
biggest hurdles facing bioinformatics today is the small number of
researchers in the field. This is changing as bioinformatics moves
to the forefront of research, but this lag in expertise has led to real
gaps in the knowledge of bioinformatics in the research community.
Finally, a key research question for the future of bioinformatics
will be how to computationally compare complex biological
observations, such as gene expression patterns and protein networks.
Bioinformatics is about converting biological observations to a
model that a computer will understand. This is a very challenging
task since biology can be very complex. This problem of how to
digitize phenotypic data such as behavior, electrocardiograms, and
crop health into a computer readable form offers exciting challenges
for future bioinformaticians.
1. Biological Databases
Boolean operators
For complex database queries, Boolean operators can be used.
AND - results contain both search terms
OR - results contain either of the search terms
NOT - results exclude the search term that follows the operator
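The effect of the three operators can be sketched with Python sets. The tiny index below is hypothetical (the record identifiers and search terms are invented for illustration) and only mimics how a database combines the sets of records matching each term.

```python
# Hypothetical index: search term -> set of record identifiers containing it.
index = {
    "bacillus": {"rec1", "rec2", "rec3"},
    "16S": {"rec2", "rec3", "rec4"},
    "partial": {"rec3"},
}

both = index["bacillus"] & index["16S"]    # bacillus AND 16S
either = index["bacillus"] | index["16S"]  # bacillus OR 16S
without = both - index["partial"]          # (bacillus AND 16S) NOT partial

print(sorted(both), sorted(either), sorted(without))
```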
Procedure:
• Click search button and wait for display of result and view results
• Select and collect the sequence of first hit, obtained after search
in FASTA format by changing to FASTA option (GenBank Flat
file format and FASTA format explained in appendix)
• Make a table of the top few organisms and corresponding number
of sequence entry for each organism.
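Retrieval of a sequence in FASTA format can also be scripted. The sketch below only builds the request URL for the NCBI E-utilities efetch service and does not contact the server. The accession number CP041750.1 is used purely as an example, and the helper name fasta_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession, db="nuccore"):
    """Build an efetch URL that requests one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


url = fasta_url("CP041750.1")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the same FASTA text that the web interface shows after switching to the FASTA option.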
Result:
Figure 2: FASTA sequence of first hit on database search in
GenBank
To retrieve the 16S rRNA gene from Bacillus, the following steps are carried out:
Procedure:
• Click search option and wait for results and analyze results
• Select and collect the sequence of first hit, obtained after search
in FASTA format by changing to FASTA option.
Results
Figure 4: Expansion of annotated nucleotide sequence result
sequences from researchers and to issue the internationally
recognized accession number to data submitters. URL for DDBJ is
http://www.ddbj.nig.ac.jp/
Procedure:
• View results
Results
1.4. Protein sequence retrieval from UniProtKB
Introduction
Procedure:
• Click search button and wait for display of result and view results
• Note down the sequence length of first hit and sequence status by
double clicking on the first hit
• Select and collect the sequence of first hit, obtained after search
in FASTA format.
Results
Figure 9: Sequence of first hit Q9IGQ6 in FASTA format
Procedure:
• Note down how many entries are available for nucleotide, protein
and structure respectively.
Results:
The Entrez search returned entries for superoxide
dismutase from various databases, including Nucleotide, Protein,
EST, and Structure. Large numbers of sequences are available for
superoxide dismutase.
Figure 10: Output of Cross-reference database search using
Entrez
1.6. Retrieval of 3D structure from Protein Data Bank (PDB)
Introduction:
Research Collaboratory for Structural Bioinformatics (RCSB). A
deposited set of protein coordinates becomes an entry in PDB.
Each entry is given a unique code, PDB id, consisting of four
characters of either letters A to Z or digits 0 to 9 such as 1LYZ
and 4RCR. It consists of an explanatory header section followed
by an atomic coordinate section. The header section provides an
overview of the protein such as information about the name of
the molecule, source organism, bibliographic reference etc. In
the structure coordinates section, ATOM records refer to
protein atoms and HETATM records indicate cofactors/substrates,
each along with its coordinates in specified columns.
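The entry layout described above can be illustrated with a short Python sketch that checks a PDB id and reads ATOM and HETATM records by their fixed column positions. The sample lines are hypothetical and are not taken from any real entry, the function names are invented for illustration, and only a few of the fields defined by the PDB format are extracted.

```python
import re


def is_valid_pdb_id(code):
    """Four characters, each a letter A-Z or a digit 0-9, as described above."""
    return re.fullmatch(r"[A-Z0-9]{4}", code.upper()) is not None


def parse_coordinates(lines):
    """Collect ATOM and HETATM records using the fixed-column layout of the PDB format."""
    records = []
    for line in lines:
        record_type = line[0:6].strip()
        if record_type not in ("ATOM", "HETATM"):
            continue  # header and other sections are skipped
        records.append({
            "record": record_type,
            "atom": line[12:16].strip(),     # columns 13-16: atom name
            "residue": line[17:20].strip(),  # columns 18-20: residue name
            "chain": line[21],               # column 22: chain identifier
            "x": float(line[30:38]),         # columns 31-38
            "y": float(line[38:46]),         # columns 39-46
            "z": float(line[46:54]),         # columns 47-54
        })
    return records


# Hypothetical sample lines laid out in PDB column positions.
sample = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   PRO A   1      29.361  39.686   5.862",
    "HETATM 1518  O   HOH A 201      10.000  20.000  30.000",
]
records = parse_coordinates(sample)
print(is_valid_pdb_id("1LYZ"), records)
```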
Procedure
• Type HIV1 Protease and click; the result count shows how many PDB
entries related to HIV1 Protease are present in the PDB
• Click PDB Entities (unique chains) and it will show all pdb
entries page by page
• Note down details about the hit such as experimental method,
resolution, ligand/ chemical component (bound) etc.
• Open in WordPad and view the pdb format (see appendix for
pdb format).
Results
Figure 12: Description of HIV 1 Protease structure under PDB
ID: 2HVP
Details of 2HVP
Experimental method X-RAY DIFFRACTION
Resolution 3 Å
1.7. Database Searching Using Heuristic Pairwise Alignment Program BLAST
Introduction
Procedure:
• A nucleotide sequence in FASTA format or an accession number
should be given as input to the program (the nucleotide sequence may be
obtained either from sequencing or from a database).
• All the other parameters, such as Word size, Expect threshold, and
Gap Costs, were kept at their defaults.
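To give a feel for what the Word size parameter controls, the following Python sketch finds exact word matches ("seeds") shared between a query and a database sequence. It is a simplified illustration of the seeding idea behind a heuristic search and not the NCBI BLAST implementation; the sequences, the function name and the word size of 7 are hypothetical.

```python
def shared_words(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    # Index every word of the subject sequence by its start position.
    positions = {}
    for i in range(len(subject) - word_size + 1):
        positions.setdefault(subject[i:i + word_size], []).append(i)
    # Look up every word of the query in that index.
    hits = []
    for j in range(len(query) - word_size + 1):
        for i in positions.get(query[j:j + word_size], []):
            hits.append((j, i))
    return hits


# Hypothetical sequences for illustration.
query = "ACGTACGTGA"
subject = "TTACGTACGTGATT"
hits = shared_words(query, subject, word_size=7)
print(hits)
```

All four hits lie on the same diagonal (subject position minus query position equals 2), which is the kind of seed cluster a heuristic aligner would then try to extend into a full alignment. A smaller word size generally yields more seeds and a more sensitive but slower search.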
Results
Figure 13: BLAST output shown indicates that the given
sequence shows 100% identity with nucleotide in the database
with accession number CP041750.1 and 99% identity with
MK691443.1
• Numerous databases are available in which the search can be performed.
Choosing the appropriate database is important. Non-redundant
Protein Sequence (nr) was selected as the database for searching.
• All the other parameters, such as word size, Expect threshold, and gap
costs, were kept at their defaults. By default, BLOSUM62 was
used as the substitution matrix.
Results
The BLAST output shows protein sequences homologous to the query
sequence, found in the database search.
Figure 14: BLAST output shown indicates that the given
sequence shows 100% identity with protein in the database
with accession number AAF08111.1 and 99% identity with
AAF08110.1
2. Multiple Sequence Alignment using Clustal X 2.1
Introduction
Prerequisites:
Procedure
sequence from different organism were identified which are
carefully inspected and selected
Results
Sl. No.   Accession No.   Protein   Organism
1         NP_000052.1     BTK       Human
Figure 16: Guide tree used by Clustal to align the residues
visualized in Treeview X.
3. Alignment Editing (BioEdit Sequence Alignment Editor V 7.2.5)
Introduction:
Prerequisites:
Procedure
• Switch to Edit mode in BioEdit.
• View the total MSA output by using graphic view option (File →
Graphic View → Edit Copy page as Bitmap (Ctrl + C) and paste
in MS word (Ctrl + V))
Result
Phylogenetic Analysis Using MEGA X
Introduction:
Maximum parsimony, a character-based approach, computes the tree
from sequence characters rather than pairwise distances. The parsimony
method chooses the tree that has the fewest evolutionary changes or
the shortest overall branch lengths.
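The idea of counting evolutionary changes on a candidate tree can be sketched with Fitch's counting rule for a single alignment column. This is a minimal Python illustration under stated assumptions: the taxa, the character states and the two tree shapes are hypothetical, and the sketch is not a description of how MEGA X performs its search.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for a rooted binary tree.

    The tree is given as nested 2-tuples of leaf names; states maps each
    leaf name to its character at one alignment column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change is needed.
        return common, left_changes + right_changes
    # The children disagree: one change is counted at this node.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical character states at one alignment column for four taxa.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
score_1 = fitch(tree_1, site)[1]
score_2 = fitch(tree_2, site)[1]
print(score_1, score_2)
```

For this column the first tree needs one change and the second needs two, so parsimony would prefer the first. In practice the counts are summed over all columns of the alignment and compared across many candidate trees.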
Prerequisites
Procedure
Molecular sequences are the primary data in order to construct a
phylogenetic tree.
The alignment edited in the BioEdit sequence alignment editor was
saved in FASTA format with the .fas extension. The output obtained
is used for further analysis in MEGA X.
• Choose model
• Under Phylogeny Test → Test of Phylogeny box pull down
Bootstrap Method
• Click OK
• Click Root the tree on selected branch option to define
common ancestor if already known.
• Change representation of tree View → Topology Only
• In tree Explorer Go to File → Export Current Tree (Newick)→
check Branch Lengths & Bootstrap values → Save as .nwk
• Open the installed TreeViewX
• File → Open .nwk file in TreeViewX.
• Change representation of tree to Cladogram/Phylogram etc.
Results
Table 5: Selected BTK sequences for Phylogenetic
Analysis
Primer Designing
Introduction:
of primers (GC Clamp) helps to ensure correct binding at the 3’
end due to the stronger hydrogen bonding of G/C residues.
Procedure:
• See results
Results
Five different primer pairs were generated for the input sequence,
as shown in the table. After validation, all primers were found
to be acceptable. The primers have to be validated under laboratory
conditions for further optimization.
Primer     Sequence               Length   GC      Tm        Product Size
Primer_F   TCTCCCGCACTCTTGAAACT   20 bp    50.0%   60.0 °C   194 bp
Primer_R   CCACTGCGAAGTCAACTGAA   20 bp    50.0%   60.0 °C
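Basic properties of the two primers in the table can be checked with a few lines of Python. The sketch is illustrative only: the function names are hypothetical, and the melting temperature uses the simple Wallace rule of thumb (2 °C per A/T and 4 °C per G/C), which is an assumption made here for illustration. Primer3Plus computes Tm with its own thermodynamic model, so agreement with the table should not be expected in general.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    gc = sum(1 for base in primer.upper() if base in "GC")
    return 100.0 * gc / len(primer)


def wallace_tm(primer):
    """Rough Tm estimate in degrees Celsius: 2 per A/T plus 4 per G/C (rule of thumb)."""
    primer = primer.upper()
    at = sum(1 for base in primer if base in "AT")
    gc = sum(1 for base in primer if base in "GC")
    return 2 * at + 4 * gc


def gc_in_last_five(primer):
    """Number of G/C bases among the last five bases at the 3' end (GC clamp check)."""
    return sum(1 for base in primer.upper()[-5:] if base in "GC")


forward = "TCTCCCGCACTCTTGAAACT"
reverse = "CCACTGCGAAGTCAACTGAA"
for name, seq in (("Primer_F", forward), ("Primer_R", reverse)):
    print(name, len(seq), gc_percent(seq), wallace_tm(seq), gc_in_last_five(seq))
```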
Apart from Primer3Plus, several web-based services and
stand-alone software packages are available to the public for primer design,
such as PRIDE, PRIMER MASTER, PRIMO, Primer3, Prime,
Web Primer (http://genome-www2.stanford.edu/cgi-bin/SGD/web-primer),
and Primer Design Assistant (PDA). Users
can define the parameters listed in the menu of these tools and
then get several pairs of primers for the target template sequence.
Gene Prediction Using GeneMark
Introduction
Procedure:
Results
Figure 24: All predicted genes in both forward and reverse
directions using GeneMark.hmm
Molecular Visualization
Introduction:
Requirements:
Procedure
• Open RasMol and Load the structure 1AJX.pdb (File → Open
→ 1AJX.pdb)
• Use Mouse to rotate the molecule. Press shift and use mouse to
zoom in and out.
• View results
Results
Figure 25: RasMol view of 1AJX.pdb. Color: Temperature,
Representation: Wireframe
Figure 26: RasMol view of 1AJX.pdb. Representation: Ribbon,
then run command select * a + color red followed by
select * b + color green
Secondary Structure Prediction Using PSIPRED V 3.0
Introduction:
Methods
• Select prediction method as PSIPRED 4.0 (Predict Secondary
Structure) under Choose prediction methods
• Then click Submit for prediction after giving Job name to get the
output.
Results
Secondary Structure Prediction Using GOR IV Algorithm
Introduction:
Methods
• View Output
Results
References
1. Altschul, S., Gish, W., Miller, W., Myers, E. and Lipman, D :
Basic local alignment search tool. J Mol Biol 1990, 215(3):403-
410.
2. Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z.,
Miller, W. and Lipman, D: Gapped BLAST and PSI-BLAST:
a new generation of protein database search programs. Nucleic
Acids Res 1997, 25(17):3389-3402.
Analysis using Maximum Likelihood, Evolutionary Distance,
and Maximum Parsimony Methods. Molecular Biology and
Evolution 2011, 28: 2731-2739.
Database and Online Tool        Website
GenBank                         http://www.ncbi.nlm.nih.gov/genbank/
EMBL-Bank                       http://www.ebi.ac.uk/ena/
DDBJ                            http://www.ddbj.nig.ac.jp/
UniProtKB                       http://www.uniprot.org/
NCBI - Entrez                   http://www.ncbi.nlm.nih.gov/gquery/
Protein Data Bank               https://www.rcsb.org/
NCBI-BLAST                      https://blast.ncbi.nlm.nih.gov/Blast.cgi
Primer3Plus                     http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi
GeneMark.hmm for Prokaryotes    http://exon.gatech.edu/GeneMark/heuristic_gmhmmp.cgi
PSI-PRED server                 http://bioinf.cs.ucl.ac.uk/psipred/
Server for GOR IV method        https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
Appendix
Software
a sequence name. Sometimes, extra information such as gi number
or comments can be given, which are separated from the sequence
name by a “|” symbol. The extra information is considered optional
and is ignored by sequence analysis programs. The plain sequence
in standard one-letter symbols starts in the second line. Each line of
sequence data is limited to sixty to eighty characters in width.
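The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the definition line starts with the ">" character and treating everything after the first "|" as the optional extra information, as in the description above. The sample records and the function name are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (name, extra, sequence) tuples from FASTA-formatted text."""
    records = []
    name, extra, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if name is not None:
                records.append((name, extra, "".join(chunks)))
            name, _, extra = line[1:].partition("|")
            name, extra, chunks = name.strip(), extra.strip(), []
        elif name is not None:
            # Sequence lines are joined regardless of their width.
            chunks.append(line)
    if name is not None:
        records.append((name, extra, "".join(chunks)))
    return records


# Hypothetical two-record example.
sample = ">seq1|hypothetical example\nACGTACGT\nACGT\n>seq2\nTTGA\n"
records = parse_fasta(sample)
print(records)
```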
GenBank Flat file format
Variants of BLAST
Blast Output:
E value:
E value                    Interpretation
E < 1e-50 (1 × 10^-50)     High confidence that the match is a result of a
                           homologous relation.
1e-50 < E < 0.01           Can be a result of homology.
0.01 < E < 10              Not significant. May be remote homology;
                           additional evidence required.
E > 10                     Unrelated.
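The interpretation table can be turned into a small Python helper. This is only a sketch: the function name is hypothetical, and because the table does not say how values exactly on a boundary (1e-50, 0.01 or 10) should be treated, the handling of those boundaries below is an assumption.

```python
def interpret_e_value(e_value):
    """Map a BLAST E value to the interpretation given in the table above."""
    if e_value < 1e-50:
        return "high confidence of homology"
    if e_value < 0.01:
        return "can be a result of homology"
    if e_value <= 10:  # boundary handling is an assumption
        return "not significant; may be remote homology"
    return "unrelated"


for e in (1e-80, 1e-5, 0.5, 50):
    print(e, interpret_e_value(e))
```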
Bit Score:
$S' = \dfrac{\lambda S - \ln K}{\ln 2}$
• The higher the bit score, the more significant the match
• Total Score: the sum of scores from all HSPs from the same
database sequence
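The bit score normalisation, S' = (λS − ln K) / ln 2, and its link to the E value, E = m × n × 2^(−S') for a query of length m and a database of total length n, can be worked through in Python. The numbers below are illustrative assumptions: the raw score, the lengths, and the values λ = 0.267 and K = 0.041 are chosen only for the example, since the real parameters depend on the scoring system reported in the BLAST output.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_len, db_len):
    """E value from a bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative values only (assumed for this example).
lam, k = 0.267, 0.041
raw = 100
bits = bit_score(raw, lam, k)
e_value = expected_hits(bits, query_len=250, db_len=1_000_000)
print(round(bits, 1), e_value)
```

Because the bit score already folds in λ and K, bit scores from searches that use different scoring systems can be compared directly, which is why a higher bit score indicates a more significant match.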