Research Proposal
Bioinformatics Approach to Evaluation of Transcription Factor Genes and Diseases (Cancer)
Brijesh Singh Yadav
(Senior Research Associate, URC, Allahabad)
E-Mail: brijeshbioinfo@gmail.com
Problem Statement:
The purpose of the proposed research is to develop a computational approach to quantitatively evaluate associations between transcription-factor-encoding genes and human diseases, based on available literature evidence. The approach will analyze a set of candidate genes and determine which genes are linked to human diseases, which properties are involved in these gene-disease linkages, and which clusters of similar genes are involved in particular diseases. During the course of the research, I shall explore methods for recapitulating existing associations and for predicting novel associations from diverse forms of data pertaining to genes and diseases. These methods will evaluate the resulting associations quantitatively, and the resulting analyses will be validated to determine the efficacy of the methods.
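One question posed above is which clusters of similar genes are involved in particular diseases. A minimal sketch of one way to form such clusters, assuming each gene is described by a set of annotation terms (for example GO terms or InterPro domains), is single-linkage clustering on Jaccard similarity. The gene names, terms, the 0.5 threshold and the choice of similarity measure are hypothetical illustrations, not part of the proposal.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_genes(annotations: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: two genes share a cluster if they are connected
    by a chain of gene pairs whose Jaccard similarity is >= threshold."""
    parent = {gene: gene for gene in annotations}  # union-find forest

    def find(gene: str) -> str:
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(annotations, 2):
        if jaccard(annotations[g1], annotations[g2]) >= threshold:
            parent[find(g1)] = find(g2)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for gene in annotations:
        clusters.setdefault(find(gene), set()).add(gene)
    return list(clusters.values())


# Hypothetical annotation sets; not real gene data.
example = {
    "GENE_A": {"GO:0000001", "GO:0000002", "IPR000001"},
    "GENE_B": {"GO:0000001", "GO:0000002"},
    "GENE_C": {"GO:0000003"},
}
print(cluster_genes(example, threshold=0.5))
```

Single linkage is only one option; average linkage or a graph-based method could equally be used once a gene-gene similarity is defined.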
Background:
Identifying the functional causes and contributing mechanisms of disease is a principal aim of biomedical research. In many cases, the term "disease" broadly applies to a heterogeneous set of observable properties, which may arise from multiple molecular processes. A disease is often characterized by its symptoms and a pattern of progression over time. Cancer is a particularly broad area, encompassing a wide range of complex, abnormal phenotypes. Compared with diseases associated with other organs, many types of cancer, such as brain cancer, tend to be poorly understood: many are difficult to characterize and have complex genetic components involving multiple genes.
 
Transcription factors are key regulators of gene expression. They act through processes such as the recruitment of transcription initiation factors and conformational changes in DNA, working alone or as part of protein complexes.
GeneSeeker can find genes within a chromosomal location that are expressed in particular tissues, by examining human and mouse expression data. Another method associates disease genes with anatomical locations by text mining PubMed abstracts to link eVOC anatomical ontology terms to gene names.

Machine learning approaches can be used when a representative set of disease genes is available as training data. In DGP, a decision tree classification approach is used to find features common to disease genes, based on a training set composed of sample disease and control proteins. The features were protein length, BLASTP ratios (conservation scores) between a protein and its highest-scoring homologue within taxonomic groups (representing phylogenetic conservation and extent), and the conservation score with the closest paralogue. The study indicates that, on average, hereditary disease genes (genes taken from OMIM) are longer, more conserved and more phylogenetically extended than randomly selected genes, and lack close paralogues.
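The core step of a decision tree such as DGP's is choosing the feature and threshold that best separate disease proteins from control proteins. A minimal sketch of that step, a single-split tree (decision stump) scored by Gini impurity over the three DGP-style features, is shown below. The feature definitions in the comments, the toy values and the use of Gini impurity are assumptions for illustration; this is not the published DGP implementation.

```python
from dataclasses import dataclass

FEATURES = ("protein_length", "blastp_ratio", "paralogue_score")


@dataclass
class Protein:
    protein_length: int     # residues
    blastp_ratio: float     # assumed: best-homologue BLASTP score / self-score within a taxon group
    paralogue_score: float  # assumed: the same ratio computed against the closest paralogue
    is_disease: bool        # training label: disease (True) or control (False)


def gini(labels: list[bool]) -> float:
    """Gini impurity of a binary label list: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(proteins: list[Protein]) -> tuple[str, float, float]:
    """Return (feature, threshold, weighted Gini impurity) of the best single split."""
    n = len(proteins)
    best = (FEATURES[0], float("inf"), float("inf"))
    for feature in FEATURES:
        values = sorted({getattr(p, feature) for p in proteins})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [p.is_disease for p in proteins if getattr(p, feature) <= t]
            right = [p.is_disease for p in proteins if getattr(p, feature) > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, t, score)
    return best


# Hypothetical training proteins (toy numbers, not OMIM data).
training = [
    Protein(900, 0.8, 0.2, True),
    Protein(1200, 0.7, 0.3, True),
    Protein(1000, 0.9, 0.1, True),
    Protein(300, 0.3, 0.8, False),
    Protein(400, 0.2, 0.9, False),
    Protein(950, 0.4, 0.7, False),
]
print(best_split(training))
```

A full decision tree would apply `best_split` recursively to each side of the chosen split until the leaves are pure or a depth limit is reached.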
PROSPECTR uses a wider variety of features, including the length of the gene, the length of its coding sequence, the length of its cDNA, the length of the protein, GC content, and percentage protein identity with its nearest homologue in various species (mouse, worm, fly). The investigators used an alternating decision tree, taking genes from OMIM and comparing them against genes not found in OMIM. They also generated two independent test sets: one using genes from the Human Gene Mutation Database with randomly selected control genes, and another of 54 genes not in OMIM, again with randomly selected control genes.
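The sequence features listed for PROSPECTR are straightforward to compute once sequences and homologue alignments are available. A minimal sketch follows, assuming the sequences are plain strings, that GC content is taken over the genomic sequence, and that percentage identity is computed over gap-free columns of a pre-computed pairwise alignment. These choices, the helper names and the toy sequences are illustrative assumptions; the original study may define the features differently.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def sequence_features(gene: str, cds: str, cdna: str, protein: str,
                      homologue_alignments: dict[str, tuple[str, str]]) -> dict[str, float]:
    """Assemble a PROSPECTR-style feature dictionary for one gene (hypothetical helper)."""
    features = {
        "gene_length": len(gene),
        "cds_length": len(cds),
        "cdna_length": len(cdna),
        "protein_length": len(protein),
        "gc_content": gc_content(gene),
    }
    # One identity feature per species, from (query, homologue) aligned strings.
    for species, (query_aln, homologue_aln) in homologue_alignments.items():
        features[f"identity_{species}"] = percent_identity(query_aln, homologue_aln)
    return features


# Toy sequences and alignments, for illustration only.
feats = sequence_features(
    gene="ATGCGCATTAGGCTAA",
    cds="ATGCGCTAA",
    cdna="GGATGCGCTAAAA",
    protein="MR",
    homologue_alignments={"mouse": ("MK-LV", "MKALI"), "fly": ("MKLV", "MRLI")},
)
print(feats)
```

Each feature dictionary would then become one row of the training table for a classifier such as the alternating decision tree described above.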
POCUS takes another machine learning approach, using a selected training set of genes linked to the target disease. POCUS identifies features common to all the training genes (InterPro domains, GO annotations, similar expression profiles) and assesses the probability that such features would be shared by chance. This method depends on a carefully selected training set of genes and focuses on the likelihood that these genes all share common, disease-related properties, in contrast to methods that focus on overrepresentation of properties among the training genes.
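The contrast drawn above, between all training genes sharing a feature and a feature being overrepresented among them, can be made concrete with two textbook probabilities. Assume, as a simplification, a genome of N genes of which M carry a given annotation, and k training genes drawn at random without replacement. The probability that all k carry the annotation is C(M, k)/C(N, k), whereas an overrepresentation test uses the hypergeometric upper tail. This is an illustrative sketch with hypothetical numbers, not the statistic used by POCUS.

```python
from math import comb


def prob_all_share(n_genome: int, n_annotated: int, k: int) -> float:
    """P(all k genes drawn at random without replacement carry the annotation)."""
    return comb(n_annotated, k) / comb(n_genome, k)


def overrepresentation_p(n_genome: int, n_annotated: int, k: int, x: int) -> float:
    """Hypergeometric upper tail P(X >= x): at least x of k random genes carry the annotation."""
    total = comb(n_genome, k)
    hits = sum(
        comb(n_annotated, i) * comb(n_genome - n_annotated, k - i)
        for i in range(x, min(k, n_annotated) + 1)
    )
    return hits / total


# Hypothetical numbers: 20,000 genes, 200 annotated with a term, 3 training genes.
print(prob_all_share(20_000, 200, 3))
print(overrepresentation_p(20_000, 200, 3, 2))
```

Setting `x = k` in `overrepresentation_p` recovers `prob_all_share`, so the all-share probability is the most extreme case of the overrepresentation tail.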
 
Proposed Method:
Most existing methods for the computational prediction of links between genes and disease take as input a preliminary list of candidate genes (e.g. genes in a genomic region linked to a disease by a genetic study) and return as output either a reduced or a ranked list. The underlying approaches differ substantially between methods. Characteristics used in these methods include numerical features derived from the raw sequences of genes and/or encoded proteins, existing annotations of proteins and genes, and abstracts or articles directly referring to the gene. Current methods focus on using properties of a representative set of genes to identify similar genes in the candidate set.

We propose a method of extracting gene-disease associations that will emphasize verifiable supporting evidence for the predicted associations and a quantitative evaluation of the strength of each association. We shall investigate both the associations between genes and diseases and the properties of each gene-disease association. We shall consider three base entities (Genes, Diseases and Evidence) and the relationships between these entities.
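The three base entities can be captured in a small data model in which every gene-disease association carries the evidence items that support it, so each score stays traceable to its sources. A minimal sketch follows, assuming each evidence item has a confidence weight in [0, 1] and that items are combined with a noisy-OR rule (which treats them as independent); the weights, identifiers and combination rule are hypothetical choices rather than part of the proposal. It follows the input/output convention described above: a candidate list goes in, a ranked list comes out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    source_id: str  # hypothetical identifier, e.g. a PubMed ID or database record
    kind: str       # e.g. "literature", "mutation", "expression"
    weight: float   # assumed confidence in [0, 1]


@dataclass
class Association:
    gene: str
    disease: str
    evidence: list[Evidence] = field(default_factory=list)

    def score(self) -> float:
        """Noisy-OR combination 1 - prod(1 - w); assumes evidence items are independent."""
        p_none = 1.0
        for item in self.evidence:
            p_none *= 1.0 - item.weight
        return 1.0 - p_none


def rank_candidates(candidates: list[str], disease: str,
                    associations: list[Association]) -> list[tuple[str, float]]:
    """Rank candidate genes for one disease; genes without evidence score 0.0.
    sorted() is stable, so ties keep their input order."""
    by_gene = {a.gene: a for a in associations if a.disease == disease}
    scored = [(g, by_gene[g].score() if g in by_gene else 0.0) for g in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical associations; identifiers are placeholders, not real records.
associations = [
    Association("GENE_A", "DISEASE_1", [Evidence("PMID:0000001", "literature", 0.5),
                                        Evidence("PMID:0000002", "literature", 0.5)]),
    Association("GENE_B", "DISEASE_1", [Evidence("DB:0000003", "mutation", 0.6)]),
]
print(rank_candidates(["GENE_C", "GENE_B", "GENE_A"], "DISEASE_1", associations))
```

Because each `Association` keeps its `Evidence` list, any ranked gene can be traced back to the specific records that produced its score.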
Goal of Research:
Our goal will be to predict gene-disease relationships based on the existence of relationships between other entity pairings. After an initial study of mammalian gene-disease relationships, we will broaden the approach to incorporate entity relationships involving orthologous genes in model organisms or related diseases. These paths of supporting evidence will be evaluated quantitatively, making it possible both to extract strongly supported gene-disease linkages and to rank these linkages. Although the thesis itself will investigate properties of transcription factor genes in cancer, the methods and analysis will be designed for general application. For the initial analysis of the main gene-disease associations.
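Predicting a gene-disease link from other entity pairings amounts to searching for paths between a gene and a disease through intermediate entities, such as an orthologue in a model organism or a related disease. A minimal sketch follows, assuming each relationship has a confidence weight in [0, 1], that a path's strength is the product of its edge weights, and that relationships are undirected; the entity names and weights are hypothetical. Returning the paths themselves, rather than a single number, keeps every prediction tied to its supporting evidence.

```python
from collections import defaultdict

Graph = dict[str, list[tuple[str, float]]]


def build_graph(edges: list[tuple[str, str, float]]) -> Graph:
    """Undirected weighted adjacency list from (entity, entity, weight) triples."""
    graph: Graph = defaultdict(list)
    for a, b, weight in edges:
        graph[a].append((b, weight))
        graph[b].append((a, weight))
    return graph


def supporting_paths(graph: Graph, gene: str, disease: str,
                     max_edges: int = 3) -> list[tuple[list[str], float]]:
    """All simple paths gene -> ... -> disease with at most max_edges edges,
    each scored by the product of its edge weights, strongest first."""
    results = []
    stack = [(gene, [gene], 1.0)]  # depth-first search over partial paths
    while stack:
        node, path, strength = stack.pop()
        if node == disease:
            results.append((path, strength))
            continue
        if len(path) - 1 == max_edges:
            continue
        for neighbour, weight in graph.get(node, []):
            if neighbour not in path:  # keep paths simple (no repeated entities)
                stack.append((neighbour, path + [neighbour], strength * weight))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical relationships with illustrative weights.
edges = [
    ("GENE_X", "DISEASE_1", 0.9),       # direct literature evidence
    ("GENE_X", "mouse:Genex", 0.8),     # orthologue in a model organism
    ("mouse:Genex", "DISEASE_1", 0.5),  # model-organism phenotype
    ("GENE_X", "DISEASE_2", 0.7),
    ("DISEASE_2", "DISEASE_1", 0.6),    # related diseases
]
for path, strength in supporting_paths(build_graph(edges), "GENE_X", "DISEASE_1"):
    print(" -> ".join(path), round(strength, 3))
```

Ranking genes by their strongest path, or by some combination of path strengths, would give a ranked list of linkages; how to combine paths that share edges is an open design choice.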
