Transcription factors are key regulators of gene expression, acting through processes such as the recruitment of transcription initiation factors and conformational change of DNA, working alone or as part of protein complexes.
GeneSeeker can find genes within a chromosomal location that are expressed in particular tissues, by looking at human and mouse expression data. Another method of associating disease genes with anatomical locations used text mining of PubMed abstracts to associate eVOC anatomical ontology terms with gene names.
Machine learning approaches can be used when a representative set of disease genes is available to use as training data. In DGP, a decision tree classification approach is used to find features common to disease genes, based on a training set composed of sample disease and control proteins. Features were protein length, BLASTP ratios (conservation score) between a protein and its highest-scoring homologue within taxonomic groups (representing phylogenetic conservation and extent), and the conservation score with the closest paralogue. The study indicates that, on average, hereditary disease genes (genes taken from OMIM) are longer, more conserved, and more phylogenetically extended than randomly selected genes, and are without close paralogues.
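The core step of a decision tree classifier, choosing the feature and threshold that best separate disease from control proteins, can be sketched as follows. This is a minimal illustration under stated assumptions, not DGP's implementation: the function names (`gini`, `best_split`) are hypothetical, only two of the features are used, and the six data rows are synthetic toy values invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split.

    rows   : list of feature tuples, one per protein
    labels : 1 for disease protein, 0 for control protein
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Synthetic toy data (NOT from the study): (protein length, conservation score)
rows = [(850, 0.9), (1200, 0.8), (300, 0.4), (250, 0.85), (950, 0.7), (400, 0.3)]
labels = [1, 1, 0, 0, 1, 0]

feature, threshold, impurity = best_split(rows, labels)
```

A full decision tree applies this split search recursively to each resulting group until the groups are pure or too small to split further.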
PROSPECTR uses a wider variety of features, including the length of the gene, the length of its coding sequence, the length of its cDNA, the length of the protein, GC content, and percentage protein identity with its nearest homologue in various species (mouse, worm, fly). The investigators used an alternating decision tree, taking genes from OMIM and comparing them against genes not found in OMIM. They also generated two independent test sets: one using genes from the Human Gene Mutation Database with randomly selected control genes, and another set of 54 genes not in OMIM, again with a set of randomly selected control genes.
POCUS takes another machine learning approach, using a selected training set of genes linked to the target disease. POCUS identifies features common to all the training genes (InterPro domains, GO annotations, similar expression profiles) and assesses the probability that such features would be shared by chance. This method depends on a carefully selected training set of genes, and focuses on the likelihood of these genes all sharing common, disease-related properties, in contrast to methods that focus on overrepresentation of properties among the training genes.
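One way to make "shared by chance" concrete is a hypergeometric tail probability: if a given annotation is carried by a known number of genes in the genome, how likely is it that a randomly drawn gene set of the same size would share it at least as often? The sketch below assumes this formulation for illustration. It is not necessarily the scoring scheme POCUS uses, and the function name and the numbers in the example call are hypothetical.

```python
from math import comb


def shared_by_chance(genome_size, annotated, sample_size, min_shared):
    """P(at least `min_shared` of `sample_size` randomly drawn genes carry an annotation).

    genome_size : total number of genes that could be drawn
    annotated   : number of those genes carrying the annotation
    """
    total = comb(genome_size, sample_size)
    upper = min(annotated, sample_size)
    hits = sum(
        comb(annotated, i) * comb(genome_size - annotated, sample_size - i)
        for i in range(min_shared, upper + 1)
    )
    return hits / total


# Hypothetical numbers: 20,000 genes, 100 carry a given domain,
# and all 5 training genes share it.
p = shared_by_chance(20000, 100, 5, 5)
```

A small probability suggests that the shared feature is unlikely to be a coincidence of gene selection, which is why the composition of the training set matters so much for this kind of method.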