
Final Year Bioinformatics Projects

Bioinformatics research projects are like traditional biological projects, only they are carried out
using computers. So does this mean you need to know how to write programs to do
bioinformatics research? No. Most bioinformatic projects involve using programs to generate new
data (e.g. BLAST), querying online databases and resources (such as genome browsers), and/or
downloading, integrating and analysing existing datasets (such as microarray or proteomics data).
Others may involve improving or adding information to existing databases, or maybe even building
new ones. Some projects may also involve the design and implementation of new software, or
adding new functions to existing programs. In all cases, bioinformatics projects address a biological
research question, have clearly identifiable objectives that the project sets out to meet, and a proper
study design that attempts to address these objectives, just like any experimental or field project.

Most bioinformatic projects are designed to be accessible to all students in FLS, and can be tailored
to suit the skills and interests of individual students. Even in cases where some programming may be
involved, all interested students should be able to successfully complete the projects regardless of
the degree programme that they are on. The only requirements for a bioinformatics project are
familiarity with standard word-processing and spreadsheet tools, an interest in computing and
enthusiasm for data analysis; specialist programming skills are not needed.

Examples. These are either examples of projects which have been run already, or generic cases
which represent the types of things students often do on bioinformatics projects.

Understanding differential splicing in the developing chicken embryo using bioinformatics: We had
developed a database of 330,000 chicken cDNAs and ESTs (http://www.chick.umist.ac.uk), used by
researchers all over the world and derived from many different tissues, including many from chick
embryos. In this project, we mapped selected ESTs from different tissues back to the chicken genome
to see which alternatively spliced genes are represented in the chicken. Specifically, we were able to
identify genes with tissue-specific developmental roles, and then identify particular spliced products
favoured in some tissues over others. The project involved running many bioinformatics tools and
using the Ensembl genome browser website, and required good data-handling skills. The literature
review covered alternative gene splicing, tissue-specific gene expression and vertebrate
development. The student involved became a co-author on a research paper (Tang H, Heeley T,
Morlec R, Hubbard SJ. Characterising alternate splicing and tissue specific expression in the chicken
from ESTs. Cytogenet Genome Res. 2007; 117: 268-77).
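To illustrate the idea behind this kind of analysis, here is a minimal Python sketch (not the method used in the project) that compares splice junctions between tissues. It assumes each EST has already been aligned to the genome and summarised as a hypothetical list of (start, end) exon blocks tagged with its source tissue; a junction seen only in ESTs from one tissue is then a candidate tissue-specific splice form. The function names and toy coordinates are illustrative only.

```python
from collections import Counter, defaultdict


def junctions(exons):
    """Return the splice junctions (intron boundaries) implied by a list of exon blocks."""
    exons = sorted(exons)
    # Each junction runs from the end of one exon to the start of the next.
    return {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}


def junction_usage(alignments):
    """Count, per tissue, how many ESTs support each splice junction."""
    usage = defaultdict(Counter)
    for tissue, exons in alignments:
        usage[tissue].update(junctions(exons))
    return usage


def tissue_specific_junctions(alignments):
    """For each tissue, list the junctions never seen in ESTs from any other tissue."""
    usage = junction_usage(alignments)
    result = {}
    for tissue, counts in usage.items():
        seen_elsewhere = set()
        for other, other_counts in usage.items():
            if other != tissue:
                seen_elsewhere |= set(other_counts)
        result[tissue] = sorted(set(counts) - seen_elsewhere)
    return result


if __name__ == "__main__":
    # Toy alignments (hypothetical): liver ESTs skip the middle exon seen in brain ESTs.
    alignments = [
        ("brain", [(100, 200), (300, 400), (500, 600)]),
        ("liver", [(100, 200), (500, 600)]),
        ("brain", [(100, 200), (300, 400)]),
    ]
    print(tissue_specific_junctions(alignments))
    # {'brain': [(200, 300), (400, 500)], 'liver': [(200, 500)]}
```

In this toy case the liver-only junction (200, 500) corresponds to an exon-skipping event. The per-tissue counts from `junction_usage` matter in practice, since a junction supported by a single EST is much weaker evidence than one supported by many.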

Hunting for a “missing” gene in a genome: Sometimes a function is known to be present or
expected to be found in an organism, but has not yet been identified in the genome. This might be
because it is a remote (distant) homologue requiring more than a simple BLAST search, because
another gene takes its place, or because it is genuinely missing. Some projects seek to address these
types of questions, such as trying to identify a missing enzyme in the folate pathway in the malarial
parasite Plasmodium falciparum, or trying to identify a “missing” caspase in the mouse genome.
This often involves more than just simple BLAST searching, and gives students exposure to more
advanced tools and methods.

Understanding the functional effects of mutations in relation to disease: Often a number of point
mutations, insertions or deletions are known to correlate with disease. These may be in one gene, a
range of genes or in non-protein-coding regions. However, in many cases the molecular mechanism
of disease is unknown. Bioinformatics methods can be used to understand how these mutations can
lead to disease states. Methods that can be used include analysis of protein structures (including
molecular modelling of both wild-type and mutant forms of proteins), analysis of structural stability,
identification of functional sites, and analysis of promoter regions. An example of this type of
analysis can be found in Briggs et al. (2011), Nature Genetics, PubMed ID 21217755.

Studying the dynamics of transposable elements in eukaryotic genomes: Transposable elements
(TEs) are mobile, repetitive DNA sequences that are among the most dynamic yet least understood
components of eukaryotic genomes. Advances in genome sequencing now provide an unparalleled
opportunity to study the impact of TEs on eukaryotic genome organisation and evolution. The aim of
this project is to use genomic data from closely related strains and species to identify TEs and
analyse their genomic distribution and sequence evolution. This project will involve the use of
web-based genome browsers and phylogenetic analysis tools, but requires no background in
programming or genomics.

Encoding specificity in sets of interactions involving paralogues: Within the interconnections of
biochemical pathways lie many examples of repeated interactions that perform similar functions but
with different specificity. One example is the hetero-dimerisation of leucine zipper transcription
factors, where protein interaction specificity is coupled to transcriptional regulation. This example
illustrates our hypothesis, which is that such systems will normally fit together with a commonly
shaped interface, but add specificity through variation of charge interactions around this shape. In
biochemical terms, such a mechanism allows would-be interacting proteins to ‘speed-date’ at longer
range (through charge interactions) before committing to a functional complex. This project will test
the hypothesis computationally for several paralogue systems, using existing (3D structure-based)
software that deconstructs the various interactions present at interfaces. Results will be considered
in terms of evolutionary constraints on the interfaces, bringing in phylogenetic analysis, and the
project will consider the implications for the design of specificity in synthetic biology.

Integrating proteomics data with genome annotation: Genome sequences require annotating.
Although we can now generate them at remarkable rates, we need to know where, amongst all the
ACGTs, the genes and exons are, and in particular which ones code for protein. One way to do this is
to use mass spectrometry data to “map back” to the genome sequence to validate gene predictions,
suggest novel gene structures, or even find completely new genes. This process is termed
“proteogenomics”, and we have run a few projects along these lines, not least for the whipworm
Trichuris muris, the mouse counterpart of a human nematode parasite that infects millions of people
worldwide. The project involves mapping the mass spectrometry data onto genome- and
transcriptome-derived databases to reveal the underlying gene structure.
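As a rough illustration of the “map back” step, the following minimal Python sketch assumes that peptide sequences have already been identified from the mass spectrometry data. It translates a genome sequence in all six reading frames using the standard genetic code and reports where each peptide matches exactly. Real proteogenomics pipelines search the spectra themselves against six-frame or transcriptome-derived databases, and must also cope with peptides that span introns, isoleucine/leucine ambiguity and sequencing errors; none of that is handled here, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_matches(genome, peptides):
    """Return (peptide, strand, frame, start, end) for every exact match.

    start/end are 0-based, half-open nucleotide coordinates on the forward strand.
    """
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for pep in peptides:
                pos = protein.find(pep)
                while pos != -1:
                    nt_start = frame + 3 * pos
                    nt_end = nt_start + 3 * len(pep)
                    if strand == "-":
                        # Convert reverse-strand offsets back to forward-strand coordinates.
                        nt_start, nt_end = len(genome) - nt_end, len(genome) - nt_start
                    hits.append((pep, strand, frame, nt_start, nt_end))
                    pos = protein.find(pep, pos + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "ATGGCTAAAGGTTGA"  # toy sequence, not real data
    for hit in six_frame_matches(toy_genome, ["AKG"]):
        print(hit)  # ('AKG', '+', 0, 3, 12)
```

A hit localises the peptide to a genomic interval; comparing such intervals with predicted gene models is what allows predictions to be confirmed, corrected or extended.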
