
Final report

Use font size 12, single-spaced lines. I will not be marking on English language or grammar, but on scientific content and style,
and clarity of presentation of the results.

2 pages maximum for the Introduction, Methods and Results / Discussion

Introduction
Talk about MSH genes, what they do and why we want to investigate the numbers of genes present in different genomes.

Methods
Describe the methods that you used in a way that anyone can repeat. State what genome you are investigating, its Genbank
Accession number and its genome size in bp.
Describe:
1) how to identify the MSH genes from the annotation
2) how to identify the MSH genes using Blast
3) making an alignment of the genes
4) running ProtTest
5) making a tree, with human MSH genes included
6) how to visualize the tree
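
As an illustration of step 2, the sketch below filters BLAST hits that were saved in tabular form (the 12 default columns of BLAST+ `-outfmt 6`) and keeps the best-scoring hit per query. The function names, the e-value cutoff of 1e-10 and the example rows are assumptions made for illustration only; they are not part of the assignment and the numbers are not real results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast6(text):
    """Turn tab-separated BLAST output into a list of dictionaries."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_FIELDS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def best_hits(hits, max_evalue=1e-10):
    """Keep, for each query, the hit with the highest bit score
    among those at or below the e-value cutoff (an assumed threshold)."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical rows with placeholder identifiers and made-up values.
example = "\n".join([
    "\t".join(["queryA", "subject1", "45.2", "800", "400", "12",
               "1", "800", "5", "810", "1e-150", "520"]),
    "\t".join(["queryA", "subject2", "30.1", "300", "190", "9",
               "10", "310", "20", "320", "1e-20", "95"]),
    "\t".join(["queryB", "subject3", "25.0", "60", "45", "2",
               "1", "60", "1", "60", "0.5", "30"]),
])

for query, hit in best_hits(parse_blast6(example)).items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

The cutoff is a judgment call: a stricter value reduces false positives from other ATPase families, while a looser one may be needed for divergent homologs. Whatever value is used should be stated in the Methods so that the search can be repeated.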

Results and Discussion


Describe what genes you have discovered and justify how you know that they are really MSH genes. What are the closest
human homologs of each gene? How do you know this? You can use the alignment and tree as justification. Talk about the
total number of MSH genes in your genome and what you think this means in terms of the complexity of the organism you
are investigating (e.g. its genome size and general complexity). Discuss any problems with the analysis. You should talk a bit
about the biology of the organism you are looking at, and whether this affects the MSH genes. Be logical but don't be afraid
to propose original ideas.

Then on additional pages you should add :

References
Any references that you choose to include

Table 1
A table listing each new gene that you have identified, its Genbank accession number and the chromosome it is on. Add
the closest human homolog, e.g. MSH2-6.

Gene name / description   Genbank accession   Chromosome number   Closest human
(if available)            number              and location        homolog
MSH1                      DDB_G0275999        Q552L1              J3KL42
MSH2                      DDB_G0275809        Q553L4              P43246
MSH3                      DDB_G0281683        Q1ZXH0              P20585
MSH4                      DDB_G0283957        Q54QB8              O15457
MSH5                      DDB_G0284747        Q54P75              O43196
MSH6                      DDB_G0268614        Q55GU9              P52701

Figure 1 Alignment of the new MSH genes with the human MSH genes
This should be in interleaved format, and the sequences should be amino acids. Use a smaller font; size 8 is OK.

Figure 2 A phylogenetic tree of the MSH genes with the human MSH genes
The labels on the tree should be informative and the tree should be compact.
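
One way to get informative labels is to rename the tips of the tree file before plotting it. The sketch below swaps accession numbers for readable names in a Newick string. The function name, the `Ddis_`/`Hsap_` naming scheme and the toy two-tip tree with made-up branch lengths are assumptions for illustration, not a result of the analysis.

```python
import re


def relabel_newick(newick, names):
    """Replace tip labels in a Newick string using a lookup table.

    A tip label is taken to be the text that directly follows "(" or ","
    and runs up to the next ":", ",", ")", "(" or ";". Labels that are
    not in the lookup table are left unchanged.
    """
    def swap(match):
        label = match.group(0)
        return names.get(label, label)

    return re.sub(r"(?<=[(,])[^():,;]+", swap, newick)


# Accessions taken from Table 1; the readable names are a hypothetical scheme.
names = {
    "Q553L4": "Ddis_MSH2",
    "P43246": "Hsap_MSH2",
}

# Toy tree with invented branch lengths, used only to show the renaming.
toy_tree = "(Q553L4:0.1,P43246:0.2);"
print(relabel_newick(toy_tree, names))
```

The new names should avoid spaces, colons, commas and parentheses, since those characters have a meaning in the Newick format.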

Results from ProtTest


If you generated results from ProtTest, list the best model according to the AIC criterion.
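
For reference, the Akaike information criterion is AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below shows the calculation. The function names, log-likelihoods and parameter counts are invented for illustration and are not ProtTest output.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(candidates):
    """Return the name of the lowest-AIC model and all AIC scores.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical values; real ones come from the ProtTest report.
candidates = {
    "JTT": (-1050.0, 20),
    "WAG": (-1040.0, 20),
    "LG+G": (-1030.0, 21),
}

best, scores = best_model_by_aic(candidates)
print(best, scores[best])
```

The extra parameter in the third model costs 2 AIC units, so it is preferred only because its log-likelihood improves by more than 1 unit.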
3) making an alignment of the genes

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA,
RNA, or protein to identify regions of similarity that may be a consequence of
functional, structural, or evolutionary relationships between the sequences.[1] Aligned
sequences of nucleotide or amino acid residues are typically represented as rows
within a matrix. Gaps are inserted between the residues so that identical or similar
characters are aligned in successive columns.
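
To make the idea of gap insertion concrete, here is a minimal sketch of pairwise global alignment (the Needleman-Wunsch dynamic programming approach) with a toy scoring scheme. The scores of +1 for a match, -1 for a mismatch and -1 per gap position are assumptions chosen for simplicity; alignment programs for proteins normally use a substitution matrix and more elaborate gap penalties, and they align many sequences at once.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences.

    Returns (aligned_a, aligned_b, score), where "-" marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of three possible moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


top, bottom, total = needleman_wunsch("ACGT", "AGT")
print(top)
print(bottom)
print(total)
```

In this example one gap is placed in the shorter sequence so that the remaining characters line up in matching columns.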

4) running ProtTest
5) making a tree, with human MSH genes included
6) how to visualize the tree

MSH genes are MutS homologues that belong to a set of genes known as the mismatch repair genes. MSH genes play a
critical part in DNA repair because they encode proteins that correct errors made during DNA replication. Their
importance is evidenced by the fact that mutations or alterations of these genes can result in diseases such as
hereditary non-polyposis colon cancer and sporadic cancers. For that reason, these genes are highly conserved. The
MutS homologues found in eukaryotes include MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MSH7 (plants) and MSH8
(Euglenozoa). MSH1 is involved in mitochondrial mismatch repair in fungi, while the MSH2-MSH6 and MSH2-MSH3
complexes are responsible for mismatch recognition in eukaryotes (Paweł Sachadyn, Conservation and diversity of
MutS proteins). The purpose of this study was to find and investigate the different MutS homologues in
Dictyostelium discoideum, analyze their conservation and compare them with those in the human genome.

Advances in genomic sequencing mean that the number of species for
which complete genome sequences are available exceeds 1,200. To analyze
the distribution of different MutS homologues all the genomic sequences
assembled in the list of BLAST microbial genomes were examined. In
order to analyze the conservation and diversification of MutS homologues a
selection of representative MutS amino acid sequences were aligned. Every
endeavour was made to ensure that the selection should cover as broad a
range of taxonomic groups as possible and include the organisms of
particular clinical, scientific and biotechnological importance, as well as
those whose MutS proteins have been already examined. The list of MutS
representative proteins included 316 different MutS amino acid sequences
from 169 species representing 34 classes of Bacteria and Archaea (Table 3,
supplemental data). In addition, a multiple sequence alignment analysis
was carried out for a group of MSH6 homologues in order to compare the
most conserved MutS and MSH6 amino acid residues.
