
BACTOSOM-Viewer: a High-Throughput Comparison and Mapping Tool for Bacterial Proteomes Using Self Organizing Map

Bilal Tamimi*, Sami Salamin*, Hashem Tamimi, Yaqoub Ashhab
Biotechnology Research Center, Palestine Polytechnic University, P.O. Box 198, Hebron, Palestine
* Equal contribution
Introduction:

• Bioinformatics-based approaches have proven to be a powerful and cost-effective means of studying and representing the huge volume of genomic and proteomic data produced by bacterial genome projects.

• Most existing tools perform similarity comparisons only and do not address the differences between genomes.

• There is a need for genomic and proteomic tools that shed light on similarities and differences alike.

• Such tools can help assign functions to the many hypothetical proteins that are still annotated in biological databases as proteins of unknown function.
BACTOSOM-Viewer:

• An intelligent bioinformatics tool.

• Performs high-throughput comparison and mapping to reveal similarities and differences among bacterial proteomes.

• Compares any given bacterial proteome with a set of essential proteins, referred to here as the core proteome.
Methodology
Self-organizing map (SOM):

• A type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map (Wikipedia).
• Like most artificial neural networks, SOMs operate in
two modes: training and mapping. Training builds the
map using input examples. It is a competitive
process, also called vector quantization. Mapping
automatically classifies a new input vector.
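The two modes can be illustrated with a minimal sketch in plain Python. This is not the authors' MATLAB implementation: the grid size, the linear decay schedules, the Gaussian neighbourhood, and the function names `train_som` and `map_to_node` are all assumptions chosen for illustration.

```python
import math
import random


def train_som(samples, rows, cols, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Training mode: fit a rows x cols grid; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise every node with random weights in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighbourhood radius shrinks
            bmu = map_to_node(weights, x)         # competitive step: pick the winner
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node towards the sample
            step += 1
    return weights


def map_to_node(weights, x):
    """Mapping mode: grid position of the node whose weights are closest to x."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


# Toy data: two well-separated groups of 2-D points (illustrative only).
rng = random.Random(1)
group_a = [[rng.uniform(0.0, 0.1), rng.uniform(0.0, 0.1)] for _ in range(20)]
group_b = [[rng.uniform(0.9, 1.0), rng.uniform(0.9, 1.0)] for _ in range(20)]
som = train_som(group_a + group_b, rows=3, cols=3)
print(map_to_node(som, [0.05, 0.05]), map_to_node(som, [0.95, 0.95]))
```

In the tool described here, each input vector would be a protein's vector of alignment scores rather than a 2-D point.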
Landmark Proteins:

• Represent the minimal set of essential genes that constitute a core bacterial genome.

• Since Mycoplasma genitalium is the smallest known bacterium, its 474 protein-coding genes were taken as the landmark genes.
Used Data:

• Testing data:

– The full protein sequences of 6 bacteria representing three different environmental niches were used to test the performance of our system.

– The protein sequences for each bacterium were downloaded from GenBank in FASTA format with full annotations.

Protein sequences    Bacteria name
2059                 Brucella melitensis
2029                 Brucella abortus
3944                 Mycobacterium bovis
763                  Mycoplasma gallisepticum
1996                 Streptococcus agalactiae
1709                 Streptococcus thermophilus
Feature extraction:

• Because a gene is a long sequence, we reduce the amount of sequence to be processed by:

– selecting only the protein-coding genes rather than the whole genome

– selecting protein sequences rather than gene sequences

• The landmark genes are used as reference points: the distance between each of them and every protein found in the bacterium is measured.
Feature extraction:

• The feature table was built from the pairwise sequence alignment score between each protein (from the tested bacteria) and each landmark protein, computed with the Smith-Waterman algorithm.
• The Smith-Waterman algorithm:
– performs local sequence alignment;
– determines similar regions between two nucleotide or protein sequences;
– instead of looking at the total sequence, compares segments of all possible lengths and optimizes the similarity measure.
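A minimal sketch of the Smith-Waterman score computation follows. The scoring scheme (fixed match/mismatch values and a linear gap penalty) is an assumption made to keep the example short; the slides do not state which substitution matrix or gap penalties were used, and protein alignments in practice usually rely on a substitution matrix. The function name `smith_waterman_score` is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Assumed scoring: +match / mismatch per aligned pair, linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region "ACDE" is found even though the flanking residues differ.
print(smith_waterman_score("XXACDEXX", "YYACDEYY"))  # 8
```

Because every cell is floored at zero, a poorly matching prefix never penalizes a good region later in the sequences, which is what makes the alignment local.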
Example:

• If a bacterium has 600 proteins, each protein gets a feature vector of 474 values (one score per landmark protein), for a total of 284,400 values (600 × 474).

Protein A: 12 10 30 50 ……
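The shape of the feature table can be sketched as follows, with one row per protein and one column per landmark protein. The sequences are toy data, and `shared_residues` is a deliberately trivial placeholder standing in for the alignment score; both it and `build_feature_table` are hypothetical names.

```python
def build_feature_table(proteins, landmarks, score):
    """Return one row per protein; each row holds one score per landmark."""
    return [[score(p, lm) for lm in landmarks] for p in proteins]


def shared_residues(a, b):
    """Placeholder score (NOT Smith-Waterman): distinct residues in common."""
    return len(set(a) & set(b))


# Toy data: 3 query proteins scored against 4 landmark proteins.
proteins = ["MKVLA", "GGSTP", "MKKLA"]
landmarks = ["MKV", "GST", "LAK", "PPP"]
table = build_feature_table(proteins, landmarks, shared_residues)

# rows x columns values in total: 3 * 4 here, 600 * 474 in the example above.
n_values = len(table) * len(table[0])
print(n_values)  # 12
```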
Steps of work:

• We downloaded the protein sequences from the NCBI website and used MATLAB to analyze the data.

• We aligned the landmark proteins against each sample set to obtain a feature table for each bacterium, stored as a separate file.

• Each feature file contains, for every protein, a feature vector of 474 score values.
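These steps can be sketched in Python as: read FASTA records, score each protein against the landmarks, and write one row per protein to a file. The authors worked in MATLAB, so this is an illustration only. The CSV layout, the function names, and the `identical_positions` placeholder score (used in place of the alignment score to keep the block short) are assumptions.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def write_feature_table(records, landmarks, score, out):
    """Write one CSV row per protein: header, then one score per landmark."""
    writer = csv.writer(out)
    for header, seq in records:
        writer.writerow([header] + [score(seq, lm) for lm in landmarks])


def identical_positions(a, b):
    """Placeholder score (NOT an alignment): matches at the same position."""
    return sum(x == y for x, y in zip(a, b))


# In-memory demo; a real run would open the downloaded FASTA file instead.
fasta_text = ">p1 demo\nMKV\nLA\n>p2 demo\nGGSTP\n"
records = list(read_fasta(io.StringIO(fasta_text)))
out = io.StringIO()
write_feature_table(records, ["MKV", "GGS"], identical_positions, out)
print(out.getvalue())
```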
Results:
• The system is a desktop application that uses MATLAB built-in tools and can be easily used by biologists.

• BACTOSOM-Viewer provides a graphical user interface in which the user selects two species from a long list of available bacteria and plots their comparative proteomes as a SOM graph.

• The tool was verified on the six selected bacteria.
Results:
• For testing, one bacterium (Brucella abortus) was analyzed against Brucella melitensis.

• BACTOSOM-Viewer clusters each group of similar proteins in the same nodes. The clustering was in line with the Gene Ontology functional categorizations available in GenBank. The tool can identify all species-specific proteins, which appear as small isolated dots, and show their distance from the known core proteins.
