MICROBIAL TAXONOMY (MIC 307) [A]

O.S. Obayori

1. CONCEPTUAL CLARIFICATIONS

i. Taxonomy – the science of classification of organisms, or the science of ordering
taxa in a hierarchical classification system.
OR
A formal system for organizing, classifying and naming living things.

It has three component parts, namely: Classification, Nomenclature and
Identification.

ii. Classification – ordering of organisms into groups based on their relationships
OR
A coherent scheme by which a collection of organisms is arranged so as to
reflect the relationships between individuals and groups (Atlas, 1995)

iii. Nomenclature – the part of taxonomy that deals with assigning names to the various
taxonomic ranks, or taxa, to which an organism belongs

iv. Identification – the process of discovering and recording the traits of an organism so
that it can be assigned to the proper group and given the correct name based on an
established taxonomic system

Importance of Taxonomy

i. Helps to organise large amounts of knowledge
ii. Places organisms into meaningful, useful groups with precise names,
thus facilitating communication and avoiding confusion
iii. Allows scientists to make predictions and frame hypotheses about
organisms
iv. Enables accurate classification of organisms

___________________________________________________________________________

i. Binomial Nomenclature – introduced by Carl von Linné (Linnaeus)

- a naming system in which an organism is assigned a scientific name
composed of two parts: the genus (generic) name and the specific name (species epithet), e.g.

Homo sapiens
Escherichia coli
Pseudomonas aeruginosa
Lysinibacillus fusiformis

- In this system, the name is written in italics (in printed materials) or
underlined when typed on a typewriter or written in longhand
- The first letter of the generic name is in upper case while the others are in
lower case
- The specific name or species epithet is in lower case throughout
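
The capitalisation rules for binomial names can be checked mechanically. The following is a minimal Python sketch; the function name `is_binomial` and the regular expression are illustrative assumptions, and the pattern deliberately ignores rarer cases such as hyphenated epithets, subspecies and strain designations.

```python
import re

# Assumed simplification: capitalised genus, one space, all-lower-case epithet.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")

def is_binomial(name: str) -> bool:
    """Return True if `name` follows the genus/epithet capitalisation rules."""
    return BINOMIAL.match(name) is not None

for candidate in ["Escherichia coli", "escherichia coli", "Escherichia Coli"]:
    print(candidate, "->", is_binomial(candidate))
```

Only the first candidate passes: the second lacks the upper-case initial of the genus, and the third capitalises the species epithet.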

ii. Hierarchical classification – introduced by Carl von Linné (Linnaeus)

- in this system organisms are organized into a
hierarchical order on the basis of similarities
- the higher levels contain more members, and the
lowest level is the species, in which there is only
one type of organism
- each level is called a TAXON or TAXONOMIC
RANK

The ranks in the Linnaean system are as follows:

Kingdom
Phylum
Class
Order
Family
Genus
Species
Remember!
In the modern system:
i. Domain is the highest level, and there are 3 Domains
ii. Species is the basic taxonomic unit of a classification system

iii. Definitions of the Taxa:

• Division/Phylum – a group of related classes
• Class – a group of related orders
• Order – a group of related families
• Family – a group of related genera
• Genus – a group of related species
OR
A well-defined group of one or more species that is clearly separated from other
genera
• Species – a group of organisms of the same kind
OR
A collection of strains that share many stable properties and differ significantly from
other groups of strains
OR
A collection of strains with similar G+C composition and at least 70% similarity
as measured by DNA-DNA hybridization.
• Strains – variants of the same species

iv. Systematics – the study of organisms with the ultimate object of characterizing
and arranging them in an orderly manner.
OR
The science of relationships among organisms
OR
It can also be defined as the comparative study of the diversity of organisms, with the
aim of establishing a logical system within which organisms can be described and
classified (Atlas, 1995)

2. APPROACHES TO CLASSIFICATION:

i. Phenetic: assesses overall similarity.
Groups do not necessarily reflect genetic similarity or evolutionary relatedness.
Instead, groups are based on convenient, observable characteristics

ii. Phylogenetic:
Emphasises evolutionary relationships and is based on the collection of evolutionary
evidence.
Groups reflect genetic similarity and evolutionary relatedness

Characteristics used in classification

A. Traditional or classical characteristics

These fall into four categories, namely:
i. Morphological characteristics e.g. cell shape, size and arrangement; colony
size, topology, colour; endospores; flagella/cilia/other means of motility;
spores; inclusion bodies; ultra-structure; staining properties.
ii. Metabolic/physiological characteristics e.g. Cell wall and membrane
components; carbon and nitrogen metabolism; electron acceptor; growth
temperature, pH range; salinity requirements and tolerance; photosynthetic
pigment; secondary metabolites; metabolic inhibitors.
iii. Ecological characteristics e.g. habitat; interactions; life cycle; growth
requirements
iv. Genetic characteristics e.g. chromosomal gene transfer; plasmids

B. Molecular characteristics
i. Percentage G+C content
ii. Nucleic acid sequence
iii. Nucleic acid hybridization
iv. Protein comparison
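
As an illustration of the first molecular characteristic, percentage G+C content is the share of guanine and cytosine among the bases of a DNA sequence. The sketch below assumes Python; the function name `gc_percent` and the example sequence are hypothetical, and ambiguous symbols such as N are simply skipped.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

# Hypothetical 8-base fragment: 4 of the 8 bases are G or C.
print(gc_percent("ATGCGCTA"))
```

Real estimates are made on whole genomes or long contigs; a short fragment like this one only shows the arithmetic.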

16S RIBOSOMAL RNA (rRNA) AND PHYLOGENY

Why has the 16S rRNA gene remained very useful as a phylogenetic marker?

i. The function of ribosomes has not changed for about 3.8 billion years
ii. 16S rRNA genes are universally present among all cellular forms
iii. Their size of about 1,540 nucleotides makes them easy to analyse
iv. The primary structure is an alternating sequence of invariant, more or
less conserved, and highly variable regions
v. Lateral gene transfer is either totally absent or exceedingly rare
(Philp et al., 2005)

Limitation of the 16S rRNA gene: sequence similarity of the 16S rRNA gene may not
always reflect relatedness, particularly between closely related species

Other markers gaining relevance include: the 23S rRNA gene, gyrB, rpoB, dnaK,
dsrAB and the 16S-23S rDNA intergenic spacer region (ISR)

iii. Numerical Taxonomy

In classical or conventional taxonomy:

• Several features are examined
• Selected features, considered the most important, are used for grouping
• This approach emphasises branching points between groups, which are taken to
represent fundamental differences between taxa
• It is usually represented as a tree with branch points and the taxa at the ends of the
branches

Highlights on numerical taxonomy

• Does not emphasise points of branching
• Uses the overall degree of similarity between organisms to establish a taxon
• Equal weight is given to all the characteristics
• The greater the content of information in the taxa of a classification and the more
characters on which it is based, the better a given classification will be.
• Overall similarity between any two entities is a function of their individual
similarities in each of the many characters in which they are being compared.
• Taxonomy is viewed and practiced as an empirical science.
• Classifications are based on phenetic similarity.

Advantages

• Numerical taxonomy has the power to integrate data from diverse sources, such as
morphology, physiology, chemistry and molecular biology.
• Automation makes for greater efficiency
• Being quantitative, the methods provide greater discrimination along the spectrum of
taxonomic differences and are more sensitive in delimiting taxa.
• Numerical taxonomy has led to the reinterpretation of a number of biological concepts
and to the posing of new biological and evolutionary questions.

The association coefficient is used to estimate the degree of similarity between taxonomic units.

Simple Matching Coefficient (Ssm)

$$S_{sm} = \frac{(++) + (--)}{(++) + (--) + (+-) + (-+)}$$

(++) = number of positive matches (character present in both organisms)
(--) = number of negative matches (character absent in both organisms)
(+-), (-+) = numbers of mismatches (character present in one organism only)

Jaccard Coefficient (Sj)

$$S_{j} = \frac{(++)}{(++) + (+-) + (-+)}$$

• Counting the negative matches (--) in Ssm can make organisms that are not similar appear similar, because shared absences are scored as agreement
• The Jaccard coefficient eliminates this by ignoring negative matches
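
The difference between Ssm and Sj is easiest to see on a small example. The sketch below assumes Python and two hypothetical strains scored for ten characters (1 = present, 0 = absent); the function names are illustrative.

```python
def match_counts(a, b):
    """Return (positive matches, negative matches, mismatches) for two 0/1 profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same characters")
    pos = sum(1 for x, y in zip(a, b) if x and y)
    neg = sum(1 for x, y in zip(a, b) if not x and not y)
    return pos, neg, len(a) - pos - neg

def simple_matching(a, b):
    pos, neg, mis = match_counts(a, b)
    return (pos + neg) / (pos + neg + mis)

def jaccard(a, b):
    pos, _neg, mis = match_counts(a, b)
    if pos + mis == 0:
        raise ValueError("Sj is undefined when neither organism shows any character")
    return pos / (pos + mis)

# Hypothetical strains: one shared positive, seven shared negatives, two mismatches.
strain_x = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
strain_y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(simple_matching(strain_x, strain_y))  # (1 + 7) / 10
print(jaccard(strain_x, strain_y))          # 1 / 3
```

Here Ssm is 0.8 while Sj is about 0.33: most of the apparent agreement in Ssm comes from characters that both strains lack.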

     A     B     C     D     E     F
A  1.0
B  0.92  1.0
C  0.80  0.72  1.0
D  0.22  0.32  0.28  1.0
E  0.46  0.43  0.47  0.30  1.0
F  0.35  0.45  0.46  0.32  0.32  1.0

Figure 1: Similarity matrix


➢ The results (the association coefficients) obtained from a numerical taxonomy study are
arranged in a similarity matrix, which is then summarised in the form of a dendrogram.
➢ A statistical procedure called cluster analysis is used. It is usually computer assisted.

iv. Polyphasic Approach:
Collectively, the genotypic, chemotaxonomic and phenotypic methods for determining the
taxonomic position of microbes constitute what is known as the polyphasic approach to
bacterial systematics (Prakash et al., 2007)

OR

The use of all available data, viz. genotypic and phenotypic, to determine phylogeny. The
data used depend on the purpose of the study.

Techniques and markers used in modern polyphasic approaches for
resolving bacterial hierarchy (Prakash et al., 2007)

1. Chemotaxonomic markers (polyamines, quinones, polar lipids, fatty acids): up to genus level
2. DNA-DNA hybridization, %G+C, tDNA-PCR: up to species level
3. DNA probes, DNA sequencing: up to strain level
4. RNA gene sequencing: up to species level
5. Cell wall structure (teichoic acids, peptidoglycans): up to genus level
6. Restriction Fragment Length Polymorphism (RFLP), Pulsed-Field Gel Electrophoresis (PFGE), ribotyping, DNA amplification, phage and bacteriocin typing, serological techniques: up to strain level

3. PHYLOGENETIC TREES
What is a phylogenetic tree?
A phylogenetic tree is an estimate of the relationships among taxa (or sequences) and their
hypothetical common ancestors (Hall, 2013).

OR
A phylogenetic tree is a statement about the evolutionary relationship between a set of
homologous characters of organisms.

OR
A tree-like structure that shows the evolutionary relationships among a set of organisms or
biomolecules

How is a phylogenetic tree different from other biological trees?

Phylogenetic trees represent evolutionary relationships among species. For
microorganisms they are usually constructed by comparing 16S rRNA genes.
Taxonomy trees show hierarchies of taxonomic ranks from kingdom to
species (refer to NCBI).
Gene trees represent the evolutionary relationships of particular
biomolecules (e.g. genes or proteins) among species

Figure 2: Parts of a phylogenetic tree

• Node: a branch point in a tree (a presumed ancestral OTU)
• Branch: defines the relationship between the taxa in terms of descent and ancestry
• Topology: the branching pattern of the tree
• Branch length (scaled trees only): represents the number of changes that have
occurred in the branch
• Root: the common ancestor of all taxa
• Clade: a group of two or more taxa or DNA sequences that includes both their
common ancestor and all of its descendants

Figure 3: Equivalent trees

An outgroup is a taxon outside the group of interest. An outgroup is useful in constructing,
and especially in rooting, an evolutionary tree.

Figure 4: What the nodes represent

Rooted tree or Cladogram
A phylogenetic tree in which all the objects share a known common ancestor.

Unrooted tree or Phenogram
All the objects are related descendants, but the common root or ancestor cannot be specified.

Figure 5: Rooted and unrooted trees

A clade is a group consisting of an ancestor and all of its descendants
Cladogram – branch length does not represent evolutionary time or change
Phylogram – branch length represents the amount of evolutionary change
Ultrametric tree – branch length represents evolutionary time

In a phylogenetic tree:

Nodes represent speciation events
Branches connect the nodes
External branches are branches that end with a tip
Internal branches are branches that do not end with a tip
Terminal nodes represent operational taxonomic units (OTUs)
Internal nodes represent hypothetical taxonomic units (HTUs)
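
The terms above map naturally onto a simple data structure. The following Python sketch assumes a rooted tree written as nested tuples, where a string is a terminal node (OTU) and a tuple is an internal node (HTU); the five-taxon tree and the function names are hypothetical.

```python
def tips(node):
    """Return the terminal nodes (OTUs) below `node`, left to right."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(tips(child))
    return found

def count_internal(node):
    """Count the internal nodes (HTUs) at and below `node`."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal(child) for child in node)

# Hypothetical rooted tree: ((A,B),C) and (D,E) descend from the root.
tree = ((("A", "B"), "C"), ("D", "E"))
clade = tree[0]                # the internal node ancestral to A, B and C

print(tips(tree))              # all OTUs
print(count_internal(tree))    # HTUs, including the root
print(tips(clade))             # members of the clade
```

A clade is then simply everything returned by `tips` for one internal node: the ancestor plus all of its descendants.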

4. BUILDING PHYLOGENETIC TREES
The data used for building a phylogenetic tree can be either:
1. Molecular data, i.e. gene or protein sequences (nucleotide or amino acid substitutions); or
2. Morphological data, i.e. phenotypic features.
Either kind of data can also be summarised as distance data.

The methods for building phylogenetic trees can be distinguished as:
• Distance based methods (phenetic)
• Character based methods (cladistic)

Distance based methods: are more rapid and less computationally intensive. There is loss
of information because the individual characters are discarded once the distance matrix is computed.

Character based methods: make use of all known evolutionary changes to
determine the most likely ancestral relationships. There is no loss of information, but they are time
consuming.

Discrete characters used in character based methods include:

• Morphological data
• Protein data (amino acids)
• DNA data (four nucleotides; G+C %)
It is assumed that all characters are independent of each other

How is a phylogenetic tree created from molecular data?

The sequences could be of RNA, DNA or proteins
• Align the sequences
• Determine the number of positions that are different
• Express the difference in the form of a distance
• Use the measure of difference to create the tree
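
The middle two steps (counting differing positions and expressing them as a distance) can be sketched as follows. This assumes Python, sequences that are already aligned to equal length, and the simplest possible measure, the proportion of differing sites (often called the p-distance); the three sequences are invented for illustration.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":      # skip alignment gaps
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared

# Hypothetical aligned sequences of ten nucleotides each.
aligned = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTG",
}
names = sorted(aligned)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(x, y, p_distance(aligned[x], aligned[y]))
```

Tree-building programs usually correct such raw proportions with a substitution model before building the tree; that correction is omitted here.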

RNA – usually used to create trees showing broad relationships
DNA – effective for comparing organisms at species and molecular level
Protein – sequence alignment is easier;
        – less affected by organism-specific differences in G+C content

Properties of a good tree building method
Efficiency – the faster, the more efficient.
Power – a powerful method produces a reasonable result with limited data.
Consistency – always converges on the right answer given enough data
Robustness – violation of the method's assumptions does not necessarily result in poor
phylogenies
Falsifiability – a good method should be able to reveal when its assumptions are violated.

Building a phylogenetic tree requires four distinct steps:

1. Identify and acquire a set of homologous DNA or protein sequences
2. Align those sequences
3. Estimate a tree from the aligned sequences
4. Present that tree in such a way as to clearly convey the relevant information to others.

Adapted from: Hall, 2013

Phylogenetic tree building methods

1. Distance based (algorithmic methods)
   • Neighbour Joining
   • Weighted Neighbour Joining
   • Unweighted Pair Group Method using Arithmetic Mean (UPGMA)
   • Weighted Pair Group Method using Arithmetic Mean (WPGMA)
2. Character based (tree searching methods)
   • Maximum Parsimony
   • Maximum Likelihood

Note: the distance methods are faster

Figure 6: Phylogenetic tree building methods

UPGMA
This is the simplest tree building method. Strictly speaking, the algorithm is phenetic.
It is a sequential clustering algorithm. The clustering procedure:
• Initially, treat each species as a cluster on its own.
• Join the two closest clusters and recalculate the distances from the new cluster to all other clusters by taking averages.
• Repeat this process until all species are connected in a single cluster.

Merits – outputs a rooted tree and is very fast

Demerit – it assumes a constant rate of evolution of the sequences in all branches of the tree
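
The clustering procedure can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not a reference implementation: it reuses the similarity values of Figure 1, converts each similarity to a distance as 1 minus the similarity (a choice made here, not stated in the notes), and returns only the topology as nested tuples, without branch lengths.

```python
def upgma(names, dist):
    """Average-linkage clustering.

    names: list of OTU labels.
    dist:  dict mapping (i, j), with i < j, to the distance between names[i] and names[j].
    Returns the tree as nested 2-tuples of labels.
    """
    subtree = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    size = {i: 1 for i in subtree}                         # cluster id -> number of OTUs
    d = dict(dist)
    next_id = len(names)
    while len(subtree) > 1:
        i, j = min(d, key=d.get)                           # join the closest pair
        merged = {}
        for k in subtree:
            if k in (i, j):
                continue
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # size-weighted average = mean over all OTU pairs between the clusters
            merged[(k, next_id)] = (size[i] * d_ik + size[j] * d_jk) / (size[i] + size[j])
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(merged)
        subtree[next_id] = (subtree.pop(i), subtree.pop(j))
        size[next_id] = size.pop(i) + size.pop(j)
        next_id += 1
    return next(iter(subtree.values()))

names = ["A", "B", "C", "D", "E", "F"]
similarity = [                       # lower triangle of Figure 1
    [1.0],
    [0.92, 1.0],
    [0.80, 0.72, 1.0],
    [0.22, 0.32, 0.28, 1.0],
    [0.46, 0.43, 0.47, 0.30, 1.0],
    [0.35, 0.45, 0.46, 0.32, 0.32, 1.0],
]
dist = {(j, i): 1.0 - similarity[i][j] for i in range(len(names)) for j in range(i)}
tree = upgma(names, dist)
print(tree)
```

With these values A and B are joined first, then C, E and F are added in turn, and D joins last, which matches a reading of the matrix by eye: D has low similarity to every other organism.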

NEIGHBOUR JOINING (NJ)

The Neighbour Joining method is the most widely used distance based method.
It applies the minimum evolution principle at each step in the clustering process.
Merit – it does not assume that the rate of evolution is the same in all branches of the tree.
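
The notes do not spell out how NJ decides which pair to join. One standard formulation, assumed here, scores every pair with Q(i,j) = (n - 2)·d(i,j) - Σd(i,k) - Σd(j,k) and joins the pair with the lowest Q. The Python sketch below performs only this first selection step on a hypothetical five-taxon distance matrix; it does not build the full tree.

```python
def nj_pick_pair(d):
    """Return ((i, j), Q) for the pair NJ would join first, given a full distance matrix."""
    n = len(d)
    totals = [sum(row) for row in d]          # sum of distances from each taxon to all others
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

# Hypothetical symmetric distance matrix for taxa a, b, c, d, e.
taxa = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
(i, j), q = nj_pick_pair(d)
print(taxa[i], taxa[j], q)
```

In this matrix the smallest raw distance is between d and e (3), yet the Q criterion selects a and b, because it also accounts for how far each taxon is from all the others. This correction is what frees NJ from the constant-rate assumption of UPGMA.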

MAXIMUM PARSIMONY (MP)
The method involves computing the minimum number of substitutions over all sites for
each topology; the topology requiring the fewest substitutions is chosen
Merit – it performs well with closely related sequences
Demerit – it is time consuming
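
Counting the minimum number of substitutions for a given topology can be illustrated with Fitch's algorithm, which is one standard way to do it; the notes do not name a specific algorithm, so its use here is an assumption. The Python sketch below handles binary trees written as nested tuples, and the four short sequences are invented.

```python
def fitch(node, states):
    """Return (possible ancestral states, substitutions needed) for one site."""
    if isinstance(node, str):                 # terminal node: its observed state
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                                # children agree: no extra substitution
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of substitutions over all sites for this topology."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        states = {name: seq[site] for name, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total

# Hypothetical aligned sequences (three sites) and two candidate topologies.
alignment = {"A": "AAG", "B": "AAG", "C": "GAT", "D": "GCT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment))
print(parsimony_score(tree_2, alignment))
```

The first topology needs 3 substitutions and the second needs 5, so the first is preferred. A full MP analysis repeats this scoring over many candidate topologies, which is the source of the method's long running time.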

MAXIMUM LIKELIHOOD (ML)


This method has some basic assumptions:
i. that different characters evolve independently
ii. that after species have diverged, they also evolved independently.

In this method, the likelihood of observing a given set of sequence data for a specific
substitution model is maximized for each topology and the topology that gives the
highest maximum likelihood is chosen as the ML tree.

Merits – Maximum Likelihood is the most consistent method

- The method corrects for multiple mutational events at the same site. This makes it
suitable for reconstructing the relationships between sequences that have been
separated for a long time or are evolving rapidly.

Demerit – it is more time consuming than the MP method

See online materials for phylogenetic tree building algorithms using UPGMA and NJ

The good news is that bioinformatics tools and software
have made all of these easy to perform!

5. BIOINFORMATICS

Bioinformatics is a multidisciplinary field which combines statistical methods and computer
software tools for the understanding of biological data. The field engages biologists, computer
scientists, mathematicians, physicists and engineers.

In bioinformatics, the kinds of data biologists work with include: sequence data (RNA, DNA,
protein); metabolites and metabolic pathways; enzymes; taxonomic information, etc.

Highlights on Bioinformatics

✓ It is essentially concerned with organizing data in databases such that researchers can
access current data and also submit new data
✓ It is concerned with the building of tools and resources to analyze data
✓ It helps to interpret data in a biologically useful manner, such that a global
analysis of the data reveals common principles that apply across many systems
(Benedik, 2010; S3)

International Nucleotide Sequence Database Collaboration

• DNA Data Bank of Japan (DDBJ)
• GenBank at NCBI
• European Nucleotide Archive (ENA)

Figure 7: Nucleotide sequence database collaboration

NCBI – National Center for Biotechnology Information
EBI – European Bioinformatics Institute
EMBL – European Molecular Biology Laboratory; hosts ENA
GenBank – the National Institutes of Health (NIH) genetic sequence database

Entrez – Molecular Biology Database System. It provides integrated access to nucleotide and
protein sequence data, gene-centred and genomic mapping information, 3D structure data, and PubMed/MEDLINE
RefSeq (NCBI) – the Reference Sequence Database is an open access, annotated and curated
collection of publicly available sequences (DNA, RNA and their protein products).

www.ncbi.nlm.nih.gov/Entrez/
www.ncbi.nlm.nih.gov/

BLAST – Basic Local Alignment Search Tool. BLASTn compares a nucleotide query with a nucleotide database; BLASTp compares a protein query with a protein database

MEGA – Molecular Evolutionary Genetics Analysis


Unipro UGENE
