Mic 307 2021 A

MICROBIAL TAXONOMY (MIC 307) [A] O.S. Obayori
1. CONCEPTUAL CLARIFICATIONS
i. Taxonomy - the science of classification of organisms, or the science of ordering
of taxa in a hierarchical classification system.
OR
A formal system for organizing, classifying and naming living things.
It has three component parts, namely: Classification, Nomenclature and
Identification
ii. Classification – ordering of organisms into groups based on their relationships
OR
A coherent scheme by which a collection of organisms is arranged so as to
reflect the relationships between individuals and groups (Atlas, 1995)
iii. Nomenclature – the part of taxonomy that deals with assigning names to
organisms and to the taxonomic ranks (taxa) in which they are placed
iv. Identification – the process of discovering and recording the traits of an organism
so that it can be assigned to the proper group and given the correct name, based on an
established taxonomic system
Importance of Taxonomy
i. Helps to organise large amounts of knowledge
ii. Places organisms into meaningful, useful groups with precise names,
thus facilitating communication and avoiding confusion
iii. Allows scientists to make predictions and frame hypotheses about
organisms
iv. Enables accurate classification of organisms
___________________________________________________________________________
i. Binomial Nomenclature – Introduced by Carl von Linné (Linnaeus)
- A naming system in which an organism is assigned a scientific name
composed of two parts: the genus (generic) name and the specific epithet,
e.g.
Homo sapiens
Escherichia coli
Pseudomonas aeruginosa
Lysinibacillus fusiformis
- In this system, the name is written in italics (in printed materials) or
underlined when typed on a typewriter or written in longhand
- The first letter of the generic name is in upper case while the others are in
lower case
- The specific epithet is in lower case throughout
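The capitalisation rules for a binomial can be checked mechanically. The following is a minimal Python sketch; the helper name format_binomial is hypothetical and is used only for illustration. It does not handle italics or underlining, which are matters of presentation.

```python
def format_binomial(name: str) -> str:
    """Return a binomial with the genus capitalised and the epithet in lower case."""
    parts = name.split()
    if len(parts) != 2:
        raise ValueError("a binomial has exactly two parts: genus and specific epithet")
    genus, epithet = parts
    # str.capitalize() upper-cases the first letter and lower-cases the rest.
    return f"{genus.capitalize()} {epithet.lower()}"


print(format_binomial("escherichia COLI"))
print(format_binomial("PSEUDOMONAS Aeruginosa"))
```

A name with one part or with three or more parts is rejected, since it is not a binomial.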
ii. Hierarchical classification – Introduced by Carl von Linné (Linnaeus)
- In this system organisms are organized into a hierarchical order on the basis of
similarities
- The higher levels contain more members; the lowest level is the species, which
contains only one kind of organism
- Each level is called a TAXON or TAXONOMIC RANK
The Ranks in the Linnean system are as follows:
Kingdom
Phylum
Class
Order
Family
Genus
Species
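The rank list can be treated as an ordered data structure. The sketch below is illustrative only: the two lineages follow one commonly cited arrangement and are assumptions made for the example (names at the higher ranks are revised from time to time), and the function names are hypothetical.

```python
# Linnaean ranks, from the most inclusive to the least inclusive.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]

# Assumed lineages for two organisms named in these notes (illustration only).
E_COLI = {
    "Kingdom": "Bacteria",
    "Phylum": "Pseudomonadota",
    "Class": "Gammaproteobacteria",
    "Order": "Enterobacterales",
    "Family": "Enterobacteriaceae",
    "Genus": "Escherichia",
    "Species": "Escherichia coli",
}
P_AERUGINOSA = {
    "Kingdom": "Bacteria",
    "Phylum": "Pseudomonadota",
    "Class": "Gammaproteobacteria",
    "Order": "Pseudomonadales",
    "Family": "Pseudomonadaceae",
    "Genus": "Pseudomonas",
    "Species": "Pseudomonas aeruginosa",
}


def lineage(taxa):
    """Write a lineage from the highest rank down to the species."""
    return " > ".join(taxa[rank] for rank in RANKS)


def lowest_shared_rank(a, b):
    """Return the lowest rank at which two organisms are placed in the same taxon."""
    shared = None
    for rank in RANKS:
        if a[rank] != b[rank]:
            break
        shared = rank
    return shared


print(lineage(E_COLI))
print(lowest_shared_rank(E_COLI, P_AERUGINOSA))
```

Under the assumed lineages the two organisms part company at the rank of Order, so the lowest taxon they share is the Class.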
Remember!
In the modern system:
i. Domain is the highest level, and there are 3 Domains (Bacteria, Archaea and Eukarya)
ii. Species is the basic taxonomic unit of a classification system
iii. Definitions of the Taxa:

• Division/Phylum- a group of related classes
• Class – a group of related orders
• Order – a group of related families
• Family – a group of related genera
• Genus – a group of related species
OR
A well-defined group of one or more species that is clearly separated from other
genera
• Species – a group of organisms of the same kind
OR
A collection of strains that share many stable properties and differ significantly from
other groups of strains
OR
A collection of strains with similar G+C composition and at least 70% DNA-DNA
hybridization (relatedness).
• Strains – variants of the same species
iv. Systematics - the study of organisms with the ultimate object of characterizing
and arranging them in an orderly manner.
OR
The science of relationships among organisms
OR
The comparative study of the diversity of organisms, with the aim of establishing a
logical system within which organisms can be described and classified (Atlas, 1995)
2. APPROACHES TO CLASSIFICATION:
i. Phenetic: assesses overall similarity.
Groups do not necessarily reflect genetic similarity or evolutionary relatedness.
Instead, groups are based on convenient, observable characteristics
ii. Phylogenetic:
Emphasises evolutionary relationships and is based on a body of evolutionary
evidence
Groups reflect genetic similarity and evolutionary relatedness
Characteristics used in classification
A. Traditional or classical characteristics

These fall into four categories, namely:
i. Morphological characteristics e.g. cell shape, size and arrangement; colonial
size, topology, colour; endospore; flagella/cilia/other means of motility;
spores; inclusion bodies; ultra-structure; staining properties.
ii. Metabolic/physiological characteristics e.g. Cell wall and membrane
components; carbon and nitrogen metabolism; electron acceptor; growth
temperature, pH range; salinity requirements and tolerance; photosynthetic
pigment; secondary metabolites; metabolic inhibitors.
iii. Ecological characteristics e.g. habitat; interactions; life cycle; growth
requirements
iv. Genetic characteristics e.g. chromosomal gene transfer; plasmids
B. Molecular characteristics
i. Percentage G+C content
ii. Nucleic acid sequence
iii. Nucleic acid hybridization
iv. Protein comparison
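Of the molecular characteristics listed above, percentage G+C content is the simplest to compute once a sequence is available. A minimal Python sketch follows; the function name gc_percent is hypothetical, and ambiguous symbols such as N are assumed to be ignored rather than counted.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C among the A, C, G and T bases of a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


print(gc_percent("ATGCGGCCTA"))
```

For real genomes the value is computed over the whole genome sequence, not over a short fragment as in this example.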
16S RIBOSOMAL RNA (rRNA) AND PHYLOGENY

Why has the 16S rRNA gene remained very useful as a phylogenetic marker?
i. The function of ribosomes has not changed for about 3.8 billion years
ii. 16S rRNA genes (or their homologues) are universally present among cellular life forms
iii. Their size of about 1,540 nucleotides makes them easy to analyse
iv. The primary structure is an alternating sequence of invariant, more or
less conserved, and highly variable regions
v. Lateral gene transfer is either totally absent or exceedingly rare
(Philp et al., 2005)
Limitation of the 16S rRNA gene: sequence similarity of 16S rRNA genes may not
always reflect relatedness
Other markers gaining relevance include: 23S rRNA gene, gyrB, rpoB, dnaK,
dsrAB and the 16S-23S rDNA intergenic spacer region (ISR)
iii. Numerical Taxonomy

In classical or conventional taxonomy:
• Several features are examined

• Selected features are used for grouping. These are considered as most important
• This approach emphasises branching points between groups that are permitted to
represent fundamental differences between taxa
• It is usually represented as a tree with branch points and the taxa at the end of the
branches
Highlights on numerical taxonomy
• Does not emphasise points of branching
• Uses the overall degree of similarity between organisms to establish a taxon
• Equal weight is given to all the characteristics
• The greater the content of information in the taxa of a classification and the more
characters on which it is based, the better a given classification will be.
• Overall similarity between any two entities is a function of their individual
similarities in each of the many characters in which they are being compared.
• Taxonomy is viewed and practiced as an empirical science.
• Classifications are based on phenetic similarity.
Advantages
• Numerical taxonomy has the power to integrate data from diverse sources, such as
morphology, physiology, chemistry and molecular biology.
• Automation makes for greater efficiency
• Being quantitative, the methods provide greater discrimination along the spectrum of
taxonomic differences and are more sensitive in delimiting taxa.
• Numerical taxonomy has led to the reinterpretation of a number of biological concepts
and to the posing of new biological and evolutionary questions.
An association coefficient is used to estimate the degree of similarity between taxonomic units
Simple Matching Coefficient (Ssm)
$$S_{sm} = \frac{(++) + (--)}{(++) + (--) + (+-) + (-+)}$$
(++) = Positive matches (character present in both organisms)
(--) = Negative matches (character absent in both organisms)
(+-); (-+) = Mismatches (character present in only one of the two organisms)
Jaccard Coefficient (Sj)
$$S_{j} = \frac{(++)}{(++) + (+-) + (-+)}$$
• Counting (--) in Ssm can make organisms appear similar merely because they both lack
many characters
• The Jaccard coefficient leaves out (--) and so avoids this
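The two coefficients can be compared on the same pair of organisms. The sketch below is illustrative: the character scores for the two strains are invented for the example, and the function names are hypothetical. Each organism is scored 1 (character present) or 0 (character absent) for the same list of characters.

```python
def match_counts(a, b):
    """Return the counts of (++), (--), (+-) and (-+) for two lists of 0/1 scores."""
    if len(a) != len(b) or not a:
        raise ValueError("both organisms must be scored for the same, non-empty character set")
    pos_pos = sum(1 for x, y in zip(a, b) if x and y)
    neg_neg = sum(1 for x, y in zip(a, b) if not x and not y)
    pos_neg = sum(1 for x, y in zip(a, b) if x and not y)
    neg_pos = sum(1 for x, y in zip(a, b) if not x and y)
    return pos_pos, neg_neg, pos_neg, neg_pos


def simple_matching(a, b):
    """Ssm: matches of both kinds divided by all characters compared."""
    pos_pos, neg_neg, pos_neg, neg_pos = match_counts(a, b)
    return (pos_pos + neg_neg) / (pos_pos + neg_neg + pos_neg + neg_pos)


def jaccard(a, b):
    """Sj: positive matches divided by all characters except negative matches."""
    pos_pos, _, pos_neg, neg_pos = match_counts(a, b)
    denominator = pos_pos + pos_neg + neg_pos
    if denominator == 0:
        raise ValueError("Sj is undefined when neither organism has any positive character")
    return pos_pos / denominator


# Invented scores for ten characters (1 = positive, 0 = negative).
strain_x = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
strain_y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(simple_matching(strain_x, strain_y))
print(jaccard(strain_x, strain_y))
```

Here the two strains share one positive character and six negative ones, so Ssm is 0.7 while Sj is only 0.25, which illustrates how negative matches inflate Ssm.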
A   1.0
B   0.92  1.0
C   0.80  0.72  1.0
D   0.22  0.32  0.28  1.0
E   0.46  0.43  0.47  0.30  1.0
F   0.35  0.45  0.46  0.32  0.32  1.0
    A     B     C     D     E     F
Figure 1: Similarity matrix

➢ The results (the association coefficients) obtained from a numerical taxonomy study are
arranged in a similarity matrix and then presented in the form of a dendrogram.
➢ A statistical procedure called cluster analysis is used. It is usually computer assisted.
iv. Polyphasic Approach:
Collectively, the genotypic, chemotaxonomic and phenotypic methods for determining the
taxonomic position of microbes constitute what is known as the polyphasic approach to
bacterial systematics (Prakash et al., 2007)
OR
The use of all available data, viz., genotypic and phenotypic, to determine phylogeny. The
data used depend on the aim of the study.
Techniques and markers used in modern polyphasic approaches for
resolving bacterial hierarchy
1. Chemotaxonomic markers (polyamines, quinones, polar lipids, fatty acids):
up to genus level
2. DNA-DNA hybridization; %G+C; tDNA-PCR:
up to species level
3. DNA probes; DNA sequencing:
up to strain level
4. RNA gene sequencing:
up to species level
5. Cell wall structure (teichoic acids, peptidoglycans):
up to genus level
6. Restriction Fragment Length Polymorphism (RFLP); Pulsed-Field Gel Electrophoresis
(PFGE); ribotyping; DNA amplification; phage and bacteriocin typing; serological
techniques:
up to strain level
(Prakash et al., 2007)
3. PHYLOGENETIC TREES
What is a phylogenetic tree?
A phylogenetic tree is an estimate of the relationships among taxa (or sequences) and their
hypothetical common ancestors (Hall, 2013).
OR
A phylogenetic tree is a statement about the evolutionary relationships between a set of
homologous characters of organisms.
OR
A tree-like structure that shows the evolutionary relationships among a set of organisms or
biomolecules
How is a phylogenetic tree different from other biological trees?
Phylogenetic trees: represent evolutionary relationships among species. They are
usually constructed by comparing 16S rRNA genes.
Taxonomy trees: show hierarchies of taxonomic ranks from kingdom to
species (refer to NCBI).
Gene trees: represent the evolutionary relationships of particular
biomolecules (e.g. genes or proteins) among species
Figure 2: Parts of a phylogenetic tree
•Node: a branch point in a tree (a presumed ancestral OTU)
•Branch: defines the relationship between the taxa in terms of descent and ancestry
•Topology: the branching patterns of the tree
•Branch length (scaled trees only): represents the number of changes that have
occurred in the branch
•Root: the common ancestor of all taxa
•Clade: a group of two or more taxa or DNA sequences that includes both their
common ancestor and their entire descendants
Figure 3: Equivalent trees
An outgroup is a taxon outside the group of interest. It is useful in constructing (rooting)
an evolutionary tree.
Figure 4: What the nodes represent
Rooted tree or Cladogram
A phylogenetic tree in which all the objects share a known common ancestor
Unrooted tree or Phenogram
All the objects are related descendants, but the common root or ancestor cannot be specified.
Figure 5: Rooted and unrooted trees (panels labelled Unrooted and Rooted; taxa A, B and C)
Clade – a group consisting of an ancestor and all of its descendants
Cladogram – branch length does not represent evolutionary time.
Phylogram – branch length represents the amount of evolutionary change
Ultrametric tree – branch length represents evolutionary time
In a phylogenetic tree:
Internal nodes represent speciation events
Branches connect the nodes
External branches are branches that end with a tip
Internal branches are branches that do not end with a tip
Terminal nodes are operational taxonomic units (OTUs)
Internal nodes are hypothetical taxonomic units (HTUs)
4. BUILDING PHYLOGENETIC TREES
The data used for building a phylogenetic tree can be either:
1. Molecular data, i.e. gene sequences or protein sequences (nucleotide or amino acid
substitutions); or
2. Distance or morphological data, e.g. phenotypic features
The methods for building phylogenetic trees can be distinguished as:
Distance based methods (phenetic)
Character based methods (cladistic)
Distance based methods: are more rapid and less computationally intensive. There is a loss
of information because the characters are discarded once the distance matrix has been computed.
Character based methods: make use of all known evolutionary changes to
determine the most likely ancestral relationships. There is no loss of information, but they
are time consuming.
Discrete characters used in character based methods include:
• Morphological data
• Protein data (amino acids)
• DNA data (four nucleotides; G+C %)
It is assumed that all characters are independent of each other
How is a phylogenetic tree created from molecular data?
The sequences could be of RNA, DNA or proteins
• Align sequences
• Determine number of positions that are different
• Express difference in form of distance.
• Use the measure of difference to create tree.
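The middle two steps above (counting the positions that differ and expressing the difference as a distance) can be sketched in Python as follows. The sketch assumes the sequences are already aligned to the same length; the three short sequences and the function names are invented for illustration. The distance used is the simple proportion of differing positions, with gap positions skipped.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two sequences differ (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore positions where either sequence has a gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no positions could be compared")
    return differences / compared


def distance_matrix(aligned):
    """Return {(name_1, name_2): distance} for every pair of aligned sequences."""
    names = list(aligned)
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented aligned sequences, ten positions each.
ALIGNED = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTG",
}
for pair, distance in distance_matrix(ALIGNED).items():
    print(pair, distance)
```

The resulting pairwise distances are the input to a distance based tree building method.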
RNA - usually used to create trees showing broad relationships
DNA - effective for comparing organisms at species and molecular level
Protein - sequence alignment is easier;
        - less affected by organism-specific differences in G+C content
Properties of a good tree building method
Efficiency – the faster the method, the more efficient it is.
Power – a powerful method produces a reasonable result with limited data.
Consistency – a consistent method converges on the right answer given enough data.
Robustness – violation of the method's assumptions does not necessarily result in poor
phylogenies.
Falsifiability – a good method should be able to reveal when its assumptions are violated.
Building a phylogenetic tree requires four distinct steps:
1. Identify and acquire a set of homologous DNA or protein sequences
2. Align those sequences
3. Estimate a tree from the aligned sequences
4. Present that tree in such a way as to clearly convey the relevant information to others.
Adapted from: Hall, 2013
Phylogenetic tree building methods
Distance based (algorithmic methods):
• Neighbour Joining
• Weighted Neighbour Joining
• Unweighted Pair Group Method using Arithmetic Mean (UPGMA)
• Weighted Pair Group Method using Arithmetic Mean (WPGMA)
Character based (tree searching methods):
• Maximum Parsimony
• Maximum Likelihood
Note: the distance methods are faster
Figure 6: Phylogenetic tree building methods
UPGMA
This is the simplest tree building method. Strictly speaking, the algorithm is phenetic.
It is a sequential clustering algorithm. The clustering procedure:
• Initially, treat each species as a cluster on its own.
• Join the two closest clusters and recalculate the distance from the new cluster to every
other cluster by taking the average.
• Repeat this process until all species are connected in a single cluster.
Merits - it outputs a rooted tree and it is very fast
Demerit - it assumes a constant rate of evolution of the sequences in all branches of the tree
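The UPGMA clustering procedure can be sketched in Python using the similarity values of Figure 1. This is a minimal illustration, not a full tree building program: it reports only the order in which clusters are joined, and it assumes that distance can be taken as 1 minus similarity. The function names are hypothetical.

```python
from itertools import combinations

# Similarity values taken from the lower triangle of Figure 1.
SIMILARITY = {
    ("A", "B"): 0.92, ("A", "C"): 0.80, ("B", "C"): 0.72,
    ("A", "D"): 0.22, ("B", "D"): 0.32, ("C", "D"): 0.28,
    ("A", "E"): 0.46, ("B", "E"): 0.43, ("C", "E"): 0.47, ("D", "E"): 0.30,
    ("A", "F"): 0.35, ("B", "F"): 0.45, ("C", "F"): 0.46, ("D", "F"): 0.32,
    ("E", "F"): 0.32,
}

# Assumption for this sketch: distance = 1 - similarity.
DISTANCE = {frozenset(pair): 1.0 - s for pair, s in SIMILARITY.items()}


def average_distance(c1, c2, dist):
    """Arithmetic mean of all pairwise distances between members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(names, dist):
    """Return the joins, in order, as (cluster_1, cluster_2, average_distance)."""
    clusters = [(name,) for name in names]  # each species starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], dist),
        )
        merges.append((c1, c2, average_distance(c1, c2, dist)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


for c1, c2, d in upgma("ABCDEF", DISTANCE):
    print("".join(c1), "+", "".join(c2), "joined at distance", round(d, 3))
```

With the Figure 1 values, A and B are joined first, followed in turn by C, E and F, and D joins last.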
NEIGHBOUR JOINING (NJ)

The Neighbour Joining Method is the most widely used distance based method.
It applies the minimum evolution principle at each step in the clustering process.
Merit- does not assume that the rate of evolution is the same in all branches of the tree.
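One step of Neighbour Joining can be sketched as follows. The sketch uses the commonly used Q criterion, Q(i, j) = (n - 2) d(i, j) - (sum of distances from i) - (sum of distances from j), and joins the pair with the lowest Q. The five-taxon distance table is hypothetical and the function name is invented; a full implementation would go on to compute branch lengths and repeat the step on the reduced matrix.

```python
def q_matrix(names, dist):
    """Return Q[(i, j)] = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(names)
    totals = {
        i: sum(dist[frozenset((i, k))] for k in names if k != i) for i in names
    }
    q = {}
    for index, i in enumerate(names):
        for j in names[index + 1:]:
            q[(i, j)] = (n - 2) * dist[frozenset((i, j))] - totals[i] - totals[j]
    return q


# Hypothetical distances between five taxa, for illustration only.
PAIRS = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
DIST = {frozenset(pair): d for pair, d in PAIRS.items()}
NAMES = ["a", "b", "c", "d", "e"]

q = q_matrix(NAMES, DIST)
first_pair = min(q, key=q.get)  # the pair with the lowest Q is joined first
print(first_pair, q[first_pair])
```

In this example the pair joined first is (a, b), even though d and e have the smallest raw distance. Unlike UPGMA, the method corrects each distance for how far the two taxa are from all the others, which is why it does not need equal rates in all branches.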
MAXIMUM PARSIMONY (MP)
The method involves computing the minimum number of substitutions over all sites for
each topology
Merit – it works well with closely related (highly similar) sequences
Demerit – it is time consuming
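Counting the minimum number of substitutions for one topology can be sketched with Fitch's algorithm, which is one standard way of doing this count (these notes do not name a specific algorithm, so its use here is an assumption). Trees are written as nested pairs of taxon names; the taxa, bases and function names are invented for illustration.

```python
def fitch_score(tree, states):
    """Minimum number of substitutions at one site for a rooted binary tree.

    tree   : a taxon name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping each taxon name to its base at this site
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0  # a tip: its own base, no change needed
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # No shared base: at least one substitution is needed at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_length(tree, alignment):
    """Sum the per-site minimum substitutions over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# One site scored on four hypothetical taxa.
SITE = {"A": "A", "B": "A", "C": "G", "D": "G"}
TOPOLOGY_1 = (("A", "B"), ("C", "D"))
TOPOLOGY_2 = (("A", "C"), ("B", "D"))
print(fitch_score(TOPOLOGY_1, SITE), fitch_score(TOPOLOGY_2, SITE))
```

The first topology needs one substitution and the second needs two, so the first is the more parsimonious for this site. The method is time consuming because this count has to be repeated for every site and for every candidate topology.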
MAXIMUM LIKELIHOOD (ML)

This method has some basic assumptions:
i. that different characters evolve independently
ii. that after species have diverged, they also evolved independently.
In this method, the likelihood of observing a given set of sequence data for a specific
substitution model is maximized for each topology and the topology that gives the
highest maximum likelihood is chosen as the ML tree.
Merit- The Maximum Likelihood is the most consistent method
- The method corrects for multiple mutational events at the same site. This makes it
suitable for reconstructing the relationships between sequences that have been
separated for a long time or are evolving rapidly.
Demerit – it is more time consuming than the MP method
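The correction for multiple mutational events at the same site can be illustrated with the Jukes-Cantor model, the simplest nucleotide substitution model. Its use here is an assumption made for illustration, since these notes do not name a particular model. Under this model the expected number of substitutions per site, d, is estimated from the observed proportion of differing sites, p, as d = -(3/4) ln(1 - 4p/3).

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimate substitutions per site from the observed proportion of differences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be at least 0 and less than 0.75 under this model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor_distance(p), 3))
```

The corrected distance is always at least as large as the observed one, and the gap widens as p grows, because more and more substitutions are hidden by later changes at the same site. This is why a model based correction matters most for sequences that have been separated for a long time or are evolving rapidly.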
See online materials for phylogenetic tree building algorithms using UPGMA and NJ
The good news is that bioinformatics tools and software
have made all of these easy to perform!
5. BIOINFORMATICS
Bioinformatics is a multidisciplinary field which combines statistical methods and computer
software tools for the understanding of biological data. The field engages biologists, computer
scientists, mathematicians, physicists and engineers.
In bioinformatics, the kinds of data biologists work with include: sequence data (RNA, DNA,
protein); proteins; metabolites and metabolic pathways; enzymes; taxonomic information etc.
Highlights on Bioinformatics
✓ It is essentially concerned with organizing data in databases so that researchers can
access current data and also submit new data
✓ It is concerned with building tools and resources to analyze data
✓ It helps to interpret data in a biologically useful manner, so that a global
analysis of the data can reveal common principles that apply across systems
(Benedik, 2010; S3)
International Nucleotide Sequence Database Collaboration
DNA Data Bank of Japan - DDBJ
GenBank at NCBI
European Nucleotide Archive - ENA
Figure 7: Nucleotide sequence database collaboration
NCBI – National Center for Biotechnology Information
EBI – European Bioinformatics Institute
EMBL - European Molecular Biology Laboratory; hosts ENA
GenBank - the National Institutes of Health (NIH) genetic sequence database
Entrez - Molecular biology database system. It provides integrated access to nucleotide and
protein sequence data, gene-centred and genomic mapping information, 3D structure data and
PubMed/MEDLINE
RefSeq (NCBI) - Reference Sequence Database; an open access, annotated and curated
collection of publicly available sequences (DNA, RNA and their protein products).
www.ncbi.nlm.nih.gov/Entrez/
www.ncbi.nlm.nih.gov/
BLAST – Basic Local Alignment Search Tool - Explain BLASTn and BLASTp
MEGA - Molecular Evolutionary Genetics Analysis

Unipro UGENE