JH

Q1. A scientist wants to know the structure of a protein for which no
structure is available in the PDB. Can bioinformatics help him find the
structure of this protein? How?
Ans:- Yes, bioinformatics can help him predict the secondary structure of the
protein from its amino acid sequence, using methods such as the GOR method
and the Chou-Fasman method.
The GOR method:-
The GOR method (Garnier-Osguthorpe-Robson) is an information theory-
based method for the prediction of secondary structures in proteins.[1] It was
developed in the late 1970s shortly after the simpler Chou-Fasman method.
Like Chou-Fasman, the GOR method is based on probability parameters
derived from empirical studies of known protein tertiary structures solved by
X-ray crystallography. However, unlike Chou-Fasman, the GOR method
takes into account not only the propensities of individual amino acids to
form particular secondary structures, but also the conditional probability of
the amino acid to form a secondary structure given that its immediate
neighbors have already formed that structure. The method is therefore
essentially Bayesian in its analysis.[2]
The GOR method analyzes sequences to predict alpha helix, beta sheet, turn,
or random coil secondary structure at each position based on 17-amino acid
sequence windows. The original description of the method included four
scoring matrices of size 17x20, where the columns correspond to the log-
odds score, which reflects the probability of finding a given amino acid at
each position in the 17-residue sequence. The four matrices reflect the
probabilities of the central, ninth amino acid being in a helical, sheet, turn, or
coil conformation. In subsequent revisions to the method, the turn matrix
was eliminated due to the high variability of sequences in turn regions
(particularly over such a large window). The method was found to work best
when it required at least four contiguous residues to score as alpha helix
before classifying a region as helical, and at least two contiguous residues
for a beta sheet.
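The contiguity rule above can be sketched as a post-processing step on a per-residue prediction string. This is only an illustration: the one-letter state codes (H for helix, E for sheet, C for coil), the function name `enforce_min_runs`, and the choice to relabel short runs as coil are assumptions, not details taken from the GOR papers.

```python
from itertools import groupby

# Minimum run lengths quoted in the text: 4 residues for helix, 2 for sheet.
# The state letters H/E/C are an assumed encoding.
MIN_RUN = {"H": 4, "E": 2}


def enforce_min_runs(states: str, fallback: str = "C") -> str:
    """Relabel helix/sheet runs shorter than the minimum as `fallback`."""
    out = []
    for state, run in groupby(states):
        length = len(list(run))
        if length < MIN_RUN.get(state, 1):
            state = fallback  # run too short to count as helix or sheet
        out.append(state * length)
    return "".join(out)


print(enforce_min_runs("CHHHCEECHHHHHCEC"))  # CCCCCEECHHHHHCCC
```

The three-residue helix and the single-residue sheet are dropped, while the two-residue sheet and the five-residue helix survive.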
The mathematics and algorithm of the GOR method were based on an earlier
series of studies by Robson and colleagues reported mainly in the Journal of
Molecular Biology. These studies describe the information-theoretic
expansions in terms of conditional information measures. The use of the
word "simple" in the title of the GOR paper reflected the fact that the
earlier methods relied on proofs and techniques that were somewhat daunting
because they were unfamiliar in protein science in the early 1970s; even
Bayesian methods were then unfamiliar and controversial. An important feature
of these early
studies, which survived in the GOR method, was the treatment of the sparse
protein sequence data of the early 1970s by expected information measures.
That is, the measures are expectations on a Bayesian basis, taken over the
distribution of plausible information measure values given the actual
frequencies (numbers of observations). The expectation measures resulting
from integration over this and similar distributions may now be seen as
composed of “incomplete” or extended zeta functions, e.g.
z(s, observed frequency) − z(s, expected frequency), with the incomplete zeta
function $z(s, n) = 1 + (1/2)^s + (1/3)^s + (1/4)^s + \dots + (1/n)^s$.
The GOR method used s = 1. Also, in the GOR method and the
earlier methods, the measure for the contrary state to e.g. helix H, i.e. ~H,
was subtracted from that for H, and similarly for beta sheet, turns, and coil
or loop. Thus the method can be seen as employing a zeta function estimate
of log predictive odds. An adjustable decision constant could also be
applied, which thus also implies a decision theory approach; the GOR
method allowed the option to use decision constants to optimize predictions
for different classes of protein. The expected information measure used as a
basis for the information expansion was less important by the time of
publication of the GOR method because protein sequence data became more
plentiful, at least for the terms considered at that time. Then, for s=1, the
expression z(s,observed frequency) − z(s,expected frequency) approaches
the natural logarithm of (observed frequency / expected frequency) as
frequencies increase. However, this measure (including use of other values
of s) remains important in later more general applications with high
dimensional data, where data for more complex terms in the information
expansion are inevitably sparse.
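The limiting behaviour described above can be checked numerically. The sketch below assumes integer frequencies and uses hypothetical helper names (`incomplete_zeta`, `information_estimate`); it is not the authors' code. It only compares z(1, observed) − z(1, expected) with ln(observed / expected) for small and large counts.

```python
import math


def incomplete_zeta(s: float, n: int) -> float:
    """z(s, n) = 1 + (1/2)^s + ... + (1/n)^s, with z(s, 0) taken as 0."""
    return sum((1.0 / k) ** s for k in range(1, n + 1))


def information_estimate(observed: int, expected: int, s: float = 1.0) -> float:
    """z(s, observed) - z(s, expected) for integer frequencies."""
    return incomplete_zeta(s, observed) - incomplete_zeta(s, expected)


# The same 2:1 ratio at increasing sample sizes.
for observed, expected in [(2, 1), (20, 10), (2000, 1000)]:
    estimate = information_estimate(observed, expected)
    log_ratio = math.log(observed / expected)
    print(f"{observed:5d} {expected:5d}  zeta: {estimate:.4f}  log: {log_ratio:.4f}")
```

With sparse counts the zeta-based estimate is noticeably more conservative than the plain log ratio, and the two agree closely once the counts are large.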
The GOR method
Position-dependent propensities for helix, sheet, or turn are calculated for
each amino acid. For each position j in the sequence, eight residues on either
side are considered.
A helix propensity table contains information about the propensity of residues
at each of the 17 window positions when the conformation of residue j is
helical. The helix propensity table has 20 x 17 entries.
Similar tables are built for strands and turns.
GOR simplification:
The predicted state of residue j (AAj) is calculated from the sum of the
position-dependent propensities of all residues around AAj; the state with
the highest sum is predicted.
GOR can be used at: http://abs.cit.nih.gov/gor/ (current version is GOR IV)
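The window sum described above can be sketched as follows. The tables here are filled with made-up numbers purely so that the example runs; they are NOT real GOR parameters, and the names `toy_table`, `TABLES`, and `predict`, as well as the residue groupings passed to `toy_table`, are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 8  # eight residues on either side -> 17 window positions


def toy_table(favoured: str) -> list[dict[str, float]]:
    """Build a 17 x 20 table of made-up scores (not real GOR parameters).

    Residues in `favoured` get a positive score that fades with distance
    from the central position; all other residues get a small negative one.
    """
    table = []
    for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
        weight = 1.0 - abs(offset) / (HALF_WINDOW + 1)
        table.append({aa: (weight if aa in favoured else -0.2 * weight)
                      for aa in AMINO_ACIDS})
    return table


# One table per state: H = helix, E = strand, T = turn (assumed letters).
TABLES = {
    "H": toy_table("AELMQKR"),
    "E": toy_table("VIYFWT"),
    "T": toy_table("GPNDS"),
}


def predict(sequence: str) -> str:
    """For each residue j, sum the table entries over the window around j
    and report the state with the highest sum."""
    states = []
    for j in range(len(sequence)):
        totals = {}
        for state, table in TABLES.items():
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                pos = j + offset
                if 0 <= pos < len(sequence):  # skip positions off the ends
                    total += table[offset + HALF_WINDOW][sequence[pos]]
            totals[state] = total
        states.append(max(totals, key=totals.get))
    return "".join(states)


print(predict("AAAAAAAAAAGGGGGGGGGGVVVVVVVVVV"))
```

A real implementation would replace `toy_table` with tables estimated from a database of solved structures.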

The authors reported that the GOR method (version IV) achieves a
single-sequence prediction accuracy of 64.4%, as assessed through jackknife
testing over a database of 267 proteins with known structure
(Garnier, J. G., Gibrat, J.-F., Robson, B. (1996) In: Methods in Enzymology
(Doolittle, R. F., Ed.) Vol. 266, pp. 540-53).
The GOR method relies on the frequencies observed in the database for
residues in a 17-residue window (i.e. eight residues N-terminal and eight
C-terminal of the central window position) for each of the three structural
states.
The Chou-Fasman method:-
If you were asked to determine whether an amino acid in a protein of interest
is part of an α-helix or a β-sheet, you might think to look in a protein
database and see which secondary structures amino acids in similar contexts
belonged to. The Chou-Fasman method (1978) is a combination of such
statistics-based methods and rule-based methods. Here are the steps of the
Chou-Fasman algorithm:
1. Calculate propensities from a set of solved structures. For all 20 amino
acids i, calculate these propensities by:
$\Pr[i \mid \beta\text{-sheet}] / \Pr[i]$, and likewise for the α-helix.
That is, we determine the probability that amino acid i is in each structure,
normalized by the background probability that i occurs at all. For example,
let's say that there are 20,000 amino acids in the database, of which 2000 are
serine, and there are 5000 amino acids in helical conformation, of which 500
are serine. Then the helical propensity for serine is:
$(500/5000) / (2000/20000) = 1.0$. Note that often you will see propensities
defined as:
$\Pr[\alpha\text{-helix} \mid i] / \Pr[\alpha\text{-helix}]$
and that these two formulations are equivalent.
2. Once the propensities are calculated, each amino acid is categorized using
the propensities as one of: helix-former, helix-breaker, or helix-indifferent.
(That is, helix-formers have high helical propensities, helix-breakers have
low helical propensities, and helix-indifferents have intermediate
propensities.) Each amino acid is also categorized as one of: sheet-former,
sheet-breaker, or sheet-indifferent. For example, it was found (as expected)
that glycine and proline are helix-breakers.
3. When a sequence is input, find nucleation sites. These are short
subsequences with a high concentration of helix-formers (or sheet-formers).
These sites are found with some heuristic rule (e.g. "a sequence of 6 amino
acids with at least 4 helix-formers, and no helix-breakers").
4. Extend the nucleation sites, adding residues at the ends, maintaining an
average propensity greater than some threshold.
5. Step 4 may create overlaps; finally, we deal with these overlaps using some
heuristic rules.
Figure 1: In the Chou-Fasman method, nucleation sites are found along the
protein using a heuristic rule, and then extended.
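Steps 1, 3, and 4 of the algorithm can be sketched as below. The counts in the first part are the serine numbers from the worked example; everything else (the function names, the former and breaker sets, the `TOY_PROPENSITIES` values, and the thresholds) is hypothetical and chosen only to make the example run, not taken from Chou and Fasman's published tables.

```python
def propensity(n_aa_in_state, n_state, n_aa, n_total):
    """Step 1: Pr[i | state] / Pr[i], computed from raw counts."""
    return (n_aa_in_state / n_state) / (n_aa / n_total)


# Serine example from the text: 500 of 5000 helical residues are serine,
# and 2000 of all 20,000 residues are serine.
serine_helix = propensity(500, 5000, 2000, 20000)
# Alternative definition, Pr[state | i] / Pr[state], gives the same value.
serine_helix_alt = (500 / 2000) / (5000 / 20000)


def find_nucleation_sites(sequence, formers, breakers, window=6, min_formers=4):
    """Step 3: start indices of windows with enough formers and no breakers."""
    sites = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        n_formers = sum(aa in formers for aa in segment)
        if n_formers >= min_formers and not any(aa in breakers for aa in segment):
            sites.append(start)
    return sites


def extend_site(sequence, start, end, propensities, threshold):
    """Step 4: grow the half-open range [start, end) one residue at a time
    while the mean propensity of the extended range stays above threshold."""
    def mean(lo, hi):
        return sum(propensities[aa] for aa in sequence[lo:hi]) / (hi - lo)

    grew = True
    while grew:
        grew = False
        if start > 0 and mean(start - 1, end) > threshold:
            start -= 1
            grew = True
        if end < len(sequence) and mean(start, end + 1) > threshold:
            end += 1
            grew = True
    return start, end


# Made-up values for illustration only.
TOY_PROPENSITIES = {"A": 1.4, "E": 1.4, "L": 1.4, "K": 1.4, "G": 0.5, "P": 0.5}
SEQUENCE = "GAAEALKAGP"

print(serine_helix, serine_helix_alt)
print(find_nucleation_sites(SEQUENCE, set("AELK"), set("GP")))
print(extend_site(SEQUENCE, 1, 7, TOY_PROPENSITIES, threshold=1.3))
```

Resolving overlaps between helix and sheet regions (step 5) is left out, since the text only says that heuristic rules are used for it.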
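Q2. A scientist discovers that a gene from two organisms is closely related
in a phylogenetic tree. Can he conclude that the two organisms are closely
related from the comparison of only a single gene?
Ans:- No, he cannot conclude that the two organisms are closely related from
the comparison of only a single gene. A tree built from one gene records the
history of that gene, which can differ from the history of the organisms, for
example because of horizontal gene transfer or gene duplication and loss.
A phylogenetic tree, also known as a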
phylogeny, is a diagram that depicts the lines of evolutionary descent of
different species, organisms, or genes from a common ancestor. Phylogenies
are useful for organizing knowledge of biological diversity, for structuring
classifications, and for providing insight into events that occurred during
evolution. Furthermore, because these trees show descent from a common
ancestor, and because much of the strongest evidence for evolution comes in
the form of common ancestry, one must understand phylogenies in order to
fully appreciate the overwhelming evidence supporting the theory of
evolution.
Tree diagrams have been used in evolutionary biology since the time of
Charles Darwin. Therefore, one might assume that, by now, most scientists
would be exceedingly comfortable with "tree thinking"--reading and
interpreting phylogenies. However, it turns out that the tree model of
evolution is somewhat counterintuitive and easily misunderstood. This may
be the reason why biologists have only in the last few decades come to
develop a rigorous understanding of phylogenetic trees. This understanding
allows present-day researchers to use phylogenies to visualize evolution,
organize their knowledge of biodiversity, and structure and guide ongoing
evolutionary research.
What an Evolutionary Tree Represents
To better understand what a phylogeny represents, start by imagining one
generation of butterflies of a particular species living in the same area and
producing offspring. If you focus on four individual butterflies in both the
parental and offspring generations, the resulting pedigree may appear like
the one in Figure 1B.
Now, expand your image to encompass all the butterflies of this species in a
particular meadow over several generations. A pedigree for this population
might look something like the one in Figure 1C. Note that each individual in
the figure has two parents, but each gives rise to a variable number of
offspring in the next generation.
Next, imagine taking your pedigree and getting rid of the organisms, thus
keeping only the descent relationships, which are the glue that holds the
population together (Figure 1D inset box). Then zoom out even farther to
include many more individuals (say, from multiple meadows in the same
region) and more generations. For example, the whole of Figure 1D is
derived from a similar diagram as the inset box, but it now includes more
individuals and many generations. As you can see, if one were to try to
represent a typical population of several thousand individuals that persists
for hundreds or thousands of generations, all one would see would be a
fuzzy line.
Individual populations may be fairly isolated for some period of time.
However, on an evolutionary timescale, migration will occur among the
discrete populations that make up a typical species. This gene flow between
populations has the effect of "braiding" the population lineages into a single
species lineage, which might be thought of as resembling Figure 1E.
Moreover, during evolution, lineages often split. This occurs when
populations or groups of populations become genetically isolated from one
another. Lineages most commonly split because of the migration of a few
individuals to a new, isolated region (e.g., an island). This is sometimes
called a founder event. Alternatively, a formerly contiguous range can be
broken up by geological or climatic events (e.g., the creation of mountains,
rivers, or patches of inhospitable terrain). This phenomenon is called
vicariance. No matter whether populations split due to founder events or
vicariance, if the isolated populations remain separate, they will start
evolving differences from one another (Figure 1F). After all, a mutation that
arises in one population will have no way to get to the other population.
Thus, even a mutation that would be selectively favored in both populations
will become fixed in only one of the groups.
Figure 2: Branching pattern of four species.
As a consequence of this genetic isolation, the lineages will evolve
separately, becoming more and more different over time. If they remain
apart for long periods, enough physiological and behavioral differences may
evolve to result in reproductive isolation, such that it will be impossible for
individuals from the two lineages to reproduce even in the case that they do
come back into contact. Because of this, it is a useful simplification to
assume that once lineages diverge, the two sets of descendants will remain
distinct.
Figure 2B shows what we might see if we followed the fate of a single
ancestral lineage (Figure 2A) long enough that it gave rise to four
descendant lineages (species). This example includes three lineages that
were established but became extinct before the end of the observation
period. This diagram is an example of a simple phylogenetic tree.
In most cases, researchers draw phylogenetic trees in such a way as to record
only those events that are relevant to a set of living taxa. Most commonly,
these taxa are species. For example, Figure 2C shows the basic tree we could
draw to represent the history of the four "tip" species, A through D. This tree
shows that species A and B share a more recent common ancestor with each
other than with either species C or species D. Likewise, species C and D
share a more recent common ancestor with each other than with either
species A or species B. This example illustrates the fact that a phylogeny is,
at its most basic level, a history of descent from common ancestry.
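The relationships among the four tip species can be made concrete with a small tree structure. The sketch below encodes the branching pattern described above as nested tuples and measures how far from the root the most recent common ancestor of two tips lies; the representation and the names `TREE`, `tips`, and `mrca_depth` are assumptions for illustration, and depth here counts splits, not time.

```python
# Branching pattern described in the text: A with B, and C with D.
TREE = (("A", "B"), ("C", "D"))


def tips(node):
    """Set of tip names found below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def mrca_depth(node, x, y, depth=0):
    """Number of splits between the root and the most recent common
    ancestor of tips x and y (a larger value means a more recent ancestor)."""
    if not isinstance(node, str):
        for child in node:
            if {x, y} <= tips(child):
                return mrca_depth(child, x, y, depth + 1)
    return depth


print(mrca_depth(TREE, "A", "B"))  # 1
print(mrca_depth(TREE, "A", "C"))  # 0
```

A and B meet one split below the root, whereas A and C only meet at the root, which matches the statement that A and B share a more recent common ancestor with each other than with C or D.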
Phylogenetic trees are fractal in the sense that the same pattern is found
whether we consider recently diverged lineages or deep splits in the tree of
life. Indeed, the most basic postulate of evolutionary theory is that the same
general phenomenon of descent from common ancestry applies to both the
most diverse branches of the tree of life and the most similar. As a result, the
tree structure is extremely helpful in tracking biological diversity at all levels
