The GOR method analyzes sequences to predict alpha helix, beta sheet, turn,
or random coil secondary structure at each position based on 17-amino-acid
sequence windows. The original description of the method included four
scoring matrices of size 17x20, in which each entry is a log-odds score
reflecting the probability of finding a given amino acid at a given position
in the 17-residue window. The four matrices reflect the probabilities of the
central, ninth amino acid being in a helical, sheet, turn, or coil
conformation. In subsequent revisions to the method, the turn matrix was
eliminated due to the high variability of sequences in turn regions
(particularly over such a large window). The method was considered to work
best when requiring at least four contiguous residues to score as alpha helix
before classifying the region as helical, and at least two contiguous
residues for a beta sheet.
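The contiguity rule above can be illustrated with a minimal Python sketch. It assumes the per-residue prediction is given as a string of state letters (H for helix, E for sheet, C for coil) and that runs which are too short are relabelled as coil. The function name `filter_short_runs` and the fallback to coil are assumptions made for illustration; the source states only the minimum run lengths.

```python
from itertools import groupby

# Minimum run lengths taken from the text: four residues for a helix,
# two residues for a beta sheet.
MIN_RUN = {"H": 4, "E": 2}


def filter_short_runs(states, fallback="C"):
    """Relabel helix/sheet runs shorter than the minimum length.

    `states` is a string of per-residue state letters. Relabelling short
    runs as `fallback` (coil) is an assumption made for this sketch.
    """
    out = []
    for state, group in groupby(states):
        length = len(list(group))
        if length < MIN_RUN.get(state, 1):
            out.append(fallback * length)   # run too short: demote it
        else:
            out.append(state * length)      # run long enough: keep it
    return "".join(out)


if __name__ == "__main__":
    print(filter_short_runs("HHHCEEHHHHCE"))
```

In this sketch a three-residue helix run and a single sheet residue are demoted, while the four-residue helix and the two-residue sheet survive.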
The mathematics and algorithm of the GOR method were based on an earlier
series of studies by Robson and colleagues reported mainly in the Journal of
Molecular Biology. This earlier work describes the information-theoretic
expansions in terms of conditional information measures. The use of the
word "simple" in the title of the GOR paper reflected the fact that the above
earlier methods provided proofs and techniques that were somewhat daunting
because they were rather unfamiliar in protein science in the early 1970s;
even Bayesian methods were then unfamiliar and controversial. An important
feature of these early
studies, which survived in the GOR method, was the treatment of the sparse
protein sequence data of the early 1970s by expected information measures,
that is, expectations on a Bayesian basis taken over the distribution of
plausible information measure values given the actual frequencies (numbers
of observations). The expectation measures resulting from integration over
this and similar distributions may now be seen as composed of “incomplete”
or extended zeta functions, e.g. z(s, observed frequency) − z(s, expected
frequency), with the incomplete zeta function
z(s, n) = 1 + (1/2)^s + (1/3)^s + (1/4)^s + … + (1/n)^s.
The GOR method used s = 1. Also, in the GOR method and the
earlier methods, the measure for the contrary state to e.g. helix H, i.e. ~H,
was subtracted from that for H, and similarly for beta sheet, turns, and coil
or loop. Thus the method can be seen as employing a zeta function estimate
of log predictive odds. An adjustable decision constant could also be
applied, which thus also implies a decision theory approach; the GOR
method allowed the option to use decision constants to optimize predictions
for different classes of protein. The expected information measure used as a
basis for the information expansion was less important by the time of
publication of the GOR method because protein sequence data became more
plentiful, at least for the terms considered at that time. Then, for s=1, the
expression z(s,observed frequency) − z(s,expected frequency) approaches
the natural logarithm of (observed frequency / expected frequency) as
frequencies increase. However, this measure (including use of other values
of s) remains important in later more general applications with high
dimensional data, where data for more complex terms in the information
expansion are inevitably sparse.
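The behaviour of the zeta-function measure can be checked numerically with a short Python sketch. It assumes the frequencies are positive integer counts; the function names `incomplete_zeta` and `zeta_log_odds` are illustrative and do not come from the GOR paper.

```python
import math


def incomplete_zeta(s, n):
    """z(s, n) = 1 + (1/2)^s + (1/3)^s + ... + (1/n)^s for an integer count n."""
    return sum((1.0 / k) ** s for k in range(1, n + 1))


def zeta_log_odds(observed, expected, s=1.0):
    """z(s, observed) - z(s, expected); integer counts are assumed here."""
    return incomplete_zeta(s, observed) - incomplete_zeta(s, expected)


if __name__ == "__main__":
    # Compare the zeta estimate with ln(observed / expected) as counts grow.
    for observed, expected in [(2, 1), (20, 10), (2000, 1000)]:
        estimate = zeta_log_odds(observed, expected)
        limit = math.log(observed / expected)
        print(observed, expected, round(estimate, 4), round(limit, 4))
```

With sparse counts the two values differ noticeably (for counts 2 and 1 the zeta estimate is 0.5, while ln 2 is about 0.693), and the gap shrinks as the counts increase, which matches the limit described above for s = 1.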
The GOR method
Position-dependent propensities for helix, sheet, or turn are calculated for
each amino acid. For each position j in the sequence, eight residues on
either side are considered.
A helix propensity table contains information about the propensity of
residues at each of the 17 positions when the conformation of residue j is
helical. The helix propensity table has 20 x 17 entries.
Similar tables are built for strands and turns.
GOR simplification:
The score of each state for residue AAj is calculated as the sum of the
position-dependent propensities of all residues in the window around AAj;
the state with the highest score is the predicted state.
The GOR method relies on the frequencies observed in the database for
residues in a 17-residue window (i.e. eight residues N-terminal and eight
C-terminal of the central window position) for each of the three structural
states.
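The windowed sum described in these notes can be sketched in Python as follows. The propensity values are toy numbers invented for illustration, not the published GOR parameters, and the names `toy_table`, `TOY_TABLES`, and `predict_states` are hypothetical. Skipping window positions that fall outside the sequence is also an assumption of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 8                       # eight residues on either side of j
WINDOW = 2 * HALF_WINDOW + 1          # 17 positions in total


def toy_table(favoured):
    """Build a 17 x 20 table scoring 1.0 for one residue and 0.0 otherwise.

    Toy values only; real tables hold propensities derived from a database.
    """
    return [{aa: (1.0 if aa == favoured else 0.0) for aa in AMINO_ACIDS}
            for _ in range(WINDOW)]


# One table per state: H (helix), E (strand), T (turn).
TOY_TABLES = {"H": toy_table("A"), "E": toy_table("V"), "T": toy_table("G")}


def predict_states(sequence, tables):
    """Predict one state per residue by summing propensities over the window."""
    prediction = []
    for j in range(len(sequence)):
        totals = {}
        for state, table in tables.items():
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                i = j + offset
                if 0 <= i < len(sequence):      # skip positions off the ends
                    total += table[offset + HALF_WINDOW][sequence[i]]
            totals[state] = total
        # The state with the highest summed propensity wins.
        prediction.append(max(totals, key=totals.get))
    return "".join(prediction)


if __name__ == "__main__":
    print(predict_states("AAAAAAAAAAVVVVVVVVVV", TOY_TABLES))
```

Each table is indexed first by window position (0 to 16, with 8 being the central residue j) and then by amino acid, which mirrors the 20 x 17 layout described above.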
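Q2. A scientist discovers that a gene from two organisms is closely related
in a phylogenetic tree. Can he conclude that the two organisms are closely
related by comparing only a single gene?
Ans: No, he cannot conclude that two organisms are closely related by
comparing only a single gene. The history of one gene can differ from the
history of the organisms that carry it, for example because of horizontal
gene transfer or gene duplication and loss, so several genes should be
compared. A phylogenetic tree, also known as a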
phylogeny, is a diagram that depicts the lines of evolutionary descent of
different species, organisms, or genes from a common ancestor. Phylogenies
are useful for organizing knowledge of biological diversity, for structuring
classifications, and for providing insight into events that occurred during
evolution. Furthermore, because these trees show descent from a common
ancestor, and because much of the strongest evidence for evolution comes in
the form of common ancestry, one must understand phylogenies in order to
fully appreciate the overwhelming evidence supporting the theory of
evolution.
Tree diagrams have been used in evolutionary biology since the time of
Charles Darwin. Therefore, one might assume that, by now, most scientists
would be exceedingly comfortable with "tree thinking"--reading and
interpreting phylogenies. However, it turns out that the tree model of
evolution is somewhat counterintuitive and easily misunderstood. This may
be the reason why biologists have only in the last few decades come to
develop a rigorous understanding of phylogenetic trees. This understanding
allows present-day researchers to use phylogenies to visualize evolution,
organize their knowledge of biodiversity, and structure and guide ongoing
evolutionary research.
What an Evolutionary Tree Represents
To better understand what a phylogeny represents, start by imagining one
generation of butterflies of a particular species living in the same area and
producing offspring. If you focus on four individual butterflies in both the
parental and offspring generations, the resulting pedigree may appear like
the one in Figure 1B.
Now, expand your image to encompass all the butterflies of this species in a
particular meadow over several generations. A pedigree for this population
might look something like the one in Figure 1C. Note that each individual in
the figure has two parents, but each gives rise to a variable number of
offspring in the next generation.
Next, imagine taking your pedigree and getting rid of the organisms, thus
keeping only the descent relationships, which are the glue that holds the
population together (Figure 1D inset box). Then zoom out even farther to
include many more individuals (say, from multiple meadows in the same
region) and more generations. For example, the whole of Figure 1D is
derived from a similar diagram as the inset box, but it now includes more
individuals and many generations. As you can see, if one were to try to
represent a typical population of several thousand individuals that persists
for hundreds or thousands of generations, all one would see would be a
fuzzy line.