
9

Chou-Fasman Prediction of the Secondary Structure of Proteins
The Chou-Fasman-Prevelige Algorithm

Peter Prevelige, Jr., and Gerald D. Fasman

I. Introduction
II. Review of the Method and Rationale of the Chou-Fasman Algorithm
III. Operation of the Prediction Program: The Input File
IV. Data Reduction: The Output File
V. Analysis of Output
   A. Search for Helical Regions
   B. Search for β-Sheet Regions
   C. Search for β Turns
   D. Resolving Overlapping Regions
VI. Graphic Display
VII. Portability
VIII. Lotus Hints
IX. Accuracy of Prediction
X. Prediction of Staphylococcal Nuclease
XI. Prediction of Subtilisin
XII. Appendixes
   Appendix 1: C Language Source Code for Program PREDICT
   Appendix 2: Include File "Protein.Dat"
XIII. References

I. INTRODUCTION

The Chou-Fasman algorithm for the prediction of protein secondary structure is one of the
most widely used predictive schemes, owing to its relative simplicity and its reasonably
high degree of accuracy.

Peter Prevelige, Jr. • Department of Biology, Massachusetts Institute of Technology, Cambridge,
Massachusetts 02139. Gerald D. Fasman • Graduate Department of Biochemistry, Brandeis University,
Waltham, Massachusetts 02254.

G. D. Fasman (ed.), Prediction of Protein Structure and the Principles of Protein Conformation
© Plenum Press, New York 1989

A number of modifications of the Chou-Fasman algorithm have been developed and
published (see G. D. Fasman, Chapter 6, this volume, for a review). In general, however,
these suffer from one of two faults: either they are completely computerized and hide much of
the decision-making process from the user, or they leave the user to make decisions but do not
adequately describe the decision-making process used by the authors.
This chapter outlines the approach that the authors have employed successfully over the
past several years. A computer program performs the arithmetic calculations, and the data
reduction is then performed by hand. This division of labor uses the computer for the tedious
calculations while still allowing the individual to bring experience and intuition to bear. The
computer program itself was developed from ideas in a program originally written by Dr. George
Long and Jeff Siegel in 1979.
The first section of this chapter reviews the Chou-Fasman method for prediction of
protein structure. This is followed by a section that lays out the mechanics of operating the
program and then by a discussion of the process of data reduction. Finally, worked examples
are provided to make more concrete the many considerations involved
in predicting a protein's secondary structure.

II. REVIEW OF THE METHOD AND RATIONALE OF THE CHOU-FASMAN ALGORITHM

The Chou-Fasman algorithm predicts the secondary structure of proteins from their amino
acid sequence. It belongs to the class of statistical approaches discussed by
Fasman (Chapter 6, this volume).
The x-ray-determined structures of 15 proteins containing 2473 amino acid residues were
carefully examined, and the number of occurrences of a given amino acid in the α helix, β
sheet, and coil was tabulated (Table I). From this, the conformational parameters for each
amino acid were calculated by considering the relative frequency of a given amino acid within
a protein, its occurrence in a given type of secondary structure, and the fraction of residues
occurring in that type of structure (Chou and Fasman, 1974a). This conformational parameter
is essentially a measure of a given amino acid's preference to be found in α helix, β sheet, or
coil. These parameters, symbolized by Pα, Pβ, and Pc, respectively, presumably contain
information about the physical-chemical parameters defining protein stability, such as
hydrophobicity, properly weighted for their relative importance. These parameters therefore should
be useful for predicting a protein's secondary structure based on the amino acid sequence.
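
The calculation described above can be sketched in a few lines of C. This is a minimal illustration, not the code of program PREDICT: the function name and its arguments are hypothetical, and it assumes the usual reading of the definition, in which the parameter is the fraction of a given residue type found in a conformational state divided by the fraction of all residues found in that state. A value above 1 then indicates that the residue occurs in that state more often than the average residue does.

```c
#include <assert.h>

/* Hypothetical helper (not taken from PREDICT).
 * Conformational parameter P = f / <f>, where
 *   f   = n_res_in_state / n_res_total   fraction of one residue type in the state
 *   <f> = n_all_in_state / n_all_total   fraction of all residues in the state
 * All counts are assumed to come from a tabulation such as Table I. */
double conformational_parameter(int n_res_in_state, int n_res_total,
                                int n_all_in_state, int n_all_total)
{
    /* Guard against division by zero on empty tabulations. */
    assert(n_res_total > 0 && n_all_total > 0 && n_all_in_state > 0);

    double f     = (double)n_res_in_state / n_res_total;
    double f_avg = (double)n_all_in_state / n_all_total;
    return f / f_avg;
}
```

As an invented example (these counts are not from Table I): if 50 of 100 occurrences of a residue lie in helix while 250 of 1000 residues overall are helical, conformational_parameter(50, 100, 250, 1000) gives 0.5/0.25 = 2.0, a strong helix preference. The same function serves for Pα, Pβ, and Pc by passing the counts for the corresponding state.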
Having computed these conformational parameters, Chou and Fasman formulated a set of
empirical rules for predicting secondary structure (Chou and Fasman, 1974b). The development
of these empirical rules was guided by underlying considerations of protein structure.
These rules, when applied by Chou and Fasman, resulted in a 70-80% predictive accuracy.
The rules were never developed as a computer algorithm and hence lack the type of rigorous
definition that a computer algorithm requires. This has led to a wide variety of implementations,
which have an equally wide variety of accuracies.
Chou and Fasman later extended the analysis of α helix, β sheet, and coil to include 29
proteins of known x-ray structure. This increased the total number of residues classified to
4741, or approximately double the initial number (Chou and Fasman, 1978). The most
pronounced change occurred for Met. This change resulted from an underrepresentation of Met
in the initial 15 proteins examined. Less pronounced changes were also seen in Asn, Asp, Ala,
His, Gly, Ile, Lys, and Tyr (Table II).
