
Research: Science and Education

Anticancer Activity of Estradiol Derivatives: A Quantitative Structure–Activity Relationship Approach

Ken Muranaka
Ks Garden Nishioji, Suite 401, Kasuga Hachijo Sagaru, Minami-ku, Kyoto 601-8312, Japan

Today it can cost as much as $500 million to bring a new drug to market, and clinical development and the approval process can take over 10 years, while the success rates of lead compounds remain very low. A systematic and efficient approach to drug discovery is imperative (1–3). Medicinal chemistry draws on the principles of chemistry, biochemistry, molecular biology, and pharmacology to introduce new therapeutic agents (4). To facilitate the drug discovery process, quantitative structure–activity relationship (QSAR) analysis using state-of-the-art computing technology has become a standard method of pharmaceutical and agrochemical scientists. With QSAR, one can relate ligand (drug) structure to biological or pharmacological activity in conventional terms even in the absence of information regarding the three-dimensional structure (X-ray crystallographic data) of a receptor or an enzyme (5–7). QSAR techniques are very practical in lead optimization and can produce a serendipitous discovery. Numerous adaptations and applications are being found in industry.

In terms of the pedagogical value of QSAR or QSPR (quantitative structure–property relationship) with real-world data, only a few articles have appeared in this Journal (8–11). Commercial packages that implement modern QSAR techniques are highly priced, but the essence of QSAR can be taught without them. As a classroom example, published data on anticancer activities of estradiol analogs (12) will be analyzed by a QSAR approach using only the Lotus function in Microsoft Excel 97 and the Microsoft Excel 97 statistical tool.

Structure–Activity Data for Cytotoxicity of Estradiol Analogs

The formation of new blood vessels (angiogenesis) is a critical factor in the growth of tumors. Antiangiogenic therapy is a promising concept that has been applied in the treatment of tumors and angiogenic diseases. The first steroid found to have inhibitory effects on angiogenesis was 2-methoxyestradiol, which is believed to be a potent inhibitor of endothelial cell proliferation and migration (13). 2-Methoxyestradiol is a naturally occurring mammalian metabolite of the female sex hormone estradiol. It blocks mitosis by inhibiting tubulin polymerization through binding to the colchicine binding site of tubulin (12); colchicine is itself an antineoplastic agent. Analogs of 2-methoxyestradiol (see structure) were synthesized, and their cytotoxicities and antitubulin activities were examined (12).
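[Structure: steroid skeleton of the 2-substituted estradiol analogs, with rings A–D and positions 1, 2, 3, 4, 6, and 17 labeled.]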

Table 1. Parameter Table, First Step for QSAR Analysis

No.    C-2 R             IC50 (µM)   log(1/IC50)
1      CH3O              2.9         5.538
8a     C2H5O             0.91        6.041
8b     n-C3H7O           4.2         5.377
8c     i-C3H7O           4.8         5.319
9      CH3               17          4.770
10a    H2C=CH            2.4         5.620
10b    CH3CH=CH          1.1         5.959
10c    C2H5CH=CH         8.6         5.066
10d    n-C3H7CH=CH       40          4.398
10e    (CH3)2C=CH        9.4         5.027
11a    C2H5              7.7         5.114
11b    n-C3H7            4.9         5.310
11c    n-C4H9            40          4.398
11d    n-C5H11           40          4.398
11e    (CH3)2CHCH2       40          4.398
13     I (17-OAc)        40          4.398
14     I                 4.8         5.319
15     C2H5S             10          5.000
22     CH3CONH           40          4.398
23     C2H6N             3           5.523

Entries for the remaining columns, in column order:
π (16 entries): 0.02 0.38 1.05 0.36 0.56 0.82 1.22 1.02 1.55 2.13 2.2 1.7 1.12 1.07 0.97 0.08
MR (18 entries): 0.79 1.25 1.71 1.71 0.56 1.1 1.56 2.03 2.03 1.03 1.5 1.96 2.42 1.96 1.39 1.84 1.49 1.5
Ied: 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0
Ibr: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
Unassigned indicator entries: 0 0 0 1 0 0 1 0


The results are summarized in Table 1. Compound numbers correspond to those used in the original article (12). With the exception of the 17-acetate derivative 13, the analogs were modified only at the 2-position. Because compound 13 does not have a significant inhibitory effect, it was not included in this analysis. The common logarithms of the reciprocals of the tubulin polymerization inhibitory activities, IC50 (the micromolar values listed in Table 1, converted to mol/L), were used to define the biological response. Thus the smaller the concentration needed to cause the inhibitory effect, the larger is log(1/IC50) and the larger is the response.

Building the QSAR Models

The aim of QSAR studies is to best correlate the physicochemical properties and structural features of a set of congeners with the observed biological responses. Because the structural modifications were introduced solely at the carbon-2 position in the structure shown, appropriate descriptors or parameters for the substituents or molecules can be correlated with the observed biological or pharmacological responses (the dependent variable) and can thus serve as the explanatory (independent) variables in multiple linear regression. The regression models are the QSAR models that can be used to predict or design optimized candidates for the lead compounds.
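The conversion from IC50 to the response variable log(1/IC50) can be checked with a few lines of Python. This is a minimal sketch, assuming that the IC50 values in Table 1 are in micromolar units (the assumption that reproduces the tabulated logarithms); the function name `log_inverse_ic50` is introduced here only for illustration.

```python
import math


def log_inverse_ic50(ic50_micromolar: float) -> float:
    """Return log10(1/IC50), with IC50 converted from micromolar to mol/L."""
    if ic50_micromolar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_micromolar * 1e-6)


# IC50 values (micromolar) for four compounds taken from Table 1.
ic50_um = {"1": 2.9, "8a": 0.91, "10b": 1.1, "9": 17.0}

responses = {name: round(log_inverse_ic50(v), 3) for name, v in ic50_um.items()}
for name, value in responses.items():
    print(f"compound {name}: log(1/IC50) = {value:.3f}")
```

The rounded results agree with the log(1/IC50) column of Table 1 for these four compounds, which supports the micromolar reading of the IC50 column.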


Journal of Chemical Education Vol. 78 No. 10 October 2001


Numerous physicochemical properties and structural parameters have been devised for QSAR studies (15–20). Hydrophobicity is the most frequently used property. It is expressed through the 1-octanol/water partition coefficient (P); its logarithm, log P, for a molecule, or the corresponding constant for a substituent (symbolized as π), can characterize the optimized candidate structures in terms of solubility in water. The electronic effect (known as the Hammett σ) may provide a clue to the best substituents on the basis of their electron-releasing or electron-withdrawing properties. Taft's steric effect (Es), molar refractivity (MR), or Verloop parameters can define structural features. Indicator variables that define properties or characteristics by 1 (presence) or 0 (absence) can also be included. QSAR models can be empirical (e.g., physicochemical properties), quantum chemical (e.g., net atomic charges), nonempirical (e.g., molecular connectivity), or three-dimensional, and may be classified accordingly.

Finding the QSAR parameters (independent variables) that best explain the observed activities is important in order to interpret the data and elucidate the lead compounds. Given QSAR data, we may have many different QSAR models and many different interpretations (just as, given a historical chart of the Dow Jones Industrial Average, a variety of econometric models can be proposed). Once the QSAR parameters have been defined, statistical judgment based on the correlation coefficient (r) and the p (probability) values for these parameters in multiple regression can be used to propose the best models.

As indicated in Table 1, the substituent's hydrophobicity (π), molar refractivity (MR), and two indicator variables (Ied, Ibr) were chosen as the QSAR parameters for this study. All values of π and MR were obtained from the literature (17). Ied is 1 if oxygen, nitrogen, or a double bond is present, to define the effect of increased electron density; otherwise, Ied is set equal to 0. Similarly, Ibr indicates the effect of branching; it is defined to be 1 for branched substituents and is 0 otherwise.

Table 2 is the correlation matrix for the QSAR parameters. In multiple linear regression, regressors should be independent of one another; otherwise, there is a problem called multicollinearity, and the proposed model would not be reliable. In this QSAR table, π and MR are correlated by .55 (55%). An increase in MR means larger substituents; hydrophobicity should increase with increasing number of carbon atoms. These two variables may therefore provide similar information about size. When constructing a multiple linear regression model, inclusion of both π and MR as explanatory variables must be avoided.

Multiple linear regression can be done easily if a Lotus tool in Excel is used. Alternatively, the add-in for statistical analysis (Data Analysis in Tools) in Excel can be used to implement the regression analysis, as output containing standard errors and p values is automatically generated. The Lotus tool has more pedagogical value because a student must compute p values from standard errors.

Results and Discussion

Two QSAR models were obtained. The values in parentheses are p values, and a parameter is considered statistically significant if p < .05; r is the regression correlation coefficient, and n is the number of observations. The first model fits the structure–activity data with the QSAR parameters π, the quadratic term π², and the indicator variable Ied.

$$\log(1/\mathrm{IC}_{50}) = 4.963 + 0.384\,\pi - 0.287\,\pi^{2} + 0.593\,I_{\mathrm{ed}} \tag{1}$$

p values: π (0.031), π² (0.011), Ied (0.006); r = .881; n = 16

All the p values for the regression coefficients are less than .05 (5%), and r is high (88.1%). Because the coefficient of Ied is positive, an increase in electron density should increase the biological response.
This is consistent with the fact that the congeners with oxygen or a double bond (8a and 10b) show a strong inhibitory effect on tubulin polymerization.

The quadratic term for the hydrophobic factor was included in eq 1 so that an optimal value of π can be obtained. An optimum (maximum) can be found only if the regression coefficient of π² is negative, because such a parabola in π opens downward. Taking the partial derivative of both sides of eq 1 with respect to π, setting it to zero, and solving for π, we find 0.67 to be the optimal value for the hydrophobicity, which suggests that slightly hydrophobic substituents are favored. The fact that such an optimum exists and the finding that an increase in electron density will increase the response would explain why the response of compound 8a (the ethoxy substituent containing an oxygen) is so high.

Figure 1 is a graph indicating how the observed inhibition can be explained by hydrophobicity π. Two theoretical plots (for Ied = 0, 1) are drawn. The parabolic curves predict the presence of an optimum hydrophobic structure. However, the QSAR model obtained (eq 1) is valid only for π values that lie between the two extreme values in the data set (Table 1). This is a limitation of the QSAR approach as a predictive tool.
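The optimum quoted for eq 1 can be verified by hand. The short derivation below is a worked check that assumes the coefficient signs implied by the text (positive for π, negative for π²); the symbols a, b, and c are introduced here only for the derivation.

```latex
\begin{align*}
\log(1/\mathrm{IC}_{50}) &= c + a\,\pi - b\,\pi^{2} + 0.593\,I_{\mathrm{ed}},
  \qquad c = 4.963,\; a = 0.384,\; b = 0.287 \\
\frac{\partial \log(1/\mathrm{IC}_{50})}{\partial \pi} &= a - 2b\,\pi = 0 \\
\pi_{\mathrm{opt}} &= \frac{a}{2b} = \frac{0.384}{2(0.287)} \approx 0.67 \\
\frac{\partial^{2} \log(1/\mathrm{IC}_{50})}{\partial \pi^{2}} &= -2b < 0
  \quad \text{(the stationary point is a maximum)}
\end{align*}
```

Because Ied enters eq 1 only as an additive term, it shifts the parabola vertically and leaves the location of the optimum unchanged.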

Table 2. Correlation Matrix for the QSAR Parameters

              log(1/IC50)   π          MR         Ied       Ibr
log(1/IC50)   1
π             0.26801       1
MR            0.41382       0.549968   1
Ied           0.577369      0.37166    0.0565     1
Ibr           0.41583       0.24965    0.271511   0.16736   1
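The rule that strongly correlated regressors should not appear in the same model can be sketched as a simple screen over the correlation matrix. The sketch below assumes Table 2 is read as a lower-triangular matrix, uses the magnitudes as printed, and applies a cutoff of 0.5 that is an assumption of this example, not a value from the article. The names `correlated_pairs` and `admissible_models` are hypothetical.

```python
from itertools import combinations

# Pairwise correlations between regressors, magnitudes as printed in Table 2.
REGRESSOR_CORR = {
    ("pi", "MR"): 0.549968,
    ("pi", "Ied"): 0.37166,
    ("pi", "Ibr"): 0.24965,
    ("MR", "Ied"): 0.0565,
    ("MR", "Ibr"): 0.271511,
    ("Ied", "Ibr"): 0.16736,
}


def correlated_pairs(corr, cutoff=0.5):
    """Return regressor pairs with |r| at or above the cutoff (assumed value)."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= cutoff)


def admissible_models(variables, corr, cutoff=0.5):
    """List variable subsets (two or more variables) containing no flagged pair."""
    flagged = set(correlated_pairs(corr, cutoff))
    models = []
    for size in range(2, len(variables) + 1):
        for subset in combinations(variables, size):
            # combinations() preserves the order of `variables`, so each pair
            # has the same orientation as the keys of REGRESSOR_CORR.
            if not any(pair in flagged for pair in combinations(subset, 2)):
                models.append(subset)
    return models


flagged = correlated_pairs(REGRESSOR_CORR)
models = admissible_models(["pi", "MR", "Ied", "Ibr"], REGRESSOR_CORR)
print("flagged pairs:", flagged)
print("admissible variable sets:", models)
```

With this cutoff only the π and MR pair is flagged, so every remaining variable set contains at most one of the two size-related descriptors, which is the constraint the two regression models in the article respect.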


[Figure 1: plot with legend entries Observed, Ied = 1, and Ied = 0.]

Figure 1. Parabolic relationship between inhibition and hydrophobicity.



The second QSAR model uses MR (together with the quadratic term MR²), Ied, and Ibr as the independent variables.

$$\log(1/\mathrm{IC}_{50}) = 4.137 + 1.640\,\mathrm{MR} - 0.665\,\mathrm{MR}^{2} + 0.555\,I_{\mathrm{ed}} - 0.453\,I_{\mathrm{br}} \tag{2}$$

p values: MR (0.031), MR² (0.011), Ied (0.000), Ibr (0.015); r = .911; n = 18

All the regression coefficients are significant, and the correlation r has improved. Because the regression coefficient of MR² is negative, an optimal MR should exist, and it has been computed to be 1.23. Again, the regression coefficient of Ied is positive, so increased electron density is advantageous for enhancing the inhibitory effect. Branching in the substituents is not favored, as the regression coefficient of Ibr is negative. The molar refractivities of 8a, 10a, and 10b are close to the optimum. These substituents are not branched, and they have an increased electron density due to oxygen or a double bond.

Table 1 shows that the natural product 2-methoxyestradiol (1) is a potent inhibitor of tubulin polymerization. But the two QSAR models suggest that more potent analogs of this natural product should exist. The optimal value of MR indicates that there is a critical size factor for the substituent. As branching would decrease the activity, the shape of the substituent is also important. Therefore the optimized structure of the substituent at the 2-position must be a 2- or 3-carbon unbranched chain containing a double bond, oxygen, or nitrogen. This should explain the high inhibitory activities of 8a, 10a, and 10b. The low activity of 10c, though the structure is unbranched and has a double bond, is then clear: it is too long; that is, its MR is much larger than the optimum.

The branching index proposed by Randić (11, 21, 22) may be used in place of the indicator variable Ibr, but eq 2 would then not be statistically valid (i.e., p values for the QSAR parameters would be greater than .05).
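The reading of eq 2 given above (an optimum MR near 1.23 and a penalty for branching) can be checked numerically. The following is a minimal sketch that assumes the coefficient signs stated in the text (negative for MR² and Ibr); the function names `predict_eq2` and `optimal_mr` are hypothetical helpers, not part of the original workbook.

```python
# Coefficients of eq 2 as reported in the text (signs follow the discussion).
INTERCEPT, B_MR, B_MR2, B_IED, B_IBR = 4.137, 1.640, -0.665, 0.555, -0.453


def predict_eq2(mr: float, ied: int, ibr: int) -> float:
    """Evaluate eq 2 for molar refractivity mr and indicator values ied, ibr."""
    return INTERCEPT + B_MR * mr + B_MR2 * mr ** 2 + B_IED * ied + B_IBR * ibr


def optimal_mr() -> float:
    """MR at the vertex of the parabola, from B_MR + 2 * B_MR2 * MR = 0."""
    return -B_MR / (2 * B_MR2)


mr_opt = optimal_mr()
best_unbranched = predict_eq2(mr_opt, ied=1, ibr=0)
same_size_branched = predict_eq2(mr_opt, ied=1, ibr=1)

print(f"optimal MR = {mr_opt:.2f}")
print(f"predicted log(1/IC50), Ied = 1, Ibr = 0: {best_unbranched:.3f}")
print(f"predicted log(1/IC50), Ied = 1, Ibr = 1: {same_size_branched:.3f}")
```

The difference between the last two predictions equals the Ibr coefficient, which is how the model expresses the cost of branching at a fixed substituent size. As with eq 1, such predictions are meaningful only within the range of MR values covered by the data set.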
Though controversial, the Randić index was shown to be highly correlated with molar refractivity (and possibly with the partition coefficient) and thus would not give medicinal chemists any additional information for finding more potent inhibitors (23–26). Molar refractivity and hydrophobicity are correlated (Table 2), and both can be a measure of how bulky the substituent R is. However, to really determine the effect of branching in R on inhibition, the indicator variable is an appropriate choice in this QSAR analysis. The Randić branching index can be computed by hand, and the method is well explained in this Journal (9, 11). Proposing another QSAR model with this index as a variable may be a good exercise. If a student has access to molecular modeling software, the net atomic charge on the oxygen at the C3-OH position may be a better parameter than the indicator variable Ied.

Chance correlation may occur when there are more variables than the actual number of observations (27, 28). Multicollinearity can be a problem. Outliers can profoundly affect the analysis. Building a multiple linear regression model without being aware of these problems should be avoided (29). Partial least squares (PLS) regression may be used, but this will require a software package other than Excel (30–33). PLS can produce a model even when there are more QSAR parameters to be screened than there are observations; the cross-validation technique evaluates the resulting model in terms of how well it predicts (rather than how well it fits) the data. PLS is used in the three-dimensional QSAR approach CoMFA (comparative molecular field analysis). More information about QSAR/QSPR and relevant software is available on the Web (30, 31, 34–46).

Conclusion

An underlying concept in rational design and discovery of new drugs is the pharmacophore, a three-dimensional arrangement of key molecular features that would form specific interactions with a target receptor (18). Biological response is elicited upon formation of a drug–receptor complex for which there is one rate-determining reaction at the active site. The true effectiveness of a drug is then its ability to selectively hit the target so that adverse reactions will be minimized. QSAR models linearly correlate the structural features and physicochemical properties of a drug molecule with the biological activity by the multiple linear regression technique (19). Biological response, however, cannot be explained by a one-step reaction. It is also highly likely that a biological system is a nonlinear system. Molecular descriptors used in QSAR are necessarily incomplete. The QSAR approach has been criticized as being a conceptual offshoot of what used to be called absolute reaction rate theory and as not having been very successful, even when supplemented by a host of empirical and semiempirical information coming directly from bioassays (47).

A pharmacophore, however, can be used to suggest new compounds that might exhibit an improved biological response. For example, a proposed pharmacophore for a tubulin polymerization inhibitor has the following structural features at the 2-position: (i) increased electron density, (ii) a slightly hydrophobic group, and (iii) an unbranched chain with a length of 2 or 3 carbon atoms. Further, relative distance information given by the 3-hydroxyl, the 2-position, and the center of ring A in the steroid backbone may be an additional search query. Items that define a pharmacophore are like the keywords for a database search.
One can search through the databases that contain three-dimensional structures of known chemicals and drugs to devise novel structures that best match the proposed pharmacophore. Pharmacophore identification can be done without crystallographic data on the target receptor or enzyme. The QSAR approach is therefore an effort to reduce the time and money spent in lead generation. Moreover, the QSAR/QSPR approach is applied in many areas: to reduce animal testing (38), to protect the environment by predicting toxicity (37, 48), to estimate drug absorption in humans (49), and to predict chromatographic retention coefficients (50), to mention a few.

Numerous structure–activity data are published in journals such as the Journal of Medicinal Chemistry. CA Selects of Chemical Abstracts Service has the title Structure–Activity Relationships (51). Although three-dimensional QSAR techniques are popular (18), one can still perform a two-dimensional QSAR analysis on published data by multiple linear regression using Excel or Lotus. It would be a challenging exercise to compare the results of the two-dimensional analysis with the published three-dimensional QSAR models. The major structural requirements revealed by the two-dimensional and three-dimensional analyses should be in good agreement (52).
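The analogy between a pharmacophore and the keywords of a database search can be illustrated with a toy filter over a few of the substituents in Table 1. This is only a sketch: the `Substituent` record and the feature flags are assigned by hand from the formulas, following the definitions of Ied and Ibr in the text, and criterion (ii), slight hydrophobicity, is left out of the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    compound: str        # compound number from Table 1
    formula: str         # C-2 substituent R
    carbons: int         # number of carbon atoms in R
    electron_rich: bool  # contains O, N, or a double bond (the Ied criterion)
    branched: bool       # branched substituent (the Ibr criterion)


# Feature flags are assigned by hand from the formulas (assumption of this sketch).
CANDIDATES = [
    Substituent("8a", "C2H5O", 2, True, False),
    Substituent("9", "CH3", 1, False, False),
    Substituent("10a", "H2C=CH", 2, True, False),
    Substituent("10b", "CH3CH=CH", 3, True, False),
    Substituent("10c", "C2H5CH=CH", 4, True, False),
    Substituent("10e", "(CH3)2C=CH", 4, True, True),
    Substituent("11c", "n-C4H9", 4, False, False),
]


def matches_pharmacophore(s: Substituent) -> bool:
    """Criteria (i) and (iii): electron-rich, unbranched, 2 or 3 carbon atoms."""
    return s.electron_rich and not s.branched and s.carbons in (2, 3)


hits = [s.compound for s in CANDIDATES if matches_pharmacophore(s)]
print(hits)
```

For this small candidate list the query returns 8a, 10a, and 10b, the three analogs singled out in the discussion of eq 2; 10c is rejected because its chain has four carbon atoms.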




Supplemental Material

A sample workbook (Microsoft Excel file) is available in this issue of JCE Online. The workbook contains the data, the correlation matrix, the regression outputs, and a manual on how to use the Lotus tool and the Excel add-in for multiple regression.

Literature Cited
1. Zarrabian, S. Chem. Eng. News 2000, 78 (Feb 7), 5.
2. Thayer, A. M. Chem. Eng. News 2000, 78 (Feb 7), 19–32.
3. Wilson, E. Chem. Eng. News 2000, 78 (Jul 3), 27.
4. Korolkovas, A.; Burckhalter, J. H. Essentials of Medicinal Chemistry; Wiley: New York, 1976.
5. Topliss, J. G. Perspect. Drug Discov. Design 1993, 1, 253–268.
6. Van de Waterbeemd, H. Quant. Struct.–Act. Relat. 1992, 11, 200–204.
7. Van de Waterbeemd, H. Drug Design Discov. 1993, 9, 277–285.
8. Hansch, C. J. Chem. Educ. 1974, 51, 360–365.
9. Seybold, P. G.; May, M.; Bagal, U. A. J. Chem. Educ. 1987, 64, 575–581.
10. Roy, G. J. Chem. Educ. 1989, 66, 435–436.
11. Hansen, P. J.; Jurs, P. C. J. Chem. Educ. 1988, 65, 574–580.
12. Cushman, M.; He, H.-M.; Katzenellenbogen, J. A.; Lin, C. M.; Hamel, E. J. Med. Chem. 1995, 38, 2041–2049.
13. Fotsis, T.; Zhang, Y.; Pepper, M. S.; Adlercreutz, H.; Montesano, R.; Nawroth, P. P.; Schweigerer, L. Nature 1994, 368, 237–239.
14. D'Amato, R. J.; Lin, C. M.; Flynn, E.; Folkman, J.; Hamel, E. Proc. Natl. Acad. Sci. USA 1994, 91, 3964–3968.
15. Hansch, C.; Fujita, T. J. Am. Chem. Soc. 1964, 86, 1616–1626.
16. Hansch, C.; Leo, A. Exploring QSAR: Fundamentals and Applications in Chemistry and Biology; American Chemical Society: Washington, DC, 1995.
17. Hansch, C.; Leo, A.; Hoekman, D. Exploring QSAR: Hydrophobic, Electronic, and Steric Constants; American Chemical Society: Washington, DC, 1995.
18. 3D QSAR in Drug Design: Theory, Methods and Applications; Kubinyi, H., Ed.; ESCOM Science: Leiden, Netherlands, 1993.
19. Purcell, W. P.; Bass, G. E.; Clayton, J. M. Strategy of Drug Design: A Guide to Biological Activity; Wiley: New York, 1973.
20. Patrick, G. L. An Introduction to Medicinal Chemistry; Oxford University Press: Oxford, 1995.
21. Randić, M. J. Am. Chem. Soc. 1975, 97, 6609–6615.
22. Murray, W. J.; Kier, L. B. J. Med. Chem. 1976, 19, 573–578.
23. Saxena, A. K. Quant. Struct.–Act. Relat. 1995, 14, 31–38.
24. Saxena, A. K. Quant. Struct.–Act. Relat. 1995, 14, 142–148.
25. Kubinyi, H. Quant. Struct.–Act. Relat. 1995, 14, 149–150.
26. Saxena, A. K. Quant. Struct.–Act. Relat. 1995, 14, 150.
27. Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066–1069.
28. Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238–1244.
29. Henderson, H. V.; Velleman, P. F. Biometrics 1981, 37, 391–411.
30. Molecular Analysis Pro with Molecular Modeling Pro (provided by Infochem in UK) gives chemists a QSAR/QSPR system that includes a partial least squares (PLS) regression routine developed by William J. Dunn; (accessed Jun 2001).
31. Tripos is the developer of a three-dimensional QSAR technique, CoMFA; (accessed Jun 2001).
32. Dunn, W. J. III; Wold, S.; Edlund, U.; Hellberg, S.; Gasteiger, J. Quant. Struct.–Act. Relat. 1984, 3, 131–137.
33. Cramer, R. D. III; Patterson, D. E.; Bunce, J. D. J. Am. Chem. Soc. 1988, 110, 5959–5967.
34. Accelrys provides molecular modeling and QSAR systems; (accessed Jun 2001).
35. Wavefunction is known for Spartan, which performs molecular mechanics, semiempirical, and ab initio calculations; http://www. (accessed Jun 2001).
36. Semichem provides a QSAR program CODESSA that ties with AMPAC (semiempirical methods). Alan Katritzky (University of Florida) and Mati Karelson (University of Tartu, Estonia) are responsible for the development of CODESSA. http:// (accessed Jun 2001).
37. The Computerized Molecular Evaluation of Toxicity (COMET) project in Italy has a quantitative approach that includes QSAR/QSPR to study toxicological/ecotoxicological data; http://www. (accessed Jun 2001).
38. Altweb (Alternatives to Animal Testing) at The Johns Hopkins University has abstracts of QSAR/QSPR articles in biomedical fields; (accessed Jun 2001).
39. Hypercube in Canada developed the Windows-based molecular modeling environment HyperChem for quantum chemical calculations, molecular mechanics, and dynamics; http:// (accessed Jun 2001).
40. Molecular Simulations provides various molecular graphics and modeling systems for life science and materials research; http:// (accessed Jun 2001).
41. The Institute of Medicinal Molecular Design, established by Akiko Itai in Japan, is developing software for three-dimensional rational design of drugs; (accessed Jun 2001).
42. Drug Design Laboratory at Milan University in Italy; http:// (accessed Jun 2001).
43. The Journal of the Chemical Computing Group in Canada provides QSAR applications to obtain molecular descriptors; (accessed Jun 2001).
44. ChemSW has a variety of chemistry software, including QSAR/QSPR packages for Windows; (accessed Jun 2001).
45. The QSAR Server at the University of North Carolina Chapel Hill contains information on QSAR-related software; http:// (accessed Jun 2001).
46. NetSci has a comprehensive list of the software for computer-assisted drug design (CADD) and computer-assisted molecular design (CAMD); Modeling/CADD/ (accessed Jun 2001).
47. Rosen, R. Comput. Chem. 1996, 20, 95–100.
48. Drew, M. G. B.; Lumley, J. A.; Price, N. R. Quant. Struct.–Act. Relat. 1999, 18, 573–583.
49. Raevsky, O. A.; Fetisov, V. I.; Trepalina, E. P.; McFarland, J. W.; Schaper, K.-J. Quant. Struct.–Act. Relat. 1999, 19, 366–374.
50. Kaliszan, R. Quantitative Structure–Chromatographic Retention Relationships; Wiley: New York, 1987.
51. CA Selects: Structure–Activity Relationships; http://caselects. (accessed Jun 2001).
52. Muszynski, I. C.; Scapozza, L.; Kovar, K.-A.; Folkers, G. Quant. Struct.–Act. Relat. 1999, 18, 342–353.