Literature thesis

Virtual screening of Cytochrome P450 ligands: Challenges and considerations

Andrianopsyah Mas Jaya Putra

Supervisors: dr. Daan P. Geerke, Prof. dr. Nico P.E. Vermeulen

Department of Chemistry & Pharmaceutical Sciences
Faculty of Sciences - Vrije Universiteit, the Netherlands
July 2010


Contents

Abstract

1 Introduction
1.1 The importance of virtual screening of CYP450 ligands
1.2 Characteristics of CYP450s and their substrates
1.3 Validation of virtual screening model
1.4 Limitations of this thesis

2 Docking of CYP450 ligands
2.1 Effect of CYP450 structure on docking of CYP450 ligands
2.2 Effect of water molecules in CYP450's active site on docking of CYP450 ligands
2.3 Effect of ligand restraining on docking of CYP450 ligands
2.4 The issue of scoring function
2.5 Summary

3 Shape-matching, pharmacophore-matching, and field calculation of CYP450 ligands
3.1 Shape-matching of CYP450 ligands
3.2 Pharmacophore-matching of CYP450 ligands
3.3 Field calculation of CYP450 ligands
3.4 Summary

4 QSAR and classification of CYP450 ligands
4.1 QSAR of CYP450 ligands
4.2 Classification of CYP450 ligands by machine learning
4.2.1 Classification of CYP450 ligands by Support Vector Machine (SVM)
4.2.2 Classification of CYP450 ligands by decision tree
4.3 Summary

5 Conclusions and perspectives

Acknowledgments

References


Abstract

CYP450s (Cytochrome P450s) are liver enzymes involved in Phase I metabolism. Binding of a drug to a CYP450 can lead to the formation of reactive metabolites, or to CYP450 inhibition and drug-drug interactions. For these reasons, possible interactions with CYP450s need to be predicted at an early stage of drug development to reduce attrition risks. However, in-vitro and in-vivo experiments to test CYP450 affinity and/or activity for large sets of drug candidates can be laborious. Alternatively, computational models can be used to predict the CYP450 affinities and/or activities of the compounds. This prediction then serves as a guide for in-vitro screening, so that the screening can be performed more efficiently. The computational approach to predicting a compound's affinity / activity towards a particular target is known as virtual screening. Examples of virtual screening techniques are docking, shape-matching, pharmacophore-matching, field calculation, QSAR, and machine learning. These techniques can in turn be classified into protein-based techniques (if the models are generated using a protein structure) and ligand-based techniques (if the models are generated from the structures or chemical properties of active ligands). This thesis aims to present an overview of the challenges and considerations in the application of these techniques to CYP450 ligands in the last five years, in relation to the chemical nature of the relevant CYP450 isoforms (e.g. the flexibility of CYP450s) and their ligands.

Keywords: Cytochrome P450, virtual screening, docking, shape-matching, pharmacophore, field, QSAR, machine learning


1 Introduction

1.1. The importance of virtual screening of CYP450 ligands

CYP450s (Cytochrome P450s) are human enzymes, some of which are involved in Phase I metabolism in the liver (Rock et al., 2008). They account for around 75% of the metabolism of the top 200 drugs that were prescribed in the U.S. in 2002 (Williams et al., 2004). The CYP450s that contribute most are CYP450 3A4 (CYP3A4), CYP450 2C9 (CYP2C9), CYP450 2C19 (CYP2C19), CYP450 2D6 (CYP2D6), and CYP450 1A2 (CYP1A2) (Williams et al., 2004). A CYP450 transforms its substrate into a more polar compound in order to ease its excretion from the body (Boelsterli, 2009). This transformation is facilitated by an Fe atom which is bound to a heme cofactor inside the catalytic pocket of the CYP450 (Figure 1.1). Figure 1.2 shows a typical example of this transformation.

Figure 1.1. An example of a CYP450 active site structure (1.95 Å crystal structure of CYP1A2; PDB file: 2HI4). Blue sticks represent amino acids, and blue ribbons represent the backbone. Reddish sticks represent the heme, and the orange ball at its center represents the Fe atom. Above the heme, there is a BHF (2-phenyl-4H-benzo[h]chromen-4-one) ligand, depicted as yellow sticks. The lone red ball represents a crystal water molecule. (Sansen et al., 2007)


Figure 1.2. An example of a catalytic transformation by a CYP450 (Rock et al., 2008). The nitrogens are the parts of the heme that bind Fe (the heme is not fully drawn). “Cys” refers to the cysteine of the CYP450, below the heme. The step which is relevant to this thesis is marked by a blue dashed border.

An orally administered drug could bind to a CYP450 and be transformed into a reactive metabolite, which is potentially hazardous (Boelsterli, 2009). Alternatively, it could act as a CYP450 inhibitor, leading to a drug-drug interaction (Boelsterli, 2009). A drug-drug interaction could also be caused by the genetically determined absence or inactivation of a CYP450 (referred to as genetic polymorphism) (Ingelman-Sundberg et al., 1999). Drug-drug interactions through CYP450s have caused the withdrawal of several drugs (Lin et al., 1998). For these reasons, it is necessary to know at an early stage of drug development whether a drug would bind to a CYP450, so that a decision can be made on whether or not to continue its development. In Figure 1.2, this challenge is related to step (i). The challenge is addressed by testing the drug on CYP450s in-vitro (Zlokarnik et al., 2005). However, this method is laborious for a large set of compounds. Alternatively, the in-vitro test can be approximated by a computational model that predicts the affinities / activities of the compounds against a CYP450. This prediction then serves as a guide for the in-vitro screening, so that the in-vitro screening can be performed more efficiently. Table 1.1 compares the typical costs of computer modeling and of other experiments.


Table 1.1. Typical costs of various experiments in drug discovery and development (Young, 2009)

The computational approach to predicting a compound's affinity / activity towards a particular target is known as virtual screening. Examples of virtual screening techniques are presented in Table 1.2. These techniques can be classified into two groups based on their models. If their models are generated using a protein structure (X-ray, NMR, or homology), they are called structure-based techniques (in this thesis, they are termed protein-based techniques for clarity) (Good, 2006). If their models are generated from the structures of active ligands, they are called ligand-based techniques (Good, 2006).

Table 1.2. Classification of virtual screening techniques

Protein-based techniques    (interface of both)       Ligand-based techniques
Docking                     Shape-matching            QSAR
                            Pharmacophore-matching    Machine learning
                            Field calculation

Since a virtual screening model is only an approximation of real screening, it contains imperfections. However, the model can be improved by recognizing the challenges in the approximations, and it can still be useful when certain considerations are taken into account. In line with this idea, this thesis aims to present an overview of the challenges and considerations in the application of virtual screening techniques to CYP450 ligands in the last 5 years.


1.2. Characteristics of CYP450s and their substrates

Table 1.3 lists the crystal structures of human CYP450s that are available so far: the structures of CYP450 1A2, 2C9, 2D6, and 3A4 (Stjernschantz et al., 2008). The evolutionary relationships between these CYP450s are described by the tree in Figure 1.3.

Table 1.3. Available crystal structures of human CYP450s (Stjernschantz et al., 2008)

Figure 1.3. Evolutionary relationships between several human CYP450s (Ingelman-Sundberg et al., 1999)


Figure 1.3 shows that the four CYP450s are only distantly related to each other. The differences between them and between their substrates are summarized in Table 1.4.

Table 1.4. Characteristics of several CYP450s and their substrates in general (Lewis et al., 2002; Arimoto, 2006; de Groot et al., 2006)

CYP    Relative volume of active site    General characteristics of substrates
1A2    Small                             Planar, lipophilic, neutral or basic
2C9    Medium                            Acidic
2D6    Medium                            Lipophilic, neutral or basic
3A4    Large                             Globular, lipophilic, neutral or basic

As shown in the table, the CYP450s differ in their active site volumes and in the characteristics of their substrates. Since the active site of a CYP450 is taken into account in protein-based virtual screening, its volume should have an impact on protein-based virtual screening, as described in the following chapters. Meanwhile, the characteristics of CYP450 substrates can be exploited for ligand-based virtual screening.

1.3. Validation of virtual screening model

A virtual screening model should be validated to see whether it gives correct predictions of compound affinities / activities. For this validation, a training dataset is prepared, which consists of a small number of substrates or inhibitors with known affinity / activity data (termed: actives). A large number of non-substrates or non-inhibitors (termed: inactives) is added to this dataset. The model should retrieve as many actives and as few inactives from the dataset as possible. The results of this training are used to improve the model, and the training is repeated until a final model is obtained. The final model is then tested on a test dataset which has a similar composition but different members (Triballeau et al., 2006). The results of this validation are classified into four groups: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) (Kirchmair et al., 2008) (Figure 1.4).


True positives are actives which are retrieved by the model. False positives are inactives which are retrieved by the model. True negatives are inactives which are not retrieved by the model. False negatives are actives which are not retrieved by the model. The set of compounds retrieved (selected) by a model therefore consists of the true and false positives (TP + FP).

Figure 1.4. Validation of a virtual screening model (Kirchmair et al., 2008). FN = False Negatives. FP = False Positives. TN = True Negatives. TP = True Positives.

Using these groups, the quality of a model is then expressed with metrics. The most popular metrics are Sensitivity, Positive Precision, and Enrichment (Triballeau et al., 2006). Sensitivity equals TP/(TP+FN). It measures how well the model retrieves the active ligands in the training / test dataset. Positive Precision (sometimes called Hit Rate) equals TP/(TP+FP) (for the rest of this thesis, it will be called “Precision” only). It measures how well the model retrieves active ligands from the training / test dataset in the presence of inactive ligands. Therefore, it reflects the selectivity of the model. Enrichment equals (TP/(TP+FP))/((TP+FN)/(TP+FP+TN+FN)), i.e. the Precision divided by the fraction of actives in the whole dataset. It indicates how many times better the model is than a random selection at retrieving actives. These three metrics will be mentioned frequently throughout this thesis. The quality of a virtual screening model also depends on the training and test datasets. For this reason, the affinity / activity data of both datasets should be consistent. Li and colleagues (2008) recommended that the ligands in the datasets should have been tested in a uniform way (with the same assay procedure, in the same laboratory), since different assay procedures from different laboratories would result in different affinity / activity data.
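The three metrics can be illustrated with a minimal Python sketch. The function name and the counts in the example are hypothetical and chosen only for illustration; the sketch assumes that the model retrieves at least one compound and that the dataset contains at least one active, so that no denominator is zero.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, enrichment) from the four validation groups."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # fraction of all actives that is retrieved
    precision = tp / (tp + fp)            # fraction of the retrieved compounds that is active
    active_fraction = (tp + fn) / total   # precision expected from a random selection
    enrichment = precision / active_fraction
    return sensitivity, precision, enrichment


# Hypothetical dataset: 50 actives hidden among 5,000 inactives;
# the model retrieves 100 compounds, 30 of which are actives.
sens, prec, enr = screening_metrics(tp=30, fp=70, tn=4930, fn=20)
print(f"Sensitivity = {sens:.2f}, Precision = {prec:.2f}, Enrichment = {enr:.1f}")
```

In this made-up case the model retrieves 60% of the actives, 30% of its selection is active, and it performs about 30 times better than a random selection.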


1.4. Limitations of this thesis

Before ending this chapter, the author would like to emphasize that this thesis is limited to the prediction of CYP450 ligand affinity / activity. It does not cover the prediction of the metabolites of a CYP450 substrate, although it will touch on the prediction of the substrate's SOM (site of metabolism). This thesis will also not discuss the issue of allosteric binding sites in CYP450s.


2 Docking of CYP450 ligands

With the availability of CYP450 crystal structures (Table 1.3), the binding event between a CYP450 and its ligand can be examined more carefully by molecular dynamics. In molecular dynamics, a protein is simulated interacting with its ligand in water in a systematic (time-connected) and flexible manner, so that the relevant conformations and orientations of the protein and the ligand are sampled adequately, after which the standard Gibbs binding energy of the ligand (ΔG°bind) can be calculated (Leach, 2001). This energy is related to the affinity of the ligand (Ki) through Equation 2.1:

$$\Delta G^{0}_{\mathrm{bind}} = 2.303\,RT \log K_i \qquad \text{(Equation 2.1)}$$

where R is the gas constant and T is the absolute temperature (Schneider et al., 2008). Therefore, the calculated standard Gibbs binding energy of a CYP450 ligand can be used to predict its affinity. The application of molecular dynamics to the sampling of CYP450 ligand conformations was exemplified recently by Vasanthanathan and colleagues (2010). To calculate ΔGbind, they employed an empirical method called LIE (Linear Interaction Energy) (for details of this method, refer to Vasanthanathan et al., 2010). This method gave them a ΔGbind root mean square error of 3.7 kJ/mol for the 8 ligands in their training set. However, as they stated, molecular dynamics is still computationally expensive.

Alternatively, a simplified simulation technique called docking is used for virtual screening. In docking, the ligand's conformations and orientations (called poses) are sampled without time connections (Leach, 2001). A scoring function, which sums the energies of all interactions between the protein and the ligand to estimate ΔGbind, is evaluated for every pose (Leach, 2001). It is used to rank the poses and the ligands, and to predict the relative affinities of the ligands. An example of a scoring function is ChemScore, which is implemented in the GOLD docking software (Equation 2.2):

$$\Delta G_{\mathrm{bind}} = a + b\,S_{\mathrm{hbond}} + c\,S_{\mathrm{metal}} + d\,S_{\mathrm{lipo}} + e\,H_{\mathrm{rot}} \qquad \text{(Equation 2.2)}$$


In this equation, Shbond, Smetal, Slipo, and Hrot are, respectively, the total energy of the hydrogen bonds, the total energy of the ligand's interactions with metal (e.g. with the Fe atom of the heme in the active site of a CYP450), the total energy of the lipophilic interactions, and an additional score that represents the loss of conformational entropy of the ligand upon binding. The parameters a-e were obtained from a regression analysis of these energies and this score against the experimental ΔGbind for a training set of protein-ligand complexes (Verdonk et al., 2003). The challenge in docking is to obtain the correct docking pose(s) of a ligand, and to rank the ligand correctly so as to obtain a good prediction of its relative affinity. For a CYP450 substrate, a docking pose is considered correct if the site of metabolism of the substrate is placed within an (arbitrarily chosen) distance of 6 Å from the Fe atom of the heme cofactor in the CYP450's active site, with respect to the amino acids in the active site (Kirton et al., 2005). In the following sections, we will discuss factors that influence the docking of CYP450 ligands.
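As a small numerical illustration of Equation 2.1 and of the 6 Å pose criterion, the following Python sketch may help. The function names, the default temperature of 298.15 K, and the coordinates are assumptions made for this example only. The pose check considers just the distance between the site of metabolism and the Fe atom, and ignores the position of the ligand relative to the active-site amino acids.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def delta_g_bind(ki_molar, temperature_k=298.15):
    """Equation 2.1: standard Gibbs binding energy (kJ/mol) from Ki (mol/L)."""
    return 2.303 * R_KJ_PER_MOL_K * temperature_k * math.log10(ki_molar)


def som_within_cutoff(som_xyz, fe_xyz, cutoff=6.0):
    """Distance-only check: is the site of metabolism within `cutoff` Å of the Fe atom?"""
    return math.dist(som_xyz, fe_xyz) <= cutoff


# A ligand with Ki = 1 micromolar corresponds to roughly -34 kJ/mol at 298.15 K.
dg = delta_g_bind(1e-6)
print(f"dG_bind = {dg:.1f} kJ/mol")

# Hypothetical coordinates (in Å) of the heme Fe and of a substrate's site of metabolism.
fe = (0.0, 0.0, 0.0)
print(som_within_cutoff((3.0, 3.0, 2.0), fe))   # distance of about 4.7 Å
print(som_within_cutoff((6.0, 4.0, 2.0), fe))   # distance of about 7.5 Å
```

A tenfold change in Ki shifts the calculated energy by about 5.7 kJ/mol at this temperature, which gives a feeling for the 3.7 kJ/mol error quoted above.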

2.1. Effect of CYP450 structure on docking of CYP450 ligands

In 2008, Hritz and colleagues reported a docking study of CYP2D6 substrates using the apo (ligand-free) crystal structure of CYP2D6 (PDB ID: 2F9Q). They found that the crystal structure is too tight to accommodate known CYP2D6 substrates. Therefore, they relaxed the structure by adapting it to a known CYP2D6 substrate, (R)-propranolol, and performed a molecular dynamics simulation of the complex that took into account the thermal motion of Phe483 of CYP2D6. This motion generally involved only small changes in the conformation of CYP2D6. From the molecular dynamics simulation, they extracted 250 conformations of CYP2D6. They then docked 65 CYP2D6 substrates with known sites of metabolism into each of the conformations. They discovered that some of the CYP2D6 conformations gave significantly higher percentages of correct docking poses than others, although the differences between those conformations were small (Figure 2.1). Based on this result, they recommended that the CYP2D6 structure for a docking study should be selected carefully (in this case, they recommended the CYP2D6 conformation that gave the highest percentage of correct docking poses in Figure 2.1).


Figure 2.1. Different frames (conformations) of CYP2D6 gave different percentages of correct binding poses of its substrates (Hritz et al., 2008). The highest percentage was given by frame 216 (marked by the longest vertical line).

Previously, Polgar and colleagues (2007) performed a docking study of CYP2C9 ligands using three crystal structures of CYP2C9 (PDB IDs: 1OG2, 1OG5 and 1R90). The 1OG2 structure is an apo structure, the 1OG5 structure is in complex with S-warfarin, and the 1R90 structure is in complex with flurbiprofen. Polgar and colleagues aligned the 1OG5 structure with the 1R90 structure by sequence homology, then merged S-warfarin from the 1OG5 structure into the 1R90 structure. The latter was adapted to accommodate S-warfarin as well. Afterwards, they docked a dataset of 5,411 compounds (containing 42 CYP2C9 ligands) into the three CYP2C9 structures. They discovered that the customized 1R90 structure gave a higher enrichment at 1% of the dataset than the apo and 1OG5 structures (Figure 2.2). This result, together with that of Hritz et al. (2008), suggests that the flexibility of a CYP450 should be taken into account in docking, and that a reasonable customization of a CYP450 structure is sometimes necessary to obtain better docking results for its ligands.


Figure 2.2. Enrichments from docking of 5,411 compounds (containing 42 CYP2C9 ligands) into the 1OG2 (apo), 1OG5, and 1R90 (customized) structures of CYP2C9 (Polgar et al., 2007). FlexX is the docking software. Gold (Goldscore), PMF, and Chem (ChemScore) are the scoring functions.

2.2. Effect of water molecules in CYP450's active site on docking of CYP450 ligands

Under biological conditions, water molecules could be present in the active site of a CYP450. For example, several water molecules are trapped in the active site of the crystal structure of CYP2D6 (Rowland et al., 2006). Water molecules influence the docking of a CYP450 ligand in one of two ways (Santos et al., 2010). First, they prevent the ligand from occupying a region far from the heme (Figure 2.3, left). Second, they form H-bonds with the ligand that orient the ligand into a correct or an incorrect pose (Figure 2.3, right). Santos and colleagues (2010) investigated the effect of including water molecules in the docking of CYP2D6 substrates. For this purpose, they used the crystal structure of CYP2D6 (PDB ID: 2F9Q) and generated 8 conformations from it, and they used MDEA (R-3,4-methylenedioxy-N-ethylamphetamine) to generate a set of most favorable water positions within the active site of CYP2D6. Then, they docked 11 MDEA-like substrates and 53 non-MDEA-like substrates into each CYP2D6 conformation. They discovered that the inclusion of water molecules in the active site of 2F9Q improved the percentage of correctly docked MDEA-like substrates, but not that of non-MDEA-like substrates (Table 2.1). Based on this result, they recommended that water molecules should not be excluded completely in docking. However, they also recommended that different sets of water molecules should be used for different classes of substrates.

Figure 2.3. Water molecules (balls) influence the docking of a CYP450 ligand (wires) in one of two ways: by preventing the ligand from occupying a region far from the heme (green sticks at the bottom) (left), or by forming H-bonds with the ligand (right) (Santos et al., 2010). Red wires represent the ligand when water is excluded, while blue wires represent the ligand when water is included.

Table 2.1. Percentage of correctly docked MDEA-like and non-MDEA-like substrates for different CYP2D6 conformations (Santos et al., 2010). “HOH OFF” means water is excluded, while “HOH toggle” means water is included, but allowed to be temporarily displaced by a ligand.


2.3. Effect of ligand restraining on docking of CYP450 ligands

In docking, ligand conformational sampling is sometimes conducted in regions which are not very relevant to ligand binding, which makes the sampling inefficient. To improve the sampling efficiency, the ligand can be restrained to interact with important amino acids only. This was exemplified by Polgar and colleagues (2007) in the docking of CYP2C9 ligands. In CYP2C9, Arg108 is thought to be crucial for binding (Ridderstrom et al., 2000; Dickmann et al., 2004). By restraining CYP2C9 ligands to interact with this amino acid, Polgar and colleagues obtained a high enrichment at 1% of their dataset, compared to almost zero enrichment when the ligands were not restrained to the amino acid (Figure 2.4). Alternatively, ligand poses can be filtered for their interactions with important amino acids before they are scored, so that only poses which have such interactions are passed on to the scoring. This filtering step can be performed with interaction fingerprints, as described in the report of Mpamhanga and colleagues (2006). However, the application of this method to the docking poses of CYP450 ligands has not yet been reported.
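The idea of filtering poses before scoring can be sketched in a few lines of Python. This is a simplified illustration and not the method of Mpamhanga and colleagues: the fingerprint here is a plain list of bits that records whether a pose contacts each residue of interest, and the residue labels other than Arg108, the pose names, and the contact sets are all hypothetical.

```python
def interaction_fingerprint(contacts, residues):
    """One bit per residue of interest: 1 if the pose contacts it, otherwise 0."""
    return [1 if residue in contacts else 0 for residue in residues]


def passes_filter(fingerprint, residues, required):
    """A pose passes only if the bits of all required residues are set."""
    return all(fingerprint[residues.index(residue)] == 1 for residue in required)


# Residues tracked in the fingerprint; "ResA" and "ResB" are placeholders.
RESIDUES = ["Arg108", "ResA", "ResB"]

# Hypothetical docking poses with the residues each pose is in contact with.
poses = {
    "pose_1": {"Arg108", "ResA"},
    "pose_2": {"ResB"},
    "pose_3": {"Arg108"},
}

kept = [
    name
    for name, contacts in poses.items()
    if passes_filter(interaction_fingerprint(contacts, RESIDUES), RESIDUES, ["Arg108"])
]
print(kept)  # only these poses would be passed on to the scoring function
```

In practice the contact sets would have to be derived from the docked coordinates, for example with a distance cutoff, which is outside the scope of this sketch.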

Figure 2.4. Enrichments from docking of 5,411 compounds (containing 42 CYP2C9 ligands) into the 1R90 crystal structure of CYP2C9, with restraining (black and red lines) and without restraining (blue and green lines) to Arg108 of the CYP2C9 (Polgar et al., 2007). FlexX is the docking software and scoring function. Chem (ChemScore) is also a scoring function. “NO_Arg108” means that ligands were not restrained to interact with Arg108.

2.4. The issue of scoring function

In CYP450s, there is a lipophilic environment above the heme group, for which the lipophilic term of some scoring functions (e.g. Slipo in Equation 2.2) performs poorly (Kirton et al., 2005). Additionally, the ligand-Fe interaction energy in a scoring function (e.g. Smetal in Equation 2.2) should be balanced against the other terms for CYP450 ligands which can coordinate directly to the heme (Kirton et al., 2005). Reparameterization of these terms could improve the performance of the scoring function, as exemplified by Kirton and colleagues (2005) for the ChemScore scoring function of the GOLD docking program. Because the scores of CYP450 ligands determine their ranks in virtual screening, an improvement of the scoring function should result in an improvement of the virtual screening.

2.5. Summary

In summary, docking can be used to screen CYP450 ligands based on their Gibbs binding energies (ΔGbind) when the CYP450 structure is available. The challenge in docking is to obtain the correct docking pose(s) of a ligand, and to rank the ligand correctly so as to obtain a good prediction of its relative affinity. To obtain the correct docking pose of a CYP450 ligand, one should consider the effects of the CYP450 structure, of the inclusion of water molecules in the CYP450's active site, and of restraining the ligand to important CYP450 amino acids. To rank the ligand correctly, one should reparameterize the scoring function to fit the chemical environment of the CYP450s' active sites.


3 Shape-matching, pharmacophore-matching, and field calculation of CYP450 ligands

The bound state of a ligand, in which it exerts its affinity / activity, can be represented by the three-dimensional “lock and key” model. In this model, a ligand binds to its protein if its functional groups and shape are complementary to the amino acids and shape of the protein's active / binding site, respectively (Motiejunas et al., 2006). Figure 3.1 illustrates these complementarities. The complementarities can be exploited for virtual screening. They serve as the basis for several virtual screening techniques at the interface between protein-based and ligand-based virtual screening (Table 1.2), namely shape-matching, pharmacophore-matching, and field calculation (Good, 2006; Vistoli et al., 2006).

Figure 3.1. Left: CYP1A2 (blue) in complex with 2-phenyl-4H-benzo[h]chromen-4-one (yellow) (Sansen et al., 2007). The phenylalanines (F226, F256, and F260) of CYP1A2 are complementary to the nearby aromatic ring of the ligand. Reddish sticks represent the heme, and the orange ball at the center of the heme represents the Fe atom. The lone red ball represents a water molecule. Right: Shape complementarity between CYP1A2's active site (represented by a Gaussian surface) and the same ligand (joined balls) (adapted from PDB file 2HI4 from Sansen et al., 2007). The heme, Fe atom, and water molecule are rendered as in the left picture.


3.1. Shape-matching of CYP450 ligands

Shape-matching can be done in two ways: ligand-based and protein-based. In ligand-based shape-matching, the shape of an active is used as the query (Ebalunode et al., 2010; Putta et al., 2007). In protein-based shape-matching, the shape of a protein's active / binding site is abstracted to produce the so-called “negative image” of the active / binding site, and this “negative image” is used as the query (Ebalunode et al., 2010). By matching either of these two queries with the shapes of the compounds to be screened, predictions can be made about the compounds' affinities / activities. If they match, the compounds might have affinities / activities similar to those of the ligand (the Similarity Principle) (Schneider et al., 2008). Figure 3.2 illustrates a ligand-based shape-matching algorithm which is implemented in ROCS (ROCS 3.0.0 Manual, 2009), the most popular software for shape-matching at the moment according to Laggner et al. (2008). In this algorithm, molecular shapes are represented by Gaussian functions. ROCS overlays the Gaussians of a query with those of a molecule in the dataset on their centers of mass, then optimizes the overlay to find the best match. The match is expressed with a score (e.g. the ShapeTanimoto score (Equation 3.1)), which ranges from 0 (for the most dissimilar ligand) to 1 (for an exact match).

Figure 3.2. Ligand-based shape-matching algorithm in ROCS (ROCS 3.0.0 Manual, 2009).

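$$\mathrm{ShapeTanimoto} = \frac{O_{QM}}{V_{Q} + V_{M} - O_{QM}} \qquad \text{(Equation 3.1)}$$

where O_QM is the overlap between the Gaussians of the query and those of the molecule, V_Q is the volume of the Gaussians of the query, and V_M is the volume of the Gaussians of the molecule.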
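The Tanimoto form of this score can be illustrated with a simplified Python sketch. It is not the ROCS implementation: every atom is represented by an identical spherical Gaussian, the overlap is approximated by the sum of pairwise Gaussian overlaps only, and the two molecules are assumed to be already overlaid (no optimization of the overlay is performed). The width parameter `alpha` and all coordinates are hypothetical.

```python
import math


def gaussian_overlap(mol_a, mol_b, alpha=0.8):
    """Sum of pairwise overlap integrals of identical atom-centred Gaussians.

    For two Gaussians exp(-alpha * r^2) whose centres are a distance d apart,
    the overlap integral is (pi / (2 * alpha))**1.5 * exp(-alpha * d^2 / 2).
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a in mol_a:
        for b in mol_b:
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += prefactor * math.exp(-alpha * d2 / 2.0)
    return total


def shape_tanimoto(query, molecule, alpha=0.8):
    """Overlap divided by (self-overlap of query + self-overlap of molecule - overlap)."""
    o_qm = gaussian_overlap(query, molecule, alpha)
    v_q = gaussian_overlap(query, query, alpha)
    v_m = gaussian_overlap(molecule, molecule, alpha)
    return o_qm / (v_q + v_m - o_qm)


# Hypothetical three-atom "molecules" (coordinates in Å).
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 0.5, y, z) for x, y, z in query]   # slightly displaced copy
bent = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]

print(shape_tanimoto(query, query))    # identical shapes give a score of 1
print(shape_tanimoto(query, shifted))  # lower than 1
print(shape_tanimoto(query, bent))     # lower than 1
```

Because the score depends on the relative placement of the two molecules, a real shape-matching program has to search for the overlay that maximizes it, which is the optimization step described in the text.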

In ROCS, the match can also be calculated by considering only particular functional groups of the molecules (represented by the Mills-Dean force field) (Figure 3.3), using another score called the ColorTanimoto score, which is similar to the ShapeTanimoto score. ROCS also provides a combination of the ColorTanimoto score with the aforementioned ShapeTanimoto score, called the TanimotoCombo score, which ranges from 0 (for the most dissimilar ligand) to 2 (for an exact match). Freitas et al. (2010) discovered that considering functional groups in the ligand-based shape-matching of CYP450 ligands helped to reduce false positives significantly. In ligand-based shape-matching, the query structure plays a crucial role. Different queries give different match scores, as discovered by Sykes and colleagues (2008) for CYP2C9 substrates (Table 3.1). This notion was confirmed by Freitas and colleagues (2010) for CYP2D6 substrates. These findings highlight the importance of choosing the right query for shape-matching, as recommended by Kirchmair and colleagues (2009).

Figure 3.3. Example of a shape model of functional groups, generated by ROCS (adapted from ROCS 3.0.0 Manual, 2009). “Donor” means H-bond donor group. “Acceptor” means H-bond acceptor group. “Hydrophobe” means hydrophobic group.

Table 3.1. ROCS Combo Scores for CYP2C9 substrates with different queries (Sykes et al., 2008).

CYP2C9 substrate    ROCS Combo Score with Fluoxetine query    ROCS Combo Score with Flurbiprofen query
Amitriptyline       1.161                                     0.855
Carvedilol          1.251                                     0.911
Lansoprazole        1.218                                     1.028


Protein-based shape-matching has not yet been applied to screen CYP450 ligands. If it were applied, however, it would face the challenge of the flexibility of CYP450s. For example, CYP3A4 is known to be able to accommodate multiple ligands simultaneously (Ekroos et al., 2006; Kapelyukh et al., 2008). Under such conditions, the conformation of CYP3A4 may give a promiscuous query (a query that would easily retrieve both true and false positives). Schneider and colleagues (2008) suggested that protein-based shape-matching is successful only “if the binding site of the target is small and buried”.

3.2. Pharmacophore-matching of CYP450 ligands

As mentioned at the beginning of this chapter, the complementarity between the amino acids of a protein's active / binding site and the functional groups of its ligand can be exploited for virtual screening. The functional groups can be: a positively-charged group, a negatively-charged group, an H-bond donor, an H-bond acceptor, an aromatic ring, or a hydrophobic group (Schneider et al., 2008). Once these groups are recognized in a ligand, a framework model can be built on them that identifies their types and their positions relative to each other. This model is called a pharmacophore. An example of a pharmacophore is depicted in Figure 3.4. By matching this model against the functional groups of the compounds to be screened, predictions can be made about the compounds' affinities / activities. If they match, the compounds could have affinities / activities similar to those of the ligand (Schneider et al., 2008).

There are four ways to recognize the functional groups of a ligand that are involved in protein-ligand binding, in order to generate a pharmacophore from them: (1) by a structure-affinity/activity relationship study of the ligand; (2) by alignment with an active (preferably a rigid one); (3) by a crystal structure of the ligand in complex with its protein; (4) by docking the ligand into its protein's active / binding site, with support from experimental data (e.g. site-directed mutagenesis data) (Schneider et al., 2008; Locuson et al., 2005).

In the last 5 years, pharmacophore-matching of CYP450 ligands has been reviewed by de Groot (2006) and reported by Schuster and colleagues (2006). Both de Groot (2006) and Schuster et al. (2006) warned about the issue of multiple binding modes of a CYP450 ligand (as illustrated in Figure 3.5) and concomitant occupation of one CYP450 active site by multiple ligands (Ekroos et al., 2006; Kapelyukh et al., 2008), both of which eventually lead to more than one pharmacophore. For these reasons, pharmacophore-matching might not be the best technique to reliably differentiate between actives and inactives for the CYP450 family (Schuster et al., 2006).
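The matching step described in this section can be illustrated with a minimal sketch. It assumes that a pharmacophore is stored as a list of typed points and that two pharmacophores match when their feature types agree and all pairwise distances agree within a tolerance. The function name `matches`, the coordinates, and the 1.0 Å tolerance are hypothetical choices for illustration and do not come from any of the cited studies.

```python
import math
from itertools import permutations


def matches(query, candidate, tolerance=1.0):
    """Return True if `candidate` contains the `query` pharmacophore.

    Both arguments are lists of (feature_type, (x, y, z)) tuples, in Å.
    A match requires identical feature types and all pairwise distances
    agreeing within `tolerance`. Mirror images are not distinguished.
    """
    n = len(query)
    for subset in permutations(candidate, n):
        # Feature types must correspond one-to-one.
        if any(q[0] != c[0] for q, c in zip(query, subset)):
            continue
        # Every pairwise distance must agree within the tolerance.
        if all(
            abs(math.dist(query[i][1], query[j][1])
                - math.dist(subset[i][1], subset[j][1])) <= tolerance
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return True
    return False


# Hypothetical three-point pharmacophore and two hypothetical compounds.
query = [("hydrophobic", (0.0, 0.0, 0.0)),
         ("aromatic", (4.0, 0.0, 0.0)),
         ("acceptor", (0.0, 3.0, 0.0))]
compound_a = [("donor", (7.0, 7.0, 7.0)),
              ("hydrophobic", (1.0, 1.0, 1.0)),
              ("aromatic", (5.2, 1.0, 1.0)),
              ("acceptor", (1.0, 4.1, 1.0))]
compound_b = [("hydrophobic", (0.0, 0.0, 0.0)),
              ("aromatic", (9.0, 0.0, 0.0)),
              ("acceptor", (0.0, 3.0, 0.0))]

print(matches(query, compound_a))  # geometry agrees within the tolerance
print(matches(query, compound_b))  # aromatic ring lies too far away
```

If a CYP450 ligand has several binding modes, each mode would need its own query, and a compound would have to be tested against all of them. This is the practical consequence of the warning by de Groot (2006) and Schuster et al. (2006).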

Figure 3.4. Pharmacophore of a CYP1A2 inhibitor, generated from sulconazole (sticks) (Schuster et al., 2006). The blue ball represents a hydrophobic group, the brown ball represents an aromatic ring, and the green balls represent H-bond acceptors. The grey surface represents the shape of sulconazole.

Figure 3.5. Possible multiple binding modes of a ligand in the CYP3A4 active site, marked by green and purple lines (Mao et al., 2006). Here, the ligand is BFC (7-benzyloxy-4-(trifluoromethyl)-coumarin).

3.3. Field calculation of CYP450 ligands

While a pharmacophore refers to the binding functional groups of a ligand or of a protein's active / binding site, fields refer to the potential interaction forces exerted by these groups (Schneider et al., 2008). By taking fields into account, one can obtain a more realistic model than the shape model or the pharmacophore. One prominent method for the calculation of fields is CoMFA (Comparative Molecular Field Analysis) (Schneider et al., 2008). The method follows these steps (Schneider et al., 2008):

1. Ligands with known affinities / activities are aligned three-dimensionally to obtain their common binding modes (Figure 3.6).

2. A virtual cubic lattice is generated around the ligands, which consists of discrete vertical and horizontal layers (Figure 3.6). Then, a probe is placed at every intersection of the layers (lattice point); the lattice points are usually 1-2 Å away from each other. The probe can be: a positively or negatively charged atom (to represent an ionic force), an H-bond donor or acceptor, a hydrophobic group, or an sp3 carbon atom (to represent a steric force). Each probe is supposed to interact with the nearest atom(s) of the ligands, and the interaction force (field) between them is calculated as an energy (unlike docking, in which the interaction is evaluated between the ligand and its protein).

Figure 3.6. In CoMFA, a virtual cubic lattice is generated around ligands which have been aligned, in order to provide positions for probes that will interact with the ligands. Interaction energies between the probes and the ligands are calculated. The calculation results for all probes and ligands are stored in columns. (Schneider et al., 2008)

3. All the calculated energies are correlated with the affinities / activities of the ligands using a linear correlation method (e.g. Partial Least Squares). The result is an equation of the following form:

$$\text{Affinity / activity} = c + \sum_{i=1}^{L} \sum_{j=1}^{P} a_{ij} E_{ij} \qquad \text{(Equation 3.2)}$$

where c is a constant; L is the number of lattice points; P is the number of probes; and the coefficient aij corresponds to placing probe j at lattice point i, which yields the energy value Eij. The relative weights of the coefficients show which field(s) are favored over the others, which is useful for drug design. The quality of this equation is judged by the correlation between the calculated and experimental affinities / activities, which is expressed as a squared correlation coefficient (R²). The equation is considered good if its R² approaches 1.00.

For a ligand whose affinity / activity is not known, the first two steps are applied. The resulting energy values can then be used to predict its affinity / activity through Equation 3.2. However, one should realize that the relationship between affinity / activity and the energies is assumed to be linear, while it is not necessarily so (Schneider et al., 2008).

Clearly, this technique is dependent on the actives (training set) which are used to generate the fields. Peng and colleagues (2008) showed that for CYP2C9 inhibitors, two training sets with different numbers of ligands and different Ki ranges gave different standard errors for Equation 3.2 (Table 3.2). Locuson and Wahlstrom (2005) suggested that inclusion of highly-active CYP450 ligands would improve Equation 3.2, because they represent strong interactions with CYP450.
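The lattice, probe, and summation steps of CoMFA can be sketched as follows. This is an illustration under stated assumptions, not the published CoMFA implementation: the steric term is a generic Lennard-Jones-like function with a hypothetical contact distance of 3.5 Å, the electrostatic term is a plain Coulomb sum, the energy cap of 30 kcal/mol is an assumed cut-off, and the two-atom "ligand" with its partial charges is invented. The coefficients aij would in practice come from a Partial Least Squares fit, which is not shown here.

```python
import math
from itertools import product


def lattice(atoms, spacing=2.0, margin=4.0):
    """Cubic lattice points enclosing the atoms, `spacing` Å apart."""
    axes = []
    for k in range(3):
        lo = min(a["xyz"][k] for a in atoms) - margin
        hi = max(a["xyz"][k] for a in atoms) + margin
        n = int((hi - lo) // spacing) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return list(product(*axes))


def probe_energies(point, atoms, probe_charge=1.0, cap=30.0):
    """Steric and electrostatic energy of one probe at one lattice point."""
    steric = 0.0
    electrostatic = 0.0
    for a in atoms:
        r = max(math.dist(point, a["xyz"]), 0.5)  # avoid division by zero
        # Lennard-Jones-like term; 3.5 Å is a hypothetical contact distance.
        steric += (3.5 / r) ** 12 - 2.0 * (3.5 / r) ** 6
        # Coulomb term; 332 converts e^2/Å to kcal/mol.
        electrostatic += 332.0 * probe_charge * a["charge"] / r
    # Cap the energies (assumed cut-off) so close contacts do not dominate.
    return min(steric, cap), max(min(electrostatic, cap), -cap)


def predict(c, coefficients, energies):
    """Equation 3.2: c plus the sum over lattice points i and probes j of aij * Eij."""
    return c + sum(a * e
                   for row_a, row_e in zip(coefficients, energies)
                   for a, e in zip(row_a, row_e))


# Hypothetical two-atom ligand with invented partial charges.
ligand = [{"xyz": (0.0, 0.0, 0.0), "charge": -0.4},
          {"xyz": (1.5, 0.0, 0.0), "charge": 0.4}]
points = lattice(ligand)
fields = [probe_energies(p, ligand) for p in points]
print(len(points), "lattice points, 2 field values each")
```

Each ligand in the training set yields one such row of L × P energy values, which is why the number of columns quickly exceeds the number of ligands and a method such as Partial Least Squares is needed for the correlation.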

Table 3.2. CoMFA statistics of CYP2C9 inhibitors in Peng et al. (2008)

Training set        Ki range (nM)    Number of ligands    Standard error
Complete            1.0 – 48,000     83                   0.822
Chromenones only    4.2 – 6,740      11                   0.243

CoMFA is also dependent on the alignment of the active ligands (Locuson et al., 2005). Since there can be more than one mode of alignment, especially for dissimilar and flexible ligands, the challenge is to find the correct alignment that gives their common binding modes. In the case of CYP450 ligands, there are two ways to resolve this problem.

The first way is to choose reference atoms or groups in the ligands for the alignment, which represent their general characteristics. For example, for CYP2D6 substrates, which mostly have a protonated amine group, Haji-Momenian and colleagues (2003) used the amine as the reference for their alignment. Additionally, they used the site of metabolism of each CYP2D6 substrate (based on the same metabolism / reaction) as a reference. This method gave them a squared correlation coefficient (R²) of 0.62 for their test dataset.

The second way is to use the CYP450 structure for docking the ligands, and to select the docking poses that agree with site-directed mutagenesis and metabolism data. These poses are then aligned on the heme (Figure 3.7), and the resulting alignment is extracted from the active site for use in CoMFA. This method was exemplified by Yasuo and colleagues (2009) for the alignment of CYP2C9 inhibitors, and it gave them a squared correlation coefficient (R²) of 0.941 for their dataset. Of course, when it comes to docking of CYP450 ligands, the challenges and considerations mentioned in Chapter 2 (CYP450 flexibilities, inclusion of water molecules, etc.) apply.

Figure 3.7. An alternative way to align CYP450 ligands for CoMFA: alignment of their docking poses on the heme (Yasuo et al., 2009).

Finally, there is also a method beyond CoMFA that generates fields without ligand alignment, by using conformers of the ligands. As exemplified by Afzelius and colleagues (2004) with CYP2C9 ligands, 100 conformers were generated for each ligand, and all the conformers were then used to generate fields, on the assumption that the true binding modes of the ligands are among those conformers (Figure 3.8). This method gave them an R² of 0.8 for their dataset. Gunther and colleagues (2006) confirmed that such a number of conformers per ligand is sufficient to cover the binding modes of a ligand.


Figure 3.8. Generation of fields from conformers (adapted from Afzelius et al., 2004). (1) 3R,5S-fluvastatin, a CYP2C9 inhibitor. (2) Fields generated from the original structure. (3) Fields generated from a conformer of the ligand. (4) Fields generated from 100 conformers of the ligand.

3.4. Summary

Shape-matching, pharmacophore-matching, and field calculation are virtual screening techniques based on the “lock and key” model of protein-ligand binding. Each has its own challenges and considerations. Protein-based shape-matching is successful only if the active site of the target is small and buried. Ligand-based shape-matching of CYP450 ligands requires careful choice of a query ligand. Pharmacophore-matching might not be suitable to screen CYP450 ligands, due to possible multiple binding modes of a CYP450 ligand, and possible concomitant occupation of one CYP450 active site by multiple ligands. Field calculation can be improved by inclusion of highly-active CYP450 ligands in the training set. Particularly in CoMFA (Comparative Molecular Field Analysis), correct alignment of CYP450 ligands should be considered. While the relationship between affinity / activity and the interaction energies in CoMFA is assumed to be linear, it is not necessarily so.


4 QSAR and classification of CYP450 ligands

4.1. QSAR of CYP450 ligands

In the absence of a CYP450's structure, its ligands can be virtually screened by QSAR (Quantitative Structure-Activity Relationship). The idea of QSAR is to correlate the affinities or activities of the ligands with their molecular descriptors, usually through a linear equation. Examples of such equations are presented below (Equations 4.1 – 4.3), which account for the inhibitory activities of 21 flavonoids on CYP1A2 (Roy et al., 2008).

-log IC50 = 3.48 - 0.09 3Ka - 0.21 3Xc - 0.49 3XcV + 1.32 S_dsCH + 0.17 S_aaCH - 0.20 S_dssC - 0.14 S_aasC    (Equation 4.1)

-log IC50 = -0.56 + 0.19 S_aaCH + 0.99 S_dsCH + 1.69 JX    (Equation 4.2)

-log IC50 = -0.17 + 1.5 JX + 1.10 S_dsCH + 0.19 S_aaCH    (Equation 4.3)

In these examples, the inhibitory activities of the flavonoids are correlated with their descriptors. Equation 4.1 uses 3Ka (a kappa shape index), 3Xc and 3XcV (connectivity indices), and S_dssC and S_aasC (electrotopological state parameters). All three equations use S_dsCH and S_aaCH (electrotopological state parameters), and Equations 4.2 and 4.3 additionally use JX (the Balaban J topological parameter) (for details about these descriptors, refer to Todeschini et al., 2000). These descriptors can be calculated for a compound whose IC50 is not known, leading to its predicted IC50 through any of the above equations.

The three equations gave squared correlation coefficients (R²) of 0.745, 0.801, and 0.840, respectively. These data suggest that different types of descriptors deliver different statistical qualities of correlation. Apparently, more descriptors do not always lead to better quality: Equation 4.1 uses seven descriptors, yet its R² is lower than those of Equations 4.2 and 4.3, which use three descriptors each. Therefore, the challenge in QSAR of CYP450 ligands is to find the types and numbers of descriptors that deliver the highest quality of correlation.
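How such an equation is used for prediction, and how its R² is obtained, can be sketched as follows. The coefficients are those quoted above for Equation 4.3; the descriptor values and the observed/predicted lists are hypothetical and serve only to exercise the functions. The names `predict_neg_log_ic50` and `r_squared` are likewise illustrative.

```python
# Coefficients of Equation 4.3 as quoted above (Roy et al., 2008).
EQUATION_4_3 = {"intercept": -0.17, "JX": 1.5, "S_dsCH": 1.10, "S_aaCH": 0.19}


def predict_neg_log_ic50(descriptors, model=EQUATION_4_3):
    """Evaluate a linear QSAR equation for one compound."""
    return model["intercept"] + sum(
        coefficient * descriptors[name]
        for name, coefficient in model.items()
        if name != "intercept"
    )


def r_squared(observed, predicted):
    """Squared (Pearson) correlation coefficient between two lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical descriptor values for a compound with unknown IC50.
compound = {"JX": 2.0, "S_dsCH": 1.0, "S_aaCH": 10.0}
print(predict_neg_log_ic50(compound))

# Hypothetical observed and predicted activities of a small test set.
print(r_squared([5.1, 5.9, 6.4, 7.0], [5.3, 5.7, 6.6, 6.8]))
```

Comparing the R² of a model on compounds that were not used to fit it, rather than on the training compounds alone, gives a more honest picture of whether additional descriptors actually help.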


In the report of Roy and colleagues (2008) above, the descriptors were selected automatically by different algorithms. The descriptors in Equation 4.1 were selected by the PLS (Partial Least Squares) algorithm, the descriptors in Equation 4.2 by the GFA (Genetic Function Approximation) algorithm, and the descriptors in Equation 4.3 by a combination of both algorithms called G/PLS (Genetic Partial Least Squares) (for details about these algorithms, refer to Roy et al., 2008). These algorithms proved to give high qualities of correlation in the case of CYP1A2 ligands. However, they are not guaranteed to select physico-chemically meaningful descriptors (Li et al., 2007; Li et al., 2008).

CYP450s and their substrates have some general characteristics, which are summarized in Table 1.4. These characteristics can be described by physico-chemically meaningful types of descriptors such as size, shape, electrostatic, H-bond donor, H-bond acceptor, and hydrophobic descriptors. It is not surprising that several QSAR studies of CYP450 ligands in the last 5 years eventually arrived at one of these types of descriptors (Table 4.1). With more understanding of CYP450s' active sites, the search for their ligands' descriptors could be directed to those which are relevant to the active sites and physico-chemically meaningful.

Table 4.1. The uses of several types of descriptors in QSAR studies of CYP450 ligands in the last 5 years Type of descriptors CYP Size Shape Electrostatic H-bond H-bond Hydrophobic donor acceptor 1A2 2C9 2D6 3A4 √ √ √ √ √ √ √ Reference Appiah-Opong et al., 2008 Iori et al., 2005 Roy et al., 2008 Appiah-Opong et al., 2008 Appiah-Opong et al., 2008 Ringsted et al., 2009 Chuman, 2008

Besides the descriptors, the quality of a QSAR equation also depends on the training dataset used to generate it. Li and colleagues (2008) recommended that the training dataset should have sufficient diversity, which can be expressed by a diversity index (DI). An example of a diversity index is given in Equation 4.4.

(Equation 4.4)

Here, div(A) is the diversity index of dataset A; diss(i,j) is the dissimilarity between ligands i and j in the dataset; and N is the number of ligands in the dataset (Perez, 2005). Dissimilarity is simply defined as (1 – similarity), and the similarity can be calculated with a similarity index (e.g. the Tanimoto index).

As mentioned above, a QSAR equation is usually linear. However, one should realize that the structure-activity relationships of CYP450 ligands are not necessarily so. In the case of CYP3A4, which has more than one ligand binding site (Ekroos et al., 2006; Kapelyukh et al., 2008), the ligands would have more than one common binding mode. Hence, their affinities cannot be represented by a single linear QSAR equation (Mao et al., 2006). In other words, their structure-activity relationships are not linear. For non-linear structure-activity relationships, machine learning techniques are more suitable; these are discussed next.
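A diversity index built from these ingredients can be sketched as follows. The sketch assumes one common form, the mean pairwise dissimilarity over all ligand pairs; the exact form of Equation 4.4 used by Perez (2005) may differ. Fingerprints are represented as Python sets of "on" bit positions, and the three example fingerprints are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diversity_index(fingerprints):
    """Mean pairwise dissimilarity (1 - Tanimoto); an assumed form of div(A)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints: two identical ligands and one unrelated ligand.
dataset = [{1, 2, 3}, {1, 2, 3}, {4, 5}]
print(diversity_index(dataset))
```

A value near 0 indicates a dataset of close analogues, while a value near 1 indicates that the ligands share almost no fingerprint bits.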

4.2. Classification of CYP450 ligands by machine learning

While descriptors are used to build a correlation equation in QSAR, they can also be used to classify ligands by machine learning. Unlike QSAR, classification is a qualitative prediction of the relative affinities / activities of ligands. Table 4.2 lists several machine learning techniques which have been applied to CYP450 ligands in the last 5 years.

Table 4.2. Classification techniques applied on CYP450 ligands in the last 5 years Classification technique 1A2 Decision tree √ √ Hierarchical clustering √ Applied on CYP 2C9 √ √ 2D6 √ √ √ 3A4 √ √ √ √ Choi et al., 2009 Vasanthanathan et al., 2009 Burton et al., 2006 Hudelson et al., 2006 Meslamani et al., 2009 Yamashita et al., 2008 Reference


k-nearest neighbour Neural networks Principal Component Analysis (PCA) Recursive partitioning Self-Organizing Maps (SOM) Support Vector Machine (SVM)

√ √ √ √ √ √ -

√ √ √ √ √ √ √ √

√ √ √ √ √ √ √ √ √

√ √ √ √ √ √ √ √

Jensen et al., 2007 Bazeley et al. 2006 Chohan et al., 2005 Nath et al., 2008 Fukunishi et al., 2006 Burton et al., 2009 Hudelson et al., 2008 Veith et al., 2009 Vasanthanathan et al., 2009 Michielan et al., 2009 Eitrich et al., 2007 Terfloth et al., 2007 Koike, 2006 Arimoto et al., 2005 Yap et al., 2005

This chapter discusses only the two most frequently applied machine learning techniques in Table 4.2, namely Support Vector Machine (SVM) and decision tree.

4.2.1. Classification of CYP450 ligands with Support Vector Machine (SVM)

Figure 4.1 illustrates how SVM works. Suppose we have a training set that contains some CYP450 actives (green) and inactives (red). Plotting the two groups in the two-dimensional space of their descriptors shows that they cannot be separated by a straight line, which makes them difficult to correlate (in QSAR, some members of these groups would be treated as outliers, which could be excluded to provide a better correlation). Projecting these groups into another dimension by a function (called a kernel (κ)) can position the groups completely apart from each other, so that a hyperplane can be inserted between them (Schneider et al., 2008). The best separating hyperplane is the one that is equally far from both groups. The actives and inactives closest to the hyperplane are called support vectors (from which the name of this classification technique is derived), and their equal distances to the hyperplane are called margins. Each group then gets its attribute according to its position relative to the hyperplane.
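The effect of the projection can be illustrated with a deliberately simplified sketch. It is not a full SVM: the projection is written out explicitly instead of being handled implicitly by a kernel, the hyperplane is restricted to be perpendicular to the new axis, and no optimization is performed. The single descriptor and its values are hypothetical.

```python
def project(x):
    """Explicit projection of one descriptor into two dimensions: (x, x^2)."""
    return (x, x * x)


# Hypothetical descriptor values: actives lie between the inactives,
# so no single cut-off on x separates the two groups.
actives = [-0.8, -0.3, 0.1, 0.5, 0.9]
inactives = [-2.5, -1.8, 1.6, 2.2]

# In the projected space the groups are separated along the second axis.
highest_active = max(project(x)[1] for x in actives)
lowest_inactive = min(project(x)[1] for x in inactives)

# Hyperplane halfway between the closest members of each group
# (these closest members play the role of the support vectors).
threshold = (highest_active + lowest_inactive) / 2.0
margin = (lowest_inactive - highest_active) / 2.0


def classify(x):
    """Assign the attribute according to the side of the hyperplane."""
    return "active" if project(x)[1] < threshold else "inactive"


print(threshold, margin)
print(classify(0.0), classify(2.0))
```

In a real SVM, the position and orientation of the hyperplane are found by maximizing the margin over all training compounds, and the choice of kernel determines which projections are available.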

[Figure 4.1: plot of descriptor A against descriptor B, before and after projection]

Figure 4.1. SVM works by projecting the two groups (green and red) into another dimension, to afford a complete separation of them by a hyperplane (adapted from van Looy et al., 2007).

The hyperplane function can be found with some mathematical and computational effort. Once this hyperplane is found, the projection can be used as a model to classify CYP450 ligands. Compounds whose affinities / activities against CYP450 are not known can be subjected to this projection to determine which attributes they get: those of the actives or those of the inactives. Based on the results of this projection, qualitative predictions can be made for the affinities / activities of the compounds.

Obviously, the discriminating power of SVM depends on the kernel that is implemented. Eitrich and colleagues (2007) showed how different kernels delivered different Hit Rates (Table 4.3). This finding highlights the importance of choosing the best kernel.

Table 4.3. Effect of different kernels of SVM on the virtual screening of CYP2D6 ligands (adapted from Eitrich et al., 2007). Each dataset contains 13 CYP2D6 inhibitors. Numbers in brackets are numbers of descriptors assigned to the dataset. For details of the kernels, refer to Eitrich et al. (2007).

            Hit Rate for Dataset
Kernel      1a (5)   1b (10)   1c (20)   1d (557)   1e (5)   1f (10)   1g (20)   1h (557)
Gaussian    0.85     0.77      0.62      0.77       0.69     0.69      0.38      0.69
Slater      0.85     0.62      0.46      0.92       0.69     0.54      0.38      0.62

In Figure 4.1, only 2 descriptors were used. In fact, SVM can accommodate hundreds of descriptors at the same time, which makes its projection impossible to visualize. However, beyond some point, adding more descriptors no longer improves Specificity significantly, as discovered by Yap and colleagues (2005) in the SVM application on CYP3A4 substrates (Table 4.4). On the other hand, the more descriptors are used, the more difficult it is to assess their contributions to the separation. Therefore, one should consider using as few physico-chemically meaningful descriptors as possible in SVM.
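For reference, Sensitivity and Specificity can be computed from the predictions of any classifier as sketched below, using their standard definitions (the fraction of true actives recovered, and the fraction of true inactives rejected). The label lists are hypothetical, with 1 for an active and 0 for an inactive.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Fraction of actual actives that were predicted as actives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual inactives that were predicted as inactives."""
    return tn / (tn + fp)


# Hypothetical experimental labels and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
tp, fn, tn, fp = confusion_counts(actual, predicted)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Tracking both values while the number of descriptors is increased shows the point beyond which extra descriptors stop paying off.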

Table 4.4. Effect of the number of descriptors on Sensitivity and Specificity in the SVM application on CYP3A4 substrates (Yap et al., 2005). Numbers in brackets are standard deviations. The descriptors were selected automatically by a Genetic Algorithm (GA) from the 1,497 descriptors available in the DRAGON Web 3.0 software. Beyond 400 descriptors, adding more descriptors no longer improved Specificity significantly.

In the above report, Yap and colleagues (2005) applied SVM to CYP2C9, CYP2D6, and CYP3A4 substrates and inhibitors. They discovered that for the classification of these CYP450 substrates and inhibitors, shape and electrostatic descriptors were the most frequently selected types, leading to high Hit Rates (Table 4.5). This result suggests that shape and electrostatic types of descriptors are relevant to the characteristics of CYP450s and their substrates (Table 1.4).


Table 4.5. Contributions of descriptors to Hit Rates in SVM applications on CYP2C9, CYP2D6, and CYP3A4 ligands (adapted from Yap et al., 2005). S = substrates; nS = non-substrates; I = inhibitors; nI = non-inhibitors. Numbers in brackets are standard deviations. In every dataset, the highest percentage belongs to the shape descriptors.

                    Percentage of selected types of descriptors
CYP   Dataset     Size   Shape   Electrostatic   H-bond donor   H-bond acceptor   Hydrophobic   Hit Rate
2C9   S and nS    6.8    58.2    19.1            3.0            3.5               9.4           99.2 (0.9)
2C9   I and nI    7.1    56.8    20.4            3.3            3.6               8.8           97.3 (1.3)
2D6   S and nS    6.3    59.7    18.9            3.5            3.1               8.5           96.9 (1.5)
2D6   I and nI    7.5    57.1    20.5            2.5            2.4               8.8           96.7 (1.6)
3A4   S and nS    7.5    57.2    21.0            1.9            2.8               9.5           85.2 (3.0)
3A4   I and nI    7.1    56.8    20.4            3.3            3.6               8.8           97.9 (1.5)

Like QSAR, an SVM model depends on the quality of the training dataset. Therefore, the same considerations about the training dataset as in QSAR (diversity and uniformity of assay) apply here as well. It might be tempting to use SVM to address substrate selectivities between CYP450s. However, one should realize that a compound can be metabolized by more than one CYP450 (Michielan et al., 2009), whereas SVM offers a complete distinction between the substrates of different CYP450s. Therefore, the application of SVM to address substrate selectivities between CYP450s is not recommended.

4.2.2. Classification of CYP450 ligands with decision tree

A decision tree splits a dataset into “leaves” based on thresholds of descriptors applied in series (Rose, 2003). The types and number of descriptors, their thresholds, the number of resulting leaves, and the next leaves to split can be decided by the user or automatically by a program. The results serve as qualitative predictions of the affinities / activities of the compounds in the dataset. Figure 4.2 presents examples of decision trees which were set up to separate CYP1A2 inhibitors from CYP1A2 non-inhibitors.


Here, too, an understanding of a CYP450's characteristics helps to decide which and how many descriptors to use, so that the tree can be built on as few physico-chemically meaningful descriptors as possible. The threshold of each descriptor can be optimized with a training dataset; therefore, a sufficiently diverse training dataset should be provided.
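The two ideas in this subsection, thresholds applied in series and the optimization of a threshold on a training dataset, can be sketched as follows. The descriptors (log P and volume), their thresholds, and the training values are hypothetical and are not taken from the trees in Figure 4.2.

```python
def classify(compound, tree):
    """Walk down the tree until a leaf (a plain string label) is reached."""
    node = tree
    while isinstance(node, dict):
        side = "below" if compound[node["descriptor"]] < node["threshold"] else "above"
        node = node[side]
    return node


def best_threshold(values, labels):
    """Choose the midpoint cut-off with the fewest training errors.

    Compounds at or above the cut-off are predicted True (e.g. inhibitor).
    """
    ordered = sorted(set(values))
    candidates = [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]

    def errors(cutoff):
        return sum((v >= cutoff) != label for v, label in zip(values, labels))

    return min(candidates, key=errors)


# Hypothetical two-level tree with invented thresholds.
TREE = {
    "descriptor": "logP", "threshold": 2.0,
    "below": "non-inhibitor",
    "above": {
        "descriptor": "volume", "threshold": 300.0,
        "below": "inhibitor",
        "above": "non-inhibitor",
    },
}

print(classify({"logP": 3.1, "volume": 250.0}, TREE))

# Hypothetical training data for a single descriptor.
print(best_threshold([0.5, 1.2, 1.9, 2.4, 3.0, 3.3],
                     [False, False, False, True, True, True]))
```

Because every leaf is reached through a short list of readable conditions, the contribution of each descriptor to the classification remains easy to inspect, provided that the tree is kept small.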

Figure 4.2. Decision trees for CYP1A2 ligands, from Vasanthanathan et al. (2009) (left) and Burton et al. (2006) (right). On the left tree, the numbers are thresholds. On the right tree, SMR_VSA6, SlogP_VSA7, SlogP_VSA9, and PEOE_VSA+4 are descriptors. Inhibitors are symbolized by “+”, and non-inhibitors are symbolized by “-”. Numbers in brackets are true and false positives respectively.

Unlike SVM, a decision tree can produce multiple classes from a dataset. Therefore, this method is useful to address substrate selectivities between CYP450s qualitatively. More than 5 years ago, Lewis (2003) used a decision tree with 5 descriptors (volume, pKa, a/d² (area/depth²), E_LUMO, and log P) to discriminate between CYP450 substrates (Figure 4.3). The tree gave an overall correlation of 94%. Of the five descriptors, four (volume, pKa, a/d², and log P) confirm the characteristics of CYP450s and their substrates in Table 1.4.


Figure 4.3. A decision tree to classify CYP450 substrates (Lewis, 2003)

4.3. Summary

In the absence of a CYP450's structure, its ligands can be virtually screened by QSAR (Quantitative Structure-Activity Relationship) or machine learning. Both techniques utilize molecular descriptors. In QSAR, the descriptors are correlated with affinity / activity, while in machine learning, they are used to classify ligands. The challenge in QSAR and machine learning is to find the types and numbers of descriptors that deliver the best correlation or classification, respectively. Physico-chemically meaningful types of descriptors such as shape, electrostatic, H-bond acceptor, and hydrophobic descriptors have proven to be useful in QSAR and machine learning. Support Vector Machine (SVM) is a machine learning technique that works by projecting two groups in a dataset into another dimension to afford a complete separation of them. Because SVM offers such a complete distinction between two groups, it is not suitable to address the selectivity of a substrate which is metabolized by more than one CYP450. A decision tree can split a dataset of CYP450 substrates into multiple classes, so it is more suitable to address the selectivity issue.


5 Conclusions and Perspectives

The author has presented an overview of the applications of six techniques (docking, shape-matching, pharmacophore-matching, field calculation, QSAR, and machine learning) for virtual screening of CYP450 ligands in the last five years, with a focus on the challenges and considerations in their application.

Throughout this thesis, the challenges and considerations are described as consequences of the chemical nature of the relevant CYP450 isoforms. The flexibility of a CYP450 affects the docking poses of its ligands and also the success of protein-based shape-matching; and the environment of its active site should be considered for improvement of the scoring function, and for selection of descriptors in QSAR and machine learning. Special attention is given to CYP3A4, since this CYP450 has the largest active-site volume. The capability of CYP3A4 to bind multiple ligands implies that a protein-based shape model (“negative image”) of this CYP450 is too promiscuous for screening its ligands, and that there can be more than one pharmacophore or QSAR model for its ligands (non-linear structure-activity relationships).

Training and test datasets are another issue to consider in virtual screening, since they determine the quality of validation. As mentioned earlier, the ligands in a training or test dataset should be sufficiently diverse, and should have been tested in a uniform way (with the same assay procedure, in the same laboratory).

The remaining question is in what order the virtual screening techniques should be applied to drug candidates. To answer this question, the author would like to suggest a hierarchical virtual screening (Schneider et al., 2008). The techniques can be applied in sequence, from machine learning (a ligand-based technique) to docking (a protein-based technique). Machine learning is chosen as the starting point because this technique does not involve a CYP450 structure, so its computational cost is expected to be lower than that of protein-based techniques. A particular machine learning technique such as a decision tree can classify the drug candidates into multiple classes of CYP450 ligands, while SVM (Support Vector Machine) can deal with the non-linear structure-activity relationships which are encountered for CYP3A4 ligands. When the number of remaining drug candidates is small, they can be analyzed further by QSAR and protein-based techniques. Protein-based techniques are expected to offer more accuracy, since they involve the CYP450 structure in generating their models. With this hierarchical way of virtual screening, a balance between computational cost and accuracy can be achieved.
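The hierarchical scheme suggested above can be sketched as a chain of filters, ordered from the cheapest to the most expensive technique. The compound records, the cut-off values, and the function name `hierarchical_screen` are hypothetical; the precomputed values stand in for the outputs of real machine learning, QSAR, and docking calculations.

```python
def hierarchical_screen(compounds, stages):
    """Apply (name, keep) stages in order; later stages only see survivors."""
    survivors = list(compounds)
    log = []
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        log.append((name, len(survivors)))
    return survivors, log


# Hypothetical drug candidates with invented predictions.
library = [
    {"id": "cpd1", "ml_class": "ligand", "qsar_value": 6.1, "dock_score": -9.2},
    {"id": "cpd2", "ml_class": "non-ligand", "qsar_value": 6.5, "dock_score": -9.8},
    {"id": "cpd3", "ml_class": "ligand", "qsar_value": 4.2, "dock_score": -8.9},
    {"id": "cpd4", "ml_class": "ligand", "qsar_value": 5.8, "dock_score": -5.1},
]

# Cheapest technique first; all cut-offs are assumptions for illustration.
stages = [
    ("machine learning", lambda c: c["ml_class"] == "ligand"),
    ("QSAR", lambda c: c["qsar_value"] >= 5.0),
    ("docking", lambda c: c["dock_score"] <= -7.0),
]

hits, log = hierarchical_screen(library, stages)
print([c["id"] for c in hits])
print(log)
```

Since each stage only receives the compounds that passed the previous one, the expensive protein-based calculations are run on the smallest possible set.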

Acknowledgements

This literature thesis is presented as part of the "Drug Discovery and Safety" master program at the Department of Chemistry & Pharmaceutical Sciences, Faculty of Sciences – Vrije Universiteit, the Netherlands. The author expresses his gratitude to dr. Daan P. Geerke and Prof. dr. Nico P.E. Vermeulen for their kind supervision. The Molecular Toxicology Division, the Department of Chemistry & Pharmaceutical Sciences, and the Faculty of Sciences of Vrije Universiteit are thanked for all the facilities which were used in preparing this thesis.


References

Afzelius, L.; Zamora, I.; Masimirembwa, C.M.; Karlen, A.; Andersson, T.B.; Mecucci, S.; Baroni, M.; and Cruciani, G. 2004. Conformer- and alignment-independent model for predicting structurally diverse competitive CYP2C9 inhibitors. J. Med. Chem., 47, 907-914.

Appiah-Opong, R.; de Esch, I.; Commandeur, J.N.M.; Andarini, M.; and Vermeulen, N.P.E. 2008. Structure-activity relationships for the inhibition of recombinant human Cytochrome P450 by curcumin analogues. European Journal of Medicinal Chemistry, 43, 1621-1631.

Arimoto, R. 2006. Computational models for predicting interactions with Cytochrome P450 enzyme. Curr. Top. Med. Chem., 6, 1609-1618.

Arimoto, R.; Prasad, M.-A.; and Gifford, E.M. 2005. Development of CYP3A4 inhibition models: Comparisons of Machine-Learning techniques and molecular descriptors. Journal of Biomolecular Screening, 10, 197-205.

Bazeley, P.S.; Prothivi, S.; Struble, C.A.; Povinelli, R.J.; and Sem, D.S. 2006. Synergistic use of compound properties and docking scores in neural network modeling of CYP2D6 binding: Predicted affinity and conformational sampling. J. Chem. Inf. Model., 46, 2698-2708.

Boelsterli, U.A. 2009. Mechanistic toxicology – The molecular basis of how chemicals disrupt biological targets (2nd ed.). Informa Healthcare.

Burton, J.; Danloy, E.; and Vercauteren, D.P. 2009. Fragment-based prediction of Cytochrome P450 2D6 and 1A2 inhibition by recursive partitioning. SAR and QSAR in Environmental Research, 20(3), 185-205.

Burton, J.; Ijjaali, I.; Barberan, O.; Petitet, F.; Vercauteren, D.P.; and Michel, A. 2006. Recursive partitioning for the prediction of Cytochrome P450 2D6 and 1A2 inhibition: Importance of the Quality of the Dataset. J. Med. Chem., 49, 6231-6240.

Chohan, K.K.; Paine, S.W.; Mistry, J.; Barton, P.; and Davis, A.M. 2005. A rapid computational filter for Cytochrome P450 1A2 inhibition potential of compound libraries. J. Med. Chem., 48, 5154-5161.

Choi, I.; Kim, S.Y.; Kim, H.; Kang, N.S.; Bae, M.A.; Yoo, S.-E.; Jung, J.; and No, K.T. 2009. Classification models for CYP450 3A4 inhibitors and non-inhibitors. European Journal of Medicinal Chemistry, 44, 2354-2360.

Chuman, H. 2008. Toward basic understanding of the partition coefficient log P and its application in QSAR. SAR and QSAR in Environmental Research, 19(1), 71-79.


de Groot, M.J. 2006. Designing better drugs: Predicting Cytochrome P450 metabolism. Drug Discovery Today, 11(13), 601-606.

de Groot, M.J.; Lewis, D.F.V.; and Modi, S. Molecular modeling and Quantitative Structure-Activity Relationship of substrates and inhibitors of drug metabolism enzymes. In: Taylor, J.B. and Triggle, D.J. (Editors). 2006. Comprehensive medicinal chemistry II volume 5: ADME-Tox approaches. Elsevier, Ltd.

Dickmann, L.J.; Locuson, C.W.; Jones, J.P.; and Rettie, A.E. 2004. Differential roles of Arg97, Asp293, and Arg108 in enzyme stability and substrate specificity of CYP2C9. Mol. Pharmacol., 65, 842.

Ebalunode, J.O. and Zheng, W. 2010. Molecular shape technologies in drug discovery: Methods and applications. Curr. Top. Med. Chem., 10, 669-679.

Eitrich, T.; Kless, A.; Druska, C.; Meyer, W.; and Grotendorst, J. 2007. Classification of highly unbalanced CYP450 data of drugs using cost sensitive Machine Learning techniques. J. Chem. Inf. Model., 47, 92-103.

Ekroos, M. and Sjogren, T. 2006. Structural basis for ligand promiscuity in Cytochrome P450 3A4. Proc. Natl. Acad. Sci., 103(37), 13682-13687.

Freitas, R.F.; Bauab, R.L.; and Montanari, C.A. 2010. Novel application of 2D and 3D-similarity searches to identify substrates among Cytochrome P450 2C9, 2D6, and 3A4. J. Chem. Inf. Model., 50, 97-109.

Fukunishi, Y.; Hojo, S.; and Nakamura, H. 2006. An efficient in-silico screening method based on the protein-compound affinity matrix and its application to the design of a focused library for Cytochrome P450 (CYP) ligands. J. Chem. Inf. Model., 46, 2610-2622.

Good, A. Virtual screening. In: Taylor, J.B. and Triggle, D.J. (Editors). 2006. Comprehensive medicinal chemistry II volume 4: Computer-assisted drug design. Elsevier.

Gunther, S.; Senger, C.; Michalsky, E.; Goede, A.; and Preissner, R. 2006. Representation of target-bound drugs by computed conformers: Implications for conformational libraries. BMC Bioinformatics, 7(293).

Haji-Momenian, S.; Rieger, J.M.; Macdonald, T.L.; and Brown, M.L. 2003. Comparative molecular field analysis and QSAR on substrates binding to Cytochrome P450 2D6. Bioorg. Med. Chem., 11, 5545-5554.

Hritz, J.; de Ruiter, A.; and Oostenbrink, C. 2008. Impact of plasticity and flexibility on docking results for Cytochrome P450 2D6: A combined approach of molecular dynamics and ligand docking. J. Med. Chem., 51, 7469-7477.

Hudelson, M.G.; Ketkar, N.S.; Holder, L.B.; Carlson, T.J.; and Peng, C.-C. 2008. High confidence predictions of drug-drug interactions: Predicting affinities for Cytochrome P450 2C9 with multiple computational methods. J. Med. Chem., 51, 648-654.

Hudelson, M.G. and Jones, J.P. 2006. Line-walking method for predicting the inhibition of P450 drug metabolism. J. Med. Chem., 49, 4367-4373.

Ingelman-Sundberg, M.; Oscarson, M.; and McLellan, R.A. 1999. Polymorphic human Cytochrome P450 enzymes: An opportunity for individualized drug treatment. Trends Pharmacol. Sci., 20(8), 342-349.

Iori, F.; da Fonseca, R.; Ramos, M.J.; and Menziani, M.C. 2005. Theoretical Quantitative Structure Activity Relationships of flavone ligands interacting with Cytochrome P450 1A1 and 1A2 isozymes. Bioorganic and Medicinal Chemistry, 13, 4366-4374.

Jensen, B.F.; Vind, C.; Padkjaer, S.B.; Brockhoff, P.B.; and Refsgaard, H.H.F. 2007. In-silico prediction of Cytochrome P450 2D6 and 3A4 inhibition using Gaussian kernel weighted k-nearest neighbor and extended connectivity fingerprints, including structural fragment analysis of inhibitors versus noninhibitors. J. Med. Chem., 50, 501-511.

Kapelyukh, Y.; Paine, M.J.I.; Marechal, J.-D.; Sutcliffe, M.J.; Wolf, C.R.; and Roberts, G.C.K. 2008. Multiple substrate binding by Cytochrome P450 3A4: Estimation of the number of bound substrate molecules. Drug Metabolism and Disposition, 36(10), 2136-2144.

Kirchmair, J.; Distinto, S.; Markt, P.; Schuster, D.; Spitzer, G.M.; Liedl, K.R.; and Wolber, G. 2009. How to optimize shape-based virtual screening: Choosing the right query and including chemical information. J. Chem. Inf. Model., 49(3), 678-692.

Kirchmair, J.; Markt, P.; Distinto, S.; Wolber, G.; and Langer, T. 2008. Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection - What can we learn from earlier mistakes? J. Comput. Aided. Mol. Des., 22, 213-228.

Kirton, S.B.; Murray, C.W.; Verdonk, M.L.; and Taylor, R.D. 2005. Prediction of binding modes for ligands in the Cytochrome P450 and other heme-containing proteins. PROTEINS: Structure, Function, and Bioinformatics, 58, 836-844.

Koike, A. 2006. Comparison of methods for chemical-compound affinity prediction. SAR and QSAR in Environmental Research, 17(5), 497-514.

Korhonen, L.E.; Rahnasto, M.; Mahonen, N.J.; Wittekindt, C.; Poso, A.; Juvonen, R.O.; and Raunio, H. 2005. Predictive three-dimensional Quantitative Structure-Activity Relationship of Cytochrome P450 1A2 inhibitors. J. Med. Chem., 48, 3808-3815.

Laggner, C.; Wolber, G.; Kirchmair, J.; Schuster, D.; and Langer, T. Pharmacophore-based virtual screening in drug discovery. In: Varnek, A. and Tropsha, A. (Editors). 2008. Chemoinformatics approaches to virtual screening. Royal Society of Chemistry.

Leach, A.R. 2001. Molecular modeling: Principles and applications (2nd ed.). Prentice Hall.
Lewis, D.F.V. 2003. Quantitative Structure-Activity Relationships (QSARs) within the Cytochrome P450 system: QSARs describing substrate binding, inhibition, and induction of P450s. Inflammopharmacology, 11(1), 43-73.
Lewis, D.F.V. and Dickins, M. 2002. Substrate SARs in human P450s. Drug Discovery Today, 7(17), 918-925.
Li, H.; Sun, J.; Fan, X.; Sui, X.; Zhang, L.; Wang, Y.; and He, Z. 2008. Considerations and recent advances in QSAR models for Cytochrome P450-mediated drug metabolism prediction. J. Comput. Aided Mol. Des., 22, 843-855.
Li, H.; Yap, C.W.; Ung, C.Y.; Xue, Y.; Li, Z.R.; Han, L.Y.; Lin, H.H.; and Chen, Y.Z. 2007. Machine-Learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. Journal of Pharmaceutical Sciences, 96(11), 2838-2860.
Lin, J.H. and Lu, A.Y. 1998. Inhibition and induction of Cytochrome P450 and the clinical implications. Clin. Pharmacokinet., 35, 361-390.
Locuson, C.W. and Wahlstrom, J.L. 2005. Three-dimensional Quantitative Structure-Activity Relationship analysis of Cytochrome P450: Effect of incorporating higher-affinity ligands and potential new applications. Drug Metabolism and Disposition, 33(7), 873-878.
Mao, B.; Gozalbes, R.; Barbosa, F.; Migeon, J.; Merrick, S.; Kamm, E.; Wong, E.; Costales, C.; Shi, W.; Wu, C.; and Froloff, N. 2006. QSAR modeling of in-vitro inhibition of Cytochrome P450 3A4. J. Chem. Inf. Model., 46, 2125-2134.
Meslamani, J.E.; Andre, F.; and Petitjean, M. 2009. Assessing the geometric diversity of Cytochrome P450 ligand conformers by hierarchical clustering with a stop criterion. J. Chem. Inf. Model., 49, 330-337.
Michielan, L.; Terfloth, L.; Gasteiger, J.; and Moro, S. 2009. Comparison of multilabel and single label classification applied to the prediction of the isoform specificity of Cytochrome P450 substrates. J. Chem. Inf. Model., 49, 2588-2605.
Motiejunas, D. and Wade, R.C. Structural, energetic, and dynamic aspects of ligand-receptor interactions. In: Taylor, J.B. and Triggle, D.J. (Editors). 2006. Comprehensive medicinal chemistry II volume 4: Computer-assisted drug design. Elsevier.
Mpamhanga, C.P.; Chen, B.; McLay, I.M.; Willett, P. 2006. Knowledge-based interaction fingerprint scoring: A simple method for improving the effectiveness of fast scoring functions. J. Chem. Inf. Model., 46, 686-698.
Nath, A. and Atkins, W. 2008. Principal Component Analysis of CYP2C9 and CYP3A4 probe substrate/inhibitor panels. Drug Metabolism and Disposition, 36(11), 2151-2155.
Peng, C.C.; Rushmore, T.; Crouch, G.J.; and Jones, J.P. 2008. Modeling and synthesis of novel tight-binding inhibitors of Cytochrome P450 2C9. Bioorg. Med. Chem., 16, 4064-4074.
Perez, J.J. 2005. Managing molecular diversity. Chemical Society Reviews, 34, 143-152.
Polgar, T.; Menyhard, D.K.; and Keseru, G.M. 2007. Effective virtual screening protocol for CYP2C9 ligands using a screening site constructed from flurbiprofen and S-warfarin pockets. J. Comput. Aided Mol. Des., 21, 539-548.
Putta, S. and Beroza, P. 2007. Shapes of things: Computer modeling of molecular shape in drug discovery. Curr. Top. Med. Chem., 7, 1514-1524.
Ridderstrom, M.; Masimirembwa, C.; Trump-Kallmeyer, S.; Ahlefelt, M.; Otter, C.; and Andersson, T.B. 2000. Arginines 97 and 108 in CYP2C9 are important determinants of the catalytic function. Biochem. Biophys. Res. Commun., 270, 983.
Ringsted, T.; Nikolov, N.; Jensen, G.E.; Wedebye, E.B.; and Niemela, J. 2009. QSAR models for P450 (2D6) substrate activity. SAR and QSAR in Environmental Research, 20(3), 309-325.
Rock, D.; Wahlstrom, J.; and Wienkers, L. Cytochrome P450s: Drug-drug interactions. In: Vaz, R.J. and Klabunde, T. (Editors). 2008. Antitargets. WILEY-VCH Verlag GmbH & Co. KGaA, Germany.
ROCS 3.0.0 Manual. http://www.eyesopen.com/docs/rocs/3.0.0/pdf/ROCS.pdf, accessed in May 2010.
Rose, J.R. Machine Learning techniques in chemistry. In: Gasteiger, J. (Ed.). 2003. Handbook of Chemoinformatics. Wiley-VCH.
Rowland, P.; Blaney, F.E.; Smyth, M.G.; Jones, J.J.; Leydon, V.R.; Oxbrow, A.K.; Lewis, C.J.; Tennant, M.G.; Modi, S.; Eggleston, D.S.; Chenery, R.J.; Bridges, A.M. 2006. Crystal structure of human cytochrome P450 2D6. J. Biol. Chem., 281, 7614-7622.
Roy, K. and Roy, P.P. 2008. Comparative QSAR studies of CYP1A2 inhibitor flavonoids using 2D and 3D descriptors. Chem. Biol. Drug Des., 72, 370-382.
Sansen, S.; Yano, J.K.; Reynald, R.L.; Schoch, G.A.; Griffin, K.J.; Stout, C.D.; and Johnson, E.F. 2007. Adaptations for the oxidation of polycyclic aromatic hydrocarbons exhibited by the structure of human P450 1A2. J. Biol. Chem., 282, 14348-14355.
Santos, R.; Hritz, J.; and Oostenbrink, C. 2010. Role of water in molecular docking simulations of Cytochrome P450 2D6. J. Chem. Inf. Model., 50, 146-154.
Schneider, G. and Baringhaus, K.-H. 2008. Molecular design: Concepts and applications. WILEY-VCH Verlag GmbH & Co. KGaA, Germany.

Schuster, D.; Laggner, C.; Steindl, T.M.; and Langer, T. 2006. Development and validation of an in silico P450 profiler based on pharmacophore models. Current Drug Discovery Technologies, 3, 1-48.
Stjernschantz, E.; Vermeulen, N.P.E.; and Oostenbrink, C. 2008. Computational prediction of drug binding and rationalisation of selectivity towards Cytochrome P450. Expert Opin. Drug Metab. Toxicol., 4(5), 513-527.
Sykes, M.J.; McKinnon, R.A.; and Miners, J.O. 2008. Prediction of metabolism by Cytochrome P450 2C9: Alignment and docking studies of a validated database of substrates. J. Med. Chem., 51, 780-791.
Todeschini, R. and Consonni, V. 2000. Handbook of molecular descriptors. Wiley-VCH.
Triballeau, N.; Bertrand, H.-O.; and Acher, F. Are You Sure You Have a Good Model? In: Langer, T. and Hoffmann, R.D. (Editors). 2006. Pharmacophores and Pharmacophore Searches. WILEY-VCH Verlag GmbH & Co. KGaA, Germany.
Van Looy, S.; Verplancke, T.; Benoit, D.; Hoste, E.; van Maele, G.; de Turck, F.; Decruyenaere, J. 2007. A novel approach for prediction of tacrolimus blood concentration in liver transplantation patients in the intensive care unit through support vector regression. Critical Care, 11(R83).
Vasanthanathan, P.; Olsen, L.; Jorgensen, F.S.; Vermeulen, N.P.E.; and Oostenbrink, C. 2010. Computational prediction of binding affinity for CYP1A2-ligand complexes using empirical free energy calculations. Drug Metabolism and Disposition, 38(7) (E-pub ahead of print).
Vasanthanathan, P.; Taboureau, O.; Oostenbrink, C.; Vermeulen, N.P.E.; Olsen, L.; and Jorgensen, F.S. 2009. Classification of Cytochrome P450 1A2 inhibitors and noninhibitors by Machine Learning techniques. Drug Metabolism and Disposition, 37(3), 658-664.
Veith, H.; Southall, N.; Huang, R.; James, T.; Fayne, D.; Artemenko, N.; Shen, M.; Inglese, J.; Austin, C.P.; Lloyd, D.G.; Auld, D.S. 2009. Comprehensive characterization of Cytochrome P450 isozyme selectivity across chemical libraries. Nature Biotechnology, 27(11), 1050-1057.
Verdonk, M.L.; Cole, J.C.; Hartshorn, M.J.; Murray, C.W.; and Taylor, R.D. 2003. Improved protein-ligand docking using GOLD. PROTEINS: Structure, Function, and Genetics, 52, 609-623.
Vistoli, G. and Pedretti, A. Molecular fields to assess recognition forces and property spaces. In: Taylor, J.B. and Triggle, D.J. (Editors). 2006. Comprehensive medicinal chemistry II volume 5: ADME-Tox approaches. Elsevier.
Williams, J.A.; Hyland, R.; Jones, B.C.; Smith, D.A.; Hurst, S.; Goosen, T.C.; Peterkin, V.; Koup, J.R.; Ball, S.E. 2004. Drug-drug interactions for UDP-glucuronosyltransferase substrates: A pharmacokinetic explanation for typically observed low-exposure (AUCI/AUC) ratios. Drug Metabolism and Disposition, 32(11), 1201-1208.
Yamashita, F.; Hara, H.; Ito, T.; and Hasida, M. 2008. Novel hierarchical classification and visualization method for multiobjective optimization of drug properties: Application to Structure-Activity Relationship analysis of Cytochrome P450 metabolism. J. Chem. Inf. Model., 48, 364-369.
Yap, C.W. and Chen, Y.Z. 2005. Prediction of Cytochrome P450 3A4, 2D6, and 2C9 inhibitors and substrates by using Support Vector Machine. J. Chem. Inf. Model., 45, 982-992.
Yasuo, K.; Yamaotsu, N.; Gouda, H.; Tsujishita, H.; Hirono, S. 2009. Structure-based CoMFA as a predictive model – CYP2C9 inhibitors as a test case. J. Chem. Inf. Model., 49, 853-864.
Young, D.C. 2009. Computational drug design: A guide for computational and medicinal chemists. John Wiley & Sons, Inc.
Zlokarnik, G.; Grootenhuis, P.D.; Watson, J.B. 2005. High throughput P450 inhibition screens in early drug discovery. Drug Discovery Today, 10(21), 1443-1450.
