
Quantitative Structure-Activity Relationships (QSAR)

Comparative Molecular Field Analysis (CoMFA)

Gijs Schaftenaar
Outline

• Introduction

• Structures and activities

• Analysis techniques:
Free-Wilson, Hansch

• Regression techniques:
PCA, PLS

• Comparative Molecular Field Analysis


QSAR: The Setting

Quantitative structure-activity relationships are used when there is little or no receptor information, but there are measured activities of (many) compounds.


From Structure to Property

[Figure: a series of structurally related compounds ordered by a measured property, shown once against EC50 and once against LD50]
QSAR: Which Relationship?

Quantitative structure-activity relationships correlate chemical/biological activities with structural features or atomic, group or molecular properties within a range of structurally similar compounds.


Free Energy of Binding and Equilibrium Constants

The free energy of binding is related to the equilibrium and rate constants of ligand-receptor complex formation:

$$\Delta G_{\text{binding}} = -2.303\,RT \log K = -2.303\,RT \log\left(k_{\text{on}} / k_{\text{off}}\right)$$

Equilibrium (association) constant $K$
Rate constants $k_{\text{on}}$ (association) and $k_{\text{off}}$ (dissociation)
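To make the relation concrete, the short Python sketch below converts an association constant, or a pair of rate constants, into a binding free energy. It assumes SI units, T = 298.15 K and R = 8.314 J/(mol·K); the function names are illustrative only.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def binding_free_energy(K: float, T: float = 298.15) -> float:
    """dG_binding in kJ/mol from an association constant K (1/M).

    dG = -2.303 RT log10(K); 2.303 is ln(10), so this equals -RT ln(K).
    """
    return -math.log(10) * R * T * math.log10(K) / 1000.0

def binding_free_energy_from_rates(k_on: float, k_off: float, T: float = 298.15) -> float:
    """Same quantity via K = k_on / k_off (k_on in 1/(M*s), k_off in 1/s)."""
    return binding_free_energy(k_on / k_off, T)

# A nanomolar binder: K = 1e9 1/M, e.g. k_on = 1e6 1/(M*s) and k_off = 1e-3 1/s
print(f"{binding_free_energy(1e9):.1f} kJ/mol")
print(f"{binding_free_energy_from_rates(1e6, 1e-3):.1f} kJ/mol")
```

Both calls print about -51.4 kJ/mol: every factor of 10 in K is worth roughly 5.7 kJ/mol at room temperature.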
Concentration as Activity Measure

• A critical molar concentration C that produces the biological effect is related to the equilibrium constant K

• Usually log (1/C) is used (cf. pH)

• For meaningful QSARs, activities need to be spread out over at least 3 log units
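A minimal sketch of the conversion to log(1/C) and of the spread check, using the capsaicin-analog EC50 values listed in the example that follows. It assumes those EC50 values are in µM, which is what the tabulated log(1/EC50) values imply; the helper names are hypothetical.

```python
import math

def log_inverse_conc(c_molar: float) -> float:
    """Activity as log(1/C), with C the critical molar concentration."""
    return -math.log10(c_molar)

def activity_spread(concs_molar) -> float:
    """Range of log(1/C) over a data set, in log units."""
    values = [log_inverse_conc(c) for c in concs_molar]
    return max(values) - min(values)

# Capsaicin-analog EC50 values (assumed µM), converted to mol/L
ec50_molar = [c * 1e-6 for c in (11.80, 1.24, 4.58, 26.50, 0.24, 4.39, 0.35)]
print([round(log_inverse_conc(c), 2) for c in ec50_molar])
print(f"spread = {activity_spread(ec50_molar):.2f} log units")
```

The rounded values reproduce the log(1/EC50) column of the example; their spread is about 2 log units, below the 3-unit guideline.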
Free Energy of Binding

Gbinding = G0 + Ghb + Gionic + Glipo + Grot

G0 entropy loss (translat. + rotat.) +5.4


Ghb ideal hydrogen bond –4.7
Gionic ideal ionic interaction –8.3
Glipo lipophilic contact –0.17
Grot entropy loss (rotat. bonds) +1.4

(Energies in kJ/mol per unit feature)
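Because the terms are additive, an estimate is just a weighted count of features. The sketch below assumes every interaction is geometrically ideal (no penalty functions), takes the lipophilic term per Å² of contact surface, and uses a made-up ligand; it also inverts the earlier relation to express the result as log K.

```python
import math

# Per-feature contributions from the table above, in kJ/mol
DG_0 = 5.4        # entropy loss (translational + rotational), applied once
DG_HB = -4.7      # per ideal hydrogen bond
DG_IONIC = -8.3   # per ideal ionic interaction
DG_LIPO = -0.17   # per Å^2 of lipophilic contact surface (assumed unit)
DG_ROT = 1.4      # per rotatable bond frozen on binding

R = 8.314462618e-3  # gas constant, kJ/(mol*K)

def estimate_dg(n_hbond: int, n_ionic: int, lipo_area: float, n_rot: int) -> float:
    """Additive estimate of the binding free energy in kJ/mol (ideal geometry assumed)."""
    return (DG_0 + n_hbond * DG_HB + n_ionic * DG_IONIC
            + lipo_area * DG_LIPO + n_rot * DG_ROT)

def log_k_from_dg(dg: float, T: float = 298.15) -> float:
    """Invert dG = -2.303 RT log K."""
    return -dg / (math.log(10) * R * T)

# Hypothetical ligand: 2 H-bonds, 1 ionic contact, 200 Å^2 lipophilic contact, 4 rotatable bonds
dg = estimate_dg(n_hbond=2, n_ionic=1, lipo_area=200.0, n_rot=4)
print(f"dG = {dg:.1f} kJ/mol, log K = {log_k_from_dg(dg):.2f}")
```

For this made-up ligand the lipophilic contact (-34 kJ/mol) outweighs the polar terms, which illustrates why many small contributions matter in such schemes.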


Basic Assumption in QSAR

The structural properties of a compound contribute in a linearly additive way to its biological activity, provided there are no non-linear dependencies of transport or binding on some properties.


An Example: Capsaicin Analogs

X EC50(M) log(1/EC50)

H 11.80 4.93
Cl 1.24 5.91
HO X
NO2 4.58 5.34
H
N
MeO CN 26.50 4.58
O C6H5 0.24 6.62
NMe2 4.39 5.36
I 0.35 6.46
NHCHO ? ?
An Example: Capsaicin Analogs

X         log(1/EC50)      MR       π       σ      Es
H                4.93    1.03    0.00    0.00    0.00
Cl               5.91    6.03    0.71    0.23   -0.97
NO2              5.34    7.36   -0.28    0.78   -2.52
CN               4.58    6.33   -0.57    0.66   -0.51
C6H5             6.62   25.36    1.96   -0.01   -3.82
NMe2             5.36   15.55    0.18   -0.83   -2.90
I                6.46   13.94    1.12    0.18   -1.40
NHCHO               ?   10.31   -0.98    0.00   -0.98

MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter; σ = electronic sigma constant (para position); Es = Taft size parameter
An Example: Capsaicin Analogs

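[Structure: capsaicin analog scaffold (HO, MeO, amide N–H and C=O) carrying the variable substituent X]

$$\log(1/\mathrm{EC}_{50}) = -0.89 + 0.019\,\mathrm{MR} + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$$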
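An equation of this form comes from a multiple linear regression of log(1/EC50) on the four descriptors. As a hedged sketch (not the author's fitting procedure), the NumPy code below refits an ordinary least-squares model of the same form to the seven compounds with known activity and uses it to predict the untested NHCHO analog. The refitted coefficients need not reproduce the ones shown; with 7 compounds and 5 parameters only 2 degrees of freedom remain, so R² will look deceptively high.

```python
import numpy as np

# Descriptor table for the capsaicin analogs (X = substituent)
#            log(1/EC50)   MR     pi     sigma   Es
data = {
    "H":    (4.93,  1.03,  0.00,  0.00,  0.00),
    "Cl":   (5.91,  6.03,  0.71,  0.23, -0.97),
    "NO2":  (5.34,  7.36, -0.28,  0.78, -2.52),
    "CN":   (4.58,  6.33, -0.57,  0.66, -0.51),
    "C6H5": (6.62, 25.36,  1.96, -0.01, -3.82),
    "NMe2": (5.36, 15.55,  0.18, -0.83, -2.90),
    "I":    (6.46, 13.94,  1.12,  0.18, -1.40),
}
nhcho = np.array([10.31, -0.98, 0.00, -0.98])  # MR, pi, sigma, Es; activity unknown

rows = np.array(list(data.values()))
y = rows[:, 0]
X = np.column_stack([np.ones(len(y)), rows[:, 1:]])  # intercept + 4 descriptors

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
pred_nhcho = coef[0] + nhcho @ coef[1:]

print("coefficients (const, MR, pi, sigma, Es):", np.round(coef, 3))
print(f"R^2 = {r2:.3f}; predicted log(1/EC50) for NHCHO = {pred_nhcho:.2f}")
```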
First Approaches: The Early Days

• Free-Wilson Analysis

• Hansch Analysis
Free-Wilson Analysis

$$\log(1/C) = \sum_i a_i x_i + \mu$$

$x_i$: presence of group i (0 or 1)
$a_i$: activity contribution of group i
$\mu$: activity value of unsubstituted compound
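A Free-Wilson model is an ordinary regression on 0/1 indicator columns. The sketch below uses a hypothetical, noise-free series (two substitution positions, made-up activities constructed to be exactly additive) with the unsubstituted parent as reference, so the intercept estimates μ. It then predicts an untested combination whose groups all occur in the series, which is the only kind of prediction the method allows.

```python
import numpy as np

# Hypothetical series: R1 in {H, Cl, Me}, R2 in {H, OH}; parent (R1 = R2 = H) is the reference.
groups = ["R1=Cl", "R1=Me", "R2=OH"]
# Indicator vectors x_i (1 = group present) and made-up log(1/C) values
X = np.array([
    [0, 0, 0],   # parent
    [1, 0, 0],   # R1 = Cl
    [0, 1, 0],   # R1 = Me
    [0, 0, 1],   # R2 = OH
    [0, 1, 1],   # R1 = Me, R2 = OH
], dtype=float)
y = np.array([5.0, 5.8, 5.3, 4.6, 4.9])

design = np.column_stack([np.ones(len(y)), X])  # intercept column estimates mu
params, *_ = np.linalg.lstsq(design, y, rcond=None)
mu, a = params[0], params[1:]
print(f"mu = {mu:.2f}")
for name, ai in zip(groups, a):
    print(f"a[{name}] = {ai:+.2f}")

# Cl together with OH was never tested, but both groups occur in the series
x_new = np.array([1.0, 0.0, 1.0])
print(f"predicted log(1/C) for R1=Cl, R2=OH: {mu + x_new @ a:.2f}")
```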
Free-Wilson Analysis

+ Computationally straightforward

– Predictions only for substituents already included

– Requires large number of compounds


Hansch Analysis

Drug transport and binding affinity depend nonlinearly on lipophilicity:

$$\log(1/C) = a(\log P)^2 + b\log P + c\,\sigma + k$$

P: n-octanol/water partition coefficient
σ: Hammett electronic parameter
a, b, c: regression coefficients
k: constant term
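The Hansch equation is still linear in its coefficients, so it can be fitted by least squares with (log P)² as an extra column. The sketch below uses hypothetical, noise-free data generated with a = -0.2, b = 1.0, c = 0.5, k = 3.0, and then derives the lipophilicity at which activity peaks (for a < 0, setting the derivative 2a log P + b to zero gives log P = -b/2a).

```python
import numpy as np

# Hypothetical data: log(1/C) = -0.2 (log P)^2 + 1.0 log P + 0.5 sigma + 3.0
log_p = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
sigma = np.array([0.0, 0.2, -0.1, 0.3, 0.1, -0.2])
activity = -0.2 * log_p**2 + 1.0 * log_p + 0.5 * sigma + 3.0

# Columns match the Hansch terms: (log P)^2, log P, sigma, constant k
design = np.column_stack([log_p**2, log_p, sigma, np.ones_like(log_p)])
(a, b, c, k), *_ = np.linalg.lstsq(design, activity, rcond=None)

# For a < 0 the parabola has a maximum where 2a log P + b = 0
log_p_opt = -b / (2 * a)
print(f"a={a:.2f} b={b:.2f} c={c:.2f} k={k:.2f}; optimal log P = {log_p_opt:.2f}")
```

With real, noisy data the same code applies unchanged; only the recovered coefficients become estimates.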
Hansch Analysis

+ Fewer regression coefficients needed for correlation

+ Interpretation in physicochemical terms

+ Predictions for other substituents possible


Molecular Descriptors

• Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight

• Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility

• Group properties, e.g. Hammett and Taft constants, volume

• 2D Fingerprints based on fragments

• 3D Screens based on fragments


2D Fingerprints

[Structure: capsaicin analog scaffold (HO, MeO, amide N–H and C=O) with Br as substituent]

C N O P S X F Cl Br I Ph CO NH OH Me Et Py CHO SO C=C C≡C C=N Am Im
1 1 1 0 0 1 0 0  1  0 1  1  1  1  1  0  0  0   0  1   0   0   1  0
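A fragment fingerprint is simply a fixed, ordered feature dictionary plus one presence bit per entry. The sketch below reuses the slide's dictionary and the set of features read off its bit string; the helper names are hypothetical, and packing the bits into an integer is shown only as one compact storage option.

```python
# Feature dictionary in the order shown on the slide ("C≡C" = triple bond)
FEATURES = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
            "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C≡C", "C=N", "Am", "Im"]

def fingerprint(present):
    """Return one 0/1 bit per dictionary entry, in dictionary order."""
    unknown = set(present) - set(FEATURES)
    if unknown:
        raise ValueError(f"features not in dictionary: {sorted(unknown)}")
    return [1 if f in present else 0 for f in FEATURES]

def to_int(bits):
    """Pack a bit list into an integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

# Features present in the bromo analog, as read from the slide's bit string
bromo_analog = {"C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me", "C=C", "Am"}
bits = fingerprint(bromo_analog)
print(" ".join(map(str, bits)), hex(to_int(bits)))
```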
Regression Techniques

• Principal Component Analysis (PCA)

• Partial Least Squares (PLS)


Principal Component Analysis (PCA)

• Many (>3) variables describe each object, i.e. the descriptor data are high-dimensional

• PCA is used to reduce dimensionality

• PCA extracts the most important factors (principal components or PCs) from the data

• Useful when correlations exist between descriptors

• The result is a new, small set of variables (PCs) which explain most of the data variation
PCA – From 2D to 1D
PCA – From 3D to 2D
Different Views on PCA

• Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis

• In matrix terms, PCA is a decomposition of matrix X into two smaller matrices (scores T and loadings P) plus a set of residuals R:

$$X = TP^{\mathsf{T}} + R$$

• Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions
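The matrix view can be checked numerically. This sketch computes PCA through a singular value decomposition of the column-centred data (no scaling, an assumption; descriptors are often autoscaled first), returns scores T, loadings P and residuals R so that the centred X equals TPᵀ + R, and reports how much variance each retained PC explains. The data are synthetic: three correlated descriptors driven by one underlying factor.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """PCA via SVD of the column-centred matrix: Xc = T P^T + R."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T            # loadings (variables x PCs)
    T = Xc @ P                         # scores (objects x PCs)
    R = Xc - T @ P.T                   # residuals not captured by the PCs
    explained = s**2 / np.sum(s**2)    # fraction of total variance per PC
    return T, P, R, explained[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=20)
# Three correlated descriptors: essentially one factor plus a little noise
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(20, 3))
T, P, R, explained = pca(X, n_components=1)
print(T.shape, P.shape, f"PC1 explains {explained[0]:.4f} of the variance")
```

Here one PC replaces three descriptors with almost no loss, which is exactly the situation ("correlations exist between descriptors") in which PCA is useful.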
Partial Least Squares (PLS)

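$$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1 \quad \text{(compound 1)}$$
$$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2 \quad \text{(compound 2)}$$
$$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3 \quad \text{(compound 3)}$$
$$\vdots$$
$$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n \quad \text{(compound } n\text{)}$$

$$Y = XA + E$$

X = independent variables
Y = dependent variables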
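PLS estimates the coefficients A of Y = XA + E through a few latent variables instead of inverting XᵀX directly. The sketch below is the textbook NIPALS algorithm for a single activity column (PLS1), written from scratch in NumPy; it is not necessarily what any particular QSAR package does. A useful sanity check: with as many components as descriptors on full-rank, noise-free data, PLS reproduces ordinary least squares.

```python
import numpy as np

def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int):
    """PLS1 by NIPALS; returns coefficients B and intercept b0 so that y ≈ X @ B + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weights: direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y before the next component
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original descriptor space
    b0 = y_mean - x_mean @ B
    return B, b0

def pls1_predict(X, B, b0):
    return X @ B + b0

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                 # 10 compounds, 3 descriptors (synthetic)
true_coef = np.array([0.5, -1.2, 2.0])
y = 1.5 + X @ true_coef
B, b0 = pls1_fit(X, y, n_components=3)
B1, b01 = pls1_fit(X, y, n_components=1)     # a compressed one-component model
print(np.round(B, 3), round(b0, 3))
```

In practice the number of components is chosen by cross-validation rather than set to the number of descriptors.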
PLS – Cross-validation

• Squared correlation coefficient R²
• Value between 0 and 1 (should be > 0.9)
• Indicates the explanatory power of the regression equation

With cross-validation:

• Cross-validated squared correlation coefficient Q²
• Value of at most 1, negative for very poor models (should be > 0.5)
• Indicates the predictive power of the regression equation
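The difference between R² and Q² is only in how the residuals are obtained. This sketch uses leave-one-out cross-validation, with ordinary least squares standing in for the regression model (PLS would slot in the same way): Q² = 1 − PRESS/SS, where PRESS sums squared errors of predictions for compounds left out of the fit and SS is taken about the overall mean (other variants exist). The data are synthetic.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """OLS with intercept (stand-in for any regression model)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def r2_and_q2(X, y):
    ss_tot = np.sum((y - y.mean()) ** 2)
    rss = np.sum((y - fit_predict(X, y, X)) ** 2)   # residuals of the full fit
    press = 0.0
    for i in range(len(y)):                          # leave-one-out cross-validation
        mask = np.arange(len(y)) != i
        pred = fit_predict(X[mask], y[mask], X[i:i + 1])
        press += (y[i] - pred[0]) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot

x = np.arange(8, dtype=float).reshape(-1, 1)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])
y = 2.0 * x[:, 0] + 1.0 + noise
r2, q2 = r2_and_q2(x, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

For least squares, Q² can never exceed R², because each left-out residual is at least as large as the corresponding fitted residual.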
PCA vs PLS

• PCA:
The principal components describe the variance in the independent variables (descriptors)
• PLS:
The components (latent variables) describe the variance in both the independent variables (descriptors) and the dependent variable (activity)
Comparative Molecular Field Analysis
(CoMFA)

• Set of chemically related compounds

• Common substructure required

• 3D structures needed (e.g., Corina-generated)

• Bioactive conformations of the active compounds are to be aligned
CoMFA Alignment
[Figure: ligands aligned onto a common "pharmacophore" defined by feature points and the distances d1, d2, d3]
CoMFA Grid and Field Probe

(Only one molecule shown for clarity)


Electrostatic Potential Contour Lines
CoMFA Model Derivation

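• Molecules are positioned in a regular grid according to alignment

• Probes are used to determine the molecular field:

Electrostatic field (probe is a charged atom):

$$E_C = \sum_i \frac{q_i q_j}{D\, r_{ij}}$$

Van der Waals field (probe is a neutral carbon):

$$E_{\text{vdW}} = \sum_i \left(A_i r_{ij}^{-12} - B_i r_{ij}^{-6}\right)$$

(i runs over the atoms of the molecule, j is the probe at a grid point, D is the dielectric constant)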
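The two field expressions can be evaluated on a lattice with a few lines of NumPy. This is a simplified sketch under stated assumptions: arbitrary energy units (no Coulomb conversion factor), a 2 Å grid spacing with a 4 Å margin, hypothetical charges and Lennard-Jones constants A and B, and a crude minimum-distance guard `r_min` (production implementations typically also truncate large energies). The field values of each aligned molecule form one row of the descriptor matrix that is then analysed with PLS.

```python
import numpy as np

def make_grid(coords, spacing=2.0, margin=4.0):
    """Regular lattice of probe positions enclosing the aligned molecule(s)."""
    lo, hi = coords.min(axis=0) - margin, coords.max(axis=0) + margin
    axes = [np.arange(l, h + 1e-9, spacing) for l, h in zip(lo, hi)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

def comfa_fields(coords, charges, A, B, grid, probe_charge=1.0, dielectric=1.0, r_min=1.0):
    """Coulomb and Lennard-Jones 12-6 energies of the probe at every grid point."""
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)  # (points, atoms)
    r = np.maximum(r, r_min)  # crude guard against points inside atoms
    e_c = np.sum(charges * probe_charge / (dielectric * r), axis=1)
    e_vdw = np.sum(A * r**-12.0 - B * r**-6.0, axis=1)
    return e_c, e_vdw

# Hypothetical two-atom "molecule": coordinates in Å, partial charges, LJ constants
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
charges = np.array([0.3, -0.3])
A = np.array([1.0e5, 1.0e5])
B = np.array([6.0e2, 6.0e2])

grid = make_grid(coords)
e_c, e_vdw = comfa_fields(coords, charges, A, B, grid)
descriptor_row = np.concatenate([e_c, e_vdw])  # one row of the CoMFA descriptor matrix
print(grid.shape, descriptor_row.shape)
```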


3D Contour Map for Electronegativity
CoMFA Pros and Cons

+ Suitable to describe receptor-ligand interactions

+ 3D visualization of important features

+ Good correlation within related set

+ Predictive power within scanned space

– Alignment is often difficult

– Training required
