
STATISTICAL METHODS USED IN QSAR

Dr. Chirag J. Patel


Professor
Department of Pharmaceutical Chemistry
Shree Swaminarayan Sanskar Pharmacy College
Zundal - Gandhinagar
Contact no: 8469565669
chirag999patel@gmail.com

CONTENTS

• INTRODUCTION TO QSAR
• STATISTICAL METHODS USED IN QSAR
• STATISTICAL PARAMETERS AND THEIR IMPORTANCE

QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIP (QSAR)
• The QSAR approach attempts to identify and quantify the physicochemical properties of a drug and to determine, using mathematical equations, whether any of these properties affects the drug’s biological activity.

PHYSICOCHEMICAL PARAMETERS / INDEPENDENT VARIABLES

• Hydrophobic parameter
• Electronic parameter
• Steric parameter

STEPS INVOLVED IN QSAR

• Designing different analogs by modification of the lead compound.
• Calculation of various physicochemical parameters.
• Correlation of the physicochemical properties with biological activity by a QSAR method.
• Deriving the QSAR equation.
• Designing new compounds based on the QSAR equation.
• Predicting the biological activity of the designed compounds.
• Synthesis of the compounds.

2D QSAR METHODS

1. Free energy models
   a) Hansch analysis
2. Mathematical models
   a) Free-Wilson analysis
3. Statistical methods
   a) Discriminant analysis (DA)
   b) Simple linear regression analysis
   c) Multiple regression analysis
   d) Cluster analysis (CA)
   e) Partial least squares (PLS) method

APPLICATION OF STATISTICS

(i) Several equations may be generated for a single case study. Statistics helps in selecting the best-fit equation among them.
(ii) The selection is made by comparing the statistical parameters of the candidate equations.

CLASSIFICATION OF STATISTICAL METHODS

REGRESSION BASED METHODS
1. Simple linear regression analysis
2. Multiple regression analysis
3. Partial least squares method

CLASSIFICATION BASED METHODS
1. Linear discriminant analysis
2. Cluster analysis

I) REGRESSION BASED METHODS

• Regression-based approaches are employed when the response data of the compounds are entirely numerical, that is, quantitative.
• These methods enable quantitative prediction of the response (activity/property).

1. SIMPLE LINEAR REGRESSION ANALYSIS

• It describes the relationship between one dependent variable (y) and one independent variable (x).
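
A minimal sketch of a simple linear regression, assuming a least-squares fit of y = slope·x + intercept. The data values and the function name `simple_linear_regression` are hypothetical and only illustrate how the line, r and s might be obtained.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares fit of y = slope * x + intercept; returns slope, intercept, r, s."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    # Standard deviation of the residuals, with n - 2 degrees of freedom
    residual_ss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(residual_ss / (n - 2))
    return slope, intercept, r, s

# Hypothetical data: x = one physicochemical parameter, y = biological activity
x_values = [-0.5, 0.0, 0.2, 0.5, 0.8, 1.0]
y_values = [4.1, 4.6, 4.9, 5.0, 5.5, 5.6]

slope, intercept, r, s = simple_linear_regression(x_values, y_values)
print(f"y = {slope:.3f} x + {intercept:.3f}, r = {r:.3f}, s = {s:.3f}")
```

The n - 2 divisor for s is one common convention for a one-descriptor equation; other texts may define s differently.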

[Figure: Graph of linear regression analysis]

Eg: Antibacterial activity of N'-(R-phenyl)sulfanilamides

Regression (correlation) coefficient, r = 0.935
Standard deviation, s = 0.240

2. MULTIPLE REGRESSION ANALYSIS

• Multiple regression analysis is a multivariate technique.
• It is used to predict the value of a dependent variable (activity) from the values of more than one independent variable (physicochemical parameters).
• The purpose of this method is to learn about the relationship between several independent variables and a dependent variable.

Eg: Adrenergic blocking activity of β-halo-β-arylamines

[Structure: aromatic ring bearing substituents X and Y, with a CH-CH2-NRR' side chain]

• Activity increases if π is positive (hydrophobic substituents).
• Activity increases if σ is negative (electron-donating substituents).
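
A minimal sketch of a two-descriptor multiple regression, assuming an equation of the form activity = a·π + b·σ + c fitted by least squares (normal equations). The descriptor values, the coefficients used to generate the activities, and the function names are all hypothetical and are not taken from the study cited above.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def multiple_regression(descriptors, activity):
    """Least-squares coefficients [b1, ..., bk, intercept] via the normal equations."""
    design = [list(row) + [1.0] for row in descriptors]  # append intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, activity)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical (pi, sigma) values for six substituents
descriptors = [(0.0, 0.0), (0.56, -0.17), (0.71, 0.23),
               (0.86, 0.23), (-0.02, -0.27), (1.12, 0.18)]
# Hypothetical activities generated from made-up coefficients (1.0, -1.5, 8.0)
activity = [1.0 * p - 1.5 * s + 8.0 for p, s in descriptors]

coef_pi, coef_sigma, intercept = multiple_regression(descriptors, activity)
print(f"activity = {coef_pi:.2f} pi + {coef_sigma:.2f} sigma + {intercept:.2f}")
```

A positive coefficient for π and a negative coefficient for σ correspond to the two trends listed above: activity rises with hydrophobic and with electron-donating substituents.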

3. PARTIAL LEAST SQUARES (PLS) METHOD

• Used when a large number of descriptors must be correlated with activity for a limited number of data points.
• It determines the latent variables (LV).
• Latent variables are variables that are not directly observed; they are derived from directly measured variables.
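
A minimal sketch of how a single latent variable might be derived, assuming the one-component PLS1 procedure (centre the data, take the descriptor direction of maximal covariance with activity, then regress activity on the resulting scores). The data set, with more descriptors than compounds, and the function name are hypothetical.

```python
import math

def pls1_one_component(X, y):
    """One-latent-variable PLS1: returns weights w, scores t, and fitted y values."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centred descriptors
    yc = [yi - y_mean for yi in y]                                   # centred activity
    # Weight vector: direction in descriptor space with maximal covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(wj * wj for wj in w))
    w = [wj / norm for wj in w]
    # Latent variable scores: one value per compound, built from measured descriptors
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred activity on the latent variable
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    fitted = [y_mean + q * ti for ti in t]
    return w, t, fitted

# Hypothetical data: 4 compounds, 6 descriptors (more descriptors than data points)
X = [
    [0.1, 1.2, 0.5, 2.0, 0.3, 1.1],
    [0.4, 1.0, 0.7, 2.4, 0.2, 1.5],
    [0.9, 0.7, 1.1, 3.1, 0.6, 1.9],
    [1.3, 0.3, 1.6, 3.9, 0.5, 2.6],
]
y = [4.2, 4.6, 5.3, 6.0]

weights, scores, fitted = pls1_one_component(X, y)
print("latent variable scores:", [round(ti, 3) for ti in scores])
```

Ordinary multiple regression cannot be fitted to this data set because there are more descriptors than compounds; the latent variable condenses the six descriptors into one value per compound. Further latent variables would be extracted from the residuals, which this sketch omits.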

II) CLASSIFICATION BASED METHODS

• Classification techniques model qualitative or semi-quantitative chemical responses.
• These methods allow the data points to be categorized into several groups or classes, such as highly active and less active.

1. LINEAR DISCRIMINANT ANALYSIS

• This method is used to analyze the relationship between a categorical dependent variable (the activity class) and the independent variables (physicochemical parameters).
• It can predict the group to which a compound belongs (active or inactive) from a set of physicochemical parameters.
• An advantage of this method is that inactive compounds can also be included in the analysis and several activities can be considered at the same time.
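
A minimal sketch of a two-class linear discriminant, assuming Fisher's formulation with two descriptors per compound. The (π, σ) values, the class assignments and the function names are hypothetical.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(2)]

def fisher_lda_2d(active, inactive):
    """Fisher discriminant for two classes and two descriptors.
    Returns weights w and threshold c; a compound is predicted active when w.x > c."""
    m1, m0 = mean_vector(active), mean_vector(inactive)
    # Pooled within-class scatter matrix (2 x 2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((active, m1), (inactive, m0)):
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means
    c = 0.5 * (w[0] * (m1[0] + m0[0]) + w[1] * (m1[1] + m0[1]))
    return w, c

def predict_class(w, c, compound):
    score = w[0] * compound[0] + w[1] * compound[1]
    return "active" if score > c else "inactive"

# Hypothetical (pi, sigma) descriptors; inactive compounds are part of the analysis
active = [(0.9, -0.2), (1.1, -0.1), (0.8, -0.3), (1.2, -0.25)]
inactive = [(0.1, 0.3), (-0.2, 0.2), (0.2, 0.4), (0.0, 0.1)]

w, c = fisher_lda_2d(active, inactive)
print(predict_class(w, c, (1.0, -0.2)), predict_class(w, c, (0.0, 0.3)))
```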

2. CLUSTER ANALYSIS

• It is a computational batch selection method introduced by Hansch et al.
• Substituents are grouped into clusters with similar properties according to their descriptors.
• One member of each cluster is selected for substitution into the lead compound, and the resulting compound is synthesized and tested.
• It is the process of dividing molecules into groups (clusters) such that the molecules within a cluster are highly similar, whereas molecules in different clusters are dissimilar.
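
A minimal sketch of the grouping and selection steps, assuming single-linkage agglomerative clustering on two descriptors and taking the member nearest the cluster centroid as the representative. The clustering variant, the descriptor values and the function names are assumptions made for illustration.

```python
import math

def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def agglomerative_clusters(points, n_clusters):
    """Single-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(distance(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the two closest clusters
        del clusters[b]
    return clusters

def representative(points, cluster):
    """Index of the cluster member closest to the cluster centroid."""
    dim = len(points[cluster[0]])
    centroid = [sum(points[i][k] for i in cluster) / len(cluster) for k in range(dim)]
    return min(cluster, key=lambda i: distance(points[i], centroid))

# Hypothetical (pi, sigma) descriptors for six substituents
substituents = [(0.0, 0.0), (0.1, -0.05), (1.0, 0.2),
                (1.1, 0.25), (-0.6, 0.7), (-0.5, 0.75)]

clusters = agglomerative_clusters(substituents, n_clusters=3)
selected = [representative(substituents, cluster) for cluster in clusters]
print("clusters:", clusters)
print("one substituent per cluster:", selected)
```

Choosing one substituent from each cluster covers the descriptor space with fewer compounds to synthesize and test.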

STATISTICAL PARAMETERS USED IN REGRESSION ANALYSIS

• Regression or correlation coefficient (r)
• Square of the regression coefficient (r²)
• Standard deviation (s)
• F value

REGRESSION OR CORRELATION COEFFICIENT (r)

• It is a measure of how well the physicochemical parameters present in the equation explain the observed variance in activity.
• For a perfect fit r = 1; ‘r’ values greater than 0.9 are considered acceptable.

SQUARE OF REGRESSION COEFFICIENT (r²)

• An r² value of 0.8 or more is considered a good fit.
• Multiplying r² by 100 gives the percentage of the variation in biological activity that is accounted for by the parameters in the equation.

STANDARD DEVIATION (s)

• The standard deviation is an absolute measure of the quality of fit.
• Ideally, ‘s’ should approach zero.

F VALUE

• This is used as a measure of the level of statistical significance of the regression model.
• A larger value of ‘F’ implies that a more significant correlation has been reached.
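
A minimal sketch of how r, r², s and F might be computed for an equation, assuming the usual least-squares definitions with n compounds and k descriptors (n - k - 1 degrees of freedom). The observed and calculated activities and the function name are hypothetical.

```python
import math

def regression_statistics(observed, calculated, n_descriptors):
    """Return r, r2, s and F for calculated versus observed activities."""
    n = len(observed)
    k = n_descriptors
    mean_obs = sum(observed) / n
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_residual = sum((y - yc) ** 2 for y, yc in zip(observed, calculated))
    r_squared = 1.0 - ss_residual / ss_total
    dof = n - k - 1  # degrees of freedom left after fitting k slopes and an intercept
    s = math.sqrt(ss_residual / dof)
    f_value = (r_squared / k) / ((1.0 - r_squared) / dof)
    # r taken as the square root of r2; assumes a least-squares fit with intercept
    r = math.sqrt(max(r_squared, 0.0))
    return {"r": r, "r2": r_squared, "s": s, "F": f_value}

# Hypothetical observed activities and values calculated from a one-descriptor equation
observed = [4.1, 4.6, 4.9, 5.0, 5.5, 5.6]
calculated = [4.15, 4.55, 4.80, 5.10, 5.45, 5.65]

stats = regression_statistics(observed, calculated, n_descriptors=1)
print(f"r = {stats['r']:.3f}, r2 = {stats['r2']:.3f}, "
      f"s = {stats['s']:.3f}, F = {stats['F']:.1f}")
print(f"variation in activity explained: {100 * stats['r2']:.1f}%")
```

When several candidate equations are compared, the one with higher r and F and lower s would normally be preferred, provided it does not use too many descriptors for the number of compounds.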

