
Linear Regression

Dr Menaal Kaushal
JR II
Department of S P M
S N Medical College, Agra
1 22-11-2013
Statistical Analysis can be:
• Univariate: when only one variable is studied, e.g. heights of all grade IV students, or ages of mothers delivering at a DH (measures of central tendency, measures of dispersion).
• Bivariate: when the relationship between two variables is studied, e.g. the relationship between height and weight of every child in grade IV, or between a mother's age and the birth weight of her baby.
• Multivariate: when the relationship among more than two variables is studied, e.g. the relationship between height, weight and MAC of every child in grade IV.
Bivariate Regression

• Linear Regression: when the outcome (dependent) variable is continuous.
• Logistic Regression: when the outcome variable is categorical, e.g. the research question can be answered with either "yes" or "no".
Levels (Types) of Data

• Nominal (Categorical) Measures: categories are exhaustive and mutually exclusive (e.g., religion, gender).
• Ordinal Measures: all of the above, plus the categories can be rank-ordered (e.g., social class).
• Interval Measures: all of the above, plus equal differences between measurement points (e.g., temperature in ℃ or ℉).
• Ratio Measures: all of the above, plus a true zero point (e.g., weight, absolute temperature in kelvin).
Relationship Between Two Variables
• Association: any relationship between the variables.
• Positive association: above-average values of one variable tend to go with above-average values of the other; the scatter slopes up.
• Negative association: above-average values of one variable tend to go with below-average values of the other; the scatter slopes down.
• Linear association: roughly, the scatter diagram is clustered around a straight line. The correlation coefficient measures this linear association.
The “Football” Bivariate Normal Scatter Plot

Can you identify any difference?

How Tightly Clustered Are These Data?

Calculating the Correlation Coefficient

So, How Do We Calculate r?

Formula of Correlation Coefficient

Let's simplify:
• Convert the data into standard units.
• Multiply the corresponding standard-unit values of x and y.
• r is the mean of these products.
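The three steps above can be sketched in Python as follows. This assumes standard units are computed with the population SD (dividing by n), which is what makes r exactly the mean of the products. The function names `standard_units` and `correlation` and the data values are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def standard_units(values):
    """Convert values to standard units: (value - mean) / SD, using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def correlation(x, y):
    """r = mean of the products of x and y, both expressed in standard units."""
    zx, zy = standard_units(x), standard_units(y)
    return mean(a * b for a, b in zip(zx, zy))


# Hypothetical paired observations (not from the slides)
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
print(f"r = {correlation(x, y):.2f}")  # r = 0.40
```

Here SD of x is 2 and SD of y is 4, so the standard units are simple fractions, and the five products average to 0.40.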
Properties of Correlation Coefficient
• The calculation uses only standard units, so r is a pure number with no units.
• -1 ≤ r ≤ 1.
• In the extreme cases, r = -1 when the scatter diagram is a perfect straight line sloping down, and r = 1 when it is a perfect straight line sloping up.
• Switching the variables x and y does not change r.
• Adding a constant to one of the lists just slides the scatter diagram, so r stays the same.
• Multiplying one of the lists by a positive constant does not change its standard units, so r stays the same.
• Multiplying just one (not both) of the lists by a negative constant switches the signs of the standard units of that variable, so r keeps the same absolute value but its sign is switched.
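The properties above can be checked numerically. A minimal sketch on hypothetical data, with r computed as the mean product of standard units (population SDs); the function name `correlation` is illustrative, not from the slides.

```python
from math import isclose
from statistics import mean, pstdev


def correlation(x, y):
    """r as the mean product of standard units (population SDs)."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))


x = [1, 3, 4, 5, 7]   # hypothetical data
y = [5, 9, 7, 1, 13]
r = correlation(x, y)

checks = {
    "switch x and y": isclose(correlation(y, x), r),
    "add 10 to x": isclose(correlation([v + 10 for v in x], y), r),
    "multiply x by 3": isclose(correlation([3 * v for v in x], y), r),
    "multiply x by -2 (sign flips)": isclose(correlation([-2 * v for v in x], y), -r),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Every check prints `True`: shifting or positively rescaling x leaves its standard units unchanged, while a negative multiplier reverses them.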

Heteroscedastic Curve

What Can r Not Tell?
• Association is not causation: r does not tell us why two variables are related.
• r should only be used for linearly associated variables; it measures linear association only.
• This diagram shows a strong relation between x and y, but the relation is not linear. Yet r for this diagram comes out to be zero.
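A sketch of this zero-r case, using hypothetical symmetric data on the curve y = x². Here y is perfectly predictable from x, yet the products of standard units on the left half cancel those on the right half, so r is 0. The function name `correlation` is illustrative.

```python
from statistics import mean, pstdev


def correlation(x, y):
    """r as the mean product of standard units (population SDs)."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))


# Hypothetical data: y is completely determined by x, but along a parabola
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = correlation(x, y)
print(f"r = {r:.3f}")
```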

Beware of:

• Outliers

• Ecological correlations
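To see why outliers deserve caution, here is a small sketch with hypothetical data: five points lying exactly on a line (r = 1), plus one point far below it. The data and the function name `correlation` are illustrative assumptions.

```python
from statistics import mean, pstdev


def correlation(x, y):
    """r as the mean product of standard units (population SDs)."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))


# Hypothetical data: five points exactly on the line y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = correlation(x, y)

# The same data plus a single outlier at (6, -20)
r_outlier = correlation(x + [6], y + [-20])

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}")
```

One extreme point turns a perfect positive correlation into a negative one (about -0.44). This is why the scatter diagram should be inspected before r is reported.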

Deal with the outliers

Can you find the outlier?

Avoid “Ecological Correlation”:

Replacing individual students by group averages can artificially increase the clustering, and so inflate r. This is not desirable.
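A sketch of the ecological effect, using hypothetical scores for nine students in three classes (the class labels and values are invented for illustration). Within each class the points are scattered, but the three class averages lie exactly on a straight line.

```python
from statistics import mean, pstdev


def correlation(x, y):
    """r as the mean product of standard units (population SDs)."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))


# Hypothetical (x, y) scores for 3 students in each of 3 classes
classes = {
    "A": ([1, 2, 3], [3, 1, 2]),
    "B": ([4, 5, 6], [6, 4, 5]),
    "C": ([7, 8, 9], [9, 7, 8]),
}

# Individual-level correlation: every student is one point
xs = [v for cx, _ in classes.values() for v in cx]
ys = [v for _, cy in classes.values() for v in cy]
r_individual = correlation(xs, ys)

# Ecological correlation: each class is replaced by its average point
r_ecological = correlation([mean(cx) for cx, _ in classes.values()],
                           [mean(cy) for _, cy in classes.values()])

print(f"individual r = {r_individual:.2f}, ecological r = {r_ecological:.2f}")
```

The individual-level r is 0.85, while the class averages give r = 1.00. Averaging hides the scatter within each class and overstates how tightly the individual students cluster.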

Regression

• The technique to estimate the dependent variable y for a given value of the independent variable x, when the two are linearly associated and the correlation coefficient r is known.

Each estimate lies at the center of its vertical strip, i.e. near the average y value of the points with that x.
The slope of the green line = r (with both variables in standard units)

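The Equation of Regression
• Estimate of y = r × given x (both in standard units):

$$\frac{\hat{y} - \mu_y}{SD_y} = r \, \frac{x - \mu_x}{SD_x}$$

• Rearranging: estimate of y = slope × x + intercept, i.e. $\hat{y} = \text{slope} \cdot x + \text{intercept}$,
  where $\text{slope} = r \, \dfrac{SD_y}{SD_x}$ and $\text{intercept} = \mu_y - \text{slope} \cdot \mu_x$.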
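A sketch that builds the regression line from r, the means and the SDs, then cross-checks the slope against the least-squares formula Σ(dx·dy) / Σ(dx²). The data and the function name `regression_line` are hypothetical; population SDs are assumed throughout.

```python
from statistics import mean, pstdev


def regression_line(x, y):
    """Slope and intercept of the regression of y on x, built from r and the SDs."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    r = mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    slope = r * sy / sx
    intercept = my - slope * mx   # uses the mean of x, not an individual x
    return slope, intercept


x = [1, 3, 4, 5, 7]   # hypothetical data
y = [5, 9, 7, 1, 13]
slope, intercept = regression_line(x, y)
print(f"estimate of y = {slope:.2f} * x + {intercept:.2f}")  # estimate of y = 0.80 * x + 3.80

# Cross-check: least-squares slope computed from deviations about the means
mx, my = mean(x), mean(y)
ls_slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
print(f"least-squares slope = {ls_slope:.2f}")
```

With r = 0.4, SD of y = 4 and SD of x = 2, the slope is 0.4 × 4 / 2 = 0.8, and the intercept is 7 - 0.8 × 4 = 3.8.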

Why Call It “Regression”?
• Sir Francis Galton (1822-1911): “The Galton Effect”
• “Those who have high values in one variable tend not to be as high in the second variable”
• A eugenicist who introduced the concepts of regression and correlation
• “Fathers who are tall tend to have sons who are not quite that tall on average”
• Galton called this “regression towards mediocrity”, i.e. regression towards the mean
• Mistaking this effect for a real cause is the regression fallacy (e.g. the “sophomore slump”)
[Figure: Univariate Normal vs Bivariate Normal (labels: µx, +1 SD, 68%; r, +1 r.m.s. error, 68%)]
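The r.m.s. error labelled in the figure measures the typical vertical distance of the points from the regression line. The sketch below assumes the standard identity r.m.s. error = √(1 - r²) × SD_y, which holds for the least-squares regression line with population SDs, and checks it against a direct computation on hypothetical data.

```python
from math import isclose, sqrt
from statistics import mean, pstdev

x = [1, 3, 4, 5, 7]   # hypothetical data
y = [5, 9, 7, 1, 13]

mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
r = mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
slope = r * sy / sx
intercept = my - slope * mx

# Direct r.m.s. error: square root of the mean squared vertical distance to the line
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
rms_direct = sqrt(mean(e * e for e in residuals))

# Shortcut formula: sqrt(1 - r^2) * SD of y
rms_formula = sqrt(1 - r ** 2) * sy

print(f"direct = {rms_direct:.3f}, formula = {rms_formula:.3f}")
print(isclose(rms_direct, rms_formula))
```

Both routes give about 3.666. For a football-shaped scatter, roughly 68% of the points fall within one r.m.s. error of the line, just as about 68% of a normal list falls within one SD of the mean.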

Residual Plot

Regardless of the shape of the scatter diagram:
• the average of the residuals is always 0, and
• there is no linear association between the residuals and x.
The residual plot should not show any trend or linear relation.
Good regression: the residual plot should look like a formless blob around the horizontal axis.
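A sketch checking both residual properties on hypothetical data. The residuals are y minus the regression estimate; their mean and their correlation with x both come out as 0 (up to floating-point rounding). The function name `correlation` is illustrative.

```python
from statistics import mean, pstdev


def correlation(x, y):
    """r as the mean product of standard units (population SDs)."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))


x = [1, 3, 4, 5, 7]   # hypothetical data
y = [5, 9, 7, 1, 13]

r = correlation(x, y)
slope = r * pstdev(y) / pstdev(x)
intercept = mean(y) - slope * mean(x)

# Residual = actual y minus the regression estimate of y
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]

print(f"mean of residuals = {mean(residuals):.6f}")
print(f"r(x, residuals)   = {correlation(x, residuals):.6f}")
```

Because both properties hold automatically, a residual plot cannot show a linear trend. What it can reveal is curvature or a changing spread (heteroscedasticity), which is why it serves as a diagnostic tool.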
Residual Plot as a Diagnostic Tool

Questions??