Chapter 3: Multivariate Data & Distributions

Outline of Chapter 3

3.1 Scatter Plots
3.2 Correlation
3.3 Fitting a Line to Bivariate Data
3.4 Nonlinear Relationships
3.5 Using More Than One Predictor
3.6 Joint Distributions

Scatter Plots

A multivariate data set consists of measurements or observations on each of two or more variables.
One important special case, bivariate data, involves only two variables, 𝑥 and 𝑦.
The most important picture based on bivariate numerical data is a scatter plot.
Each observation (pair of numbers) is represented by a point on a rectangular coordinate system.
Example/figure ...

Correlation: Introduction

A scatter plot of bivariate numerical data gives a visual impression of how strongly the 𝑥 and 𝑦 values are related.
To make precise statements and draw reliable conclusions from the data, one must go beyond pictures.
A correlation coefficient is a quantitative assessment of the strength of the relationship between the 𝑥 and 𝑦 values in a set of (𝑥, 𝑦) pairs.
Example/figure ...

Pearson’s sample correlation coefficient

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote a sample of (𝑥, 𝑦) pairs.
Consider the 𝑥 deviations $(x_1 - \bar{x}), \ldots, (x_n - \bar{x})$ and the 𝑦 deviations $(y_1 - \bar{y}), \ldots, (y_n - \bar{y})$.
Multiply each 𝑥 deviation by the corresponding 𝑦 deviation to obtain products of deviations of the form $(x - \bar{x})(y - \bar{y})$.
Example/figure ...

Definition

Pearson’s sample correlation coefficient 𝑟 is given by

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} \qquad (1) $$

where

$$ S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}, \qquad S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}, \qquad S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} \qquad (2) $$
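As a small illustration, a minimal Python sketch (assuming NumPy is available; the data values are hypothetical) that computes 𝑟 from the computational formulas (1) and (2):

```python
import numpy as np

# Hypothetical bivariate sample (x_i, y_i)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
n = len(x)

# Computational formulas (2)
Sxx = np.sum(x**2) - np.sum(x)**2 / n
Syy = np.sum(y**2) - np.sum(y)**2 / n
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

# Pearson's sample correlation coefficient, formula (1)
r = Sxy / (np.sqrt(Sxx) * np.sqrt(Syy))
print(r)                        # hand-rolled value
print(np.corrcoef(x, y)[0, 1])  # should agree with NumPy's built-in estimate
```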

Properties and interpretation of 𝑟

The value of 𝑟 does not depend on the unit of measurement for either variable.
The value of 𝑟 does not depend on which of the two variables is labeled 𝑥.
The value of 𝑟 is between −1 and +1.
𝑟 = 1 only when all the points in a scatter plot of the data lie exactly on a straight line that slopes upward; similarly, 𝑟 = −1 only when all the points lie exactly on a downward-sloping line.
The value of 𝑟 is a measure of the extent to which 𝑥 and 𝑦 are linearly related.

Fitting a Line to Bivariate Data: Introduction

Given two numerical variables 𝑥 and 𝑦, one often wants to use information about 𝑥 to draw some type of conclusion concerning 𝑦.
The different roles played by the two variables 𝑥 and 𝑦 are reflected in the terminology:
𝑦 is called the dependent or response variable.
𝑥 is referred to as the independent, predictor, or explanatory variable.
If a scatter plot of 𝑦 versus 𝑥 exhibits a linear pattern, it is natural to summarize the relationship between the variables by finding a line that is as close as possible to the points in the plot.

Definition

The most widely used criterion for assessing the goodness of fit of a line 𝑦 = 𝑎 + 𝑏𝑥 to bivariate data $(x_1, y_1), \ldots, (x_n, y_n)$ is the sum of squared deviations about the line:

$$ \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2 $$

Slope 𝑏 and vertical intercept 𝑎 of the line

The slope of the least squares line is given by

$$ b = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{S_{xy}}{S_{xx}} \qquad (3) $$

The vertical intercept 𝑎 of the least squares line is

$$ a = \bar{y} - b\bar{x} \qquad (4) $$
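A minimal sketch of formulas (3) and (4), again assuming NumPy and hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
n = len(x)

Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
Sxx = np.sum(x**2) - np.sum(x)**2 / n

b = Sxy / Sxx                 # slope, formula (3)
a = y.mean() - b * x.mean()   # vertical intercept, formula (4)
print(a, b)
print(np.polyfit(x, y, 1))    # NumPy's least squares fit returns [slope, intercept]
```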

Assessing the fit of the least squares line: Definitions

The residual sum of squares, denoted by SSResid, is given by

$$ \text{SSResid} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (5) $$

where $\hat{y}_i = a + b x_i$. (SSResid is also called the error sum of squares and denoted by SSE.)
The total sum of squares, denoted by SSTo, is defined as

$$ \text{SSTo} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad (6) $$

The coefficient of determination, denoted by $r^2$, is given by

$$ r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}} \qquad (7) $$
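A brief sketch of (5)–(7) in Python (NumPy assumed; the data values are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

b, a = np.polyfit(x, y, 1)        # least squares slope and intercept
y_hat = a + b * x                 # fitted values

ss_resid = np.sum((y - y_hat) ** 2)    # formula (5)
ss_to = np.sum((y - y.mean()) ** 2)    # formula (6)
r_squared = 1 - ss_resid / ss_to       # formula (7)
print(r_squared)
```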
Standard deviation about the least squares line

The standard deviation about the least squares line is given by

$$ s_e = \sqrt{\frac{\text{SSResid}}{n - 2}} \qquad (8) $$

A residual plot is a plot of the (𝑥, residual) pairs, that is, of the points $(x_1, y_1 - \hat{y}_1), (x_2, y_2 - \hat{y}_2), \ldots, (x_n, y_n - \hat{y}_n)$, or of the residuals versus the predicted values, that is, of the points $(\hat{y}_1, y_1 - \hat{y}_1), (\hat{y}_2, y_2 - \hat{y}_2), \ldots, (\hat{y}_n, y_n - \hat{y}_n)$.
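A minimal sketch of formula (8) and an (𝑥, residual) plot, assuming NumPy and Matplotlib are available; the data values are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x
residuals = y - y_hat

n = len(x)
s_e = np.sqrt(np.sum(residuals**2) / (n - 2))   # formula (8)
print(s_e)

# Residual plot: (x, residual) pairs; a random scatter about 0 supports a linear fit
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```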

Power transformations

A scatter plot of bivariate data frequently shows curvature rather than a linear pattern.
If a curved scatter plot is monotonic (either strictly increasing or strictly decreasing), it is often possible to find power transformations of 𝑥 and/or 𝑦 so that a scatter plot of the transformed data shows a linear pattern.
With a power transformation, one has $x' = x^p$ and/or $y' = y^q$, and the relevant scatter plot is of the $(x', y')$ pairs.
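A small sketch of a power transformation followed by a linear fit (NumPy assumed; the data are hypothetical and roughly follow 𝑦 proportional to the square root of 𝑥):

```python
import numpy as np

# Hypothetical curved but monotonic data
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Power transformation x' = x^(1/2); y is left untransformed
x_prime = x ** 0.5

# The transformed (x', y) pairs now follow a nearly linear pattern
b, a = np.polyfit(x_prime, y, 1)
print(a, b)
print(np.corrcoef(x_prime, y)[0, 1])   # correlation of the transformed data, close to 1
```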

Fitting a polynomial function

If the general pattern of curvature in a scatter plot is not monotonic, a quadratic function $a + b_1 x + b_2 x^2$ can be used to fit the data.
The least squares coefficients $a$, $b_1$, and $b_2$ are the values of $\tilde{a}$, $\tilde{b}_1$, and $\tilde{b}_2$ that minimize

$$ g(\tilde{a}, \tilde{b}_1, \tilde{b}_2) = \sum_i \left[ y_i - (\tilde{a} + \tilde{b}_1 x_i + \tilde{b}_2 x_i^2) \right]^2 \qquad (9) $$

Popular statistical computer packages can find the coefficient values numerically.
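For instance, a quadratic least squares fit can be computed with NumPy's polyfit; a minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical non-monotonic (roughly parabolic) data
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([4.2, 1.1, 0.1, 0.9, 4.1, 8.8])

# Quadratic least squares fit, minimizing g in (9); returns [b2, b1, a]
b2, b1, a = np.polyfit(x, y, 2)
print(a, b1, b2)

y_hat = a + b1 * x + b2 * x**2   # fitted quadratic values
```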

Smoothing a scatter plot

Sometimes the pattern in a scatter plot is too complex for a line or a curve of a particular type (e.g., exponential or parabolic) to give a good fit.
Statisticians have developed more flexible methods that accommodate a wide variety of patterns.
One such method is LOWESS (or LOESS), short for locally weighted scatter plot smoother:
Let $(x^*, y^*)$ denote a particular one of the $n$ $(x, y)$ pairs in the sample.
The $\hat{y}$ value corresponding to $(x^*, y^*)$ is obtained by fitting a straight line using only a specified percentage of the data (e.g., 25%) whose 𝑥 values are closest to $x^*$.
𝑥 values closer to $x^*$ are weighted more heavily than those farther away.
This process is repeated for each of the 𝑛 points, and the fitted points are connected to produce a LOWESS curve.
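A minimal sketch using the LOWESS smoother from statsmodels (assuming that package is available; the data are simulated purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated noisy data with a pattern too complex for a single line or parabola
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = np.sin(x) + 0.3 * rng.standard_normal(100)

# LOWESS: each fitted value uses the fraction `frac` of points nearest to x*,
# with nearer points weighted more heavily; returns sorted (x, fitted y) pairs
smoothed = sm.nonparametric.lowess(y, x, frac=0.25)
x_fit, y_fit = smoothed[:, 0], smoothed[:, 1]
```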
Fitting a linear function

Consider fitting a relation of the form

$$ y \approx a + b_1 x_1 + b_2 x_2 + \ldots + b_k x_k \qquad (10) $$

The least squares coefficients $a, b_1, \ldots, b_k$ are the values of $\tilde{a}, \tilde{b}_1, \ldots, \tilde{b}_k$ that minimize

$$ g(\tilde{a}, \tilde{b}_1, \ldots, \tilde{b}_k) = \sum_{j=1}^{n} \left[ y_j - (\tilde{a} + \tilde{b}_1 x_{1j} + \tilde{b}_2 x_{2j} + \ldots + \tilde{b}_k x_{kj}) \right]^2 \qquad (11) $$
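A minimal sketch of the least squares fit (11) with two hypothetical predictors, assuming NumPy:

```python
import numpy as np

# Hypothetical data: y depends on two predictors x1 and x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([5.1, 4.9, 9.2, 8.8, 13.1, 12.7])

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares solution minimizing the criterion in (11)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)
```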

General additive fitting

A more flexible type of relation is

$$ \hat{y} = a + f_1(x_1) + f_2(x_2) + \ldots + f_k(x_k) \qquad (12) $$

where the forms of $f_1(\cdot), \ldots, f_k(\cdot)$ are left unspecified.
The statistical package S-Plus, among others, will carry out this general additive fit by calculating $a$ and the individual $f_i$'s.
One method for carrying out this latter task is based on the LOWESS technique described previously.

Distributions for two discrete variables

If 𝑥 and 𝑦 are both discrete, their joint distribution is specified by a joint mass function 𝑓(𝑥, 𝑦) satisfying

$$ f(x, y) \ge 0 \quad \text{and} \quad \sum_{\text{all } (x, y)} f(x, y) = 1 \qquad (13) $$

Often there is no nice formula for 𝑓(𝑥, 𝑦).
When there are only a few possible values of 𝑥 and 𝑦, the mass function is most conveniently displayed in a rectangular table.
Examples/illustrations...
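As one possible illustration, a small joint mass function stored as a rectangular table (the probabilities are hypothetical), with the conditions in (13) checked numerically:

```python
import numpy as np

# Hypothetical joint mass function: rows index x values, columns index y values
x_values = [0, 1, 2]
y_values = [0, 1]
f = np.array([
    [0.10, 0.20],   # f(0, 0), f(0, 1)
    [0.25, 0.15],   # f(1, 0), f(1, 1)
    [0.20, 0.10],   # f(2, 0), f(2, 1)
])

# Conditions in (13): nonnegative, and summing to 1 over all (x, y)
assert np.all(f >= 0)
assert np.isclose(f.sum(), 1.0)

# Marginal mass functions, obtained by summing over the other variable
f_x = f.sum(axis=1)
f_y = f.sum(axis=0)
print(f_x, f_y)
```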

Distributions for two continuous variables

If 𝑥 and 𝑦 are both continuous, their joint distribution is specified by a joint density function 𝑓(𝑥, 𝑦) satisfying

$$ f(x, y) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1 \qquad (14) $$

Examples/illustrations...

Correlation

The covariance between 𝑥 and 𝑦 is defined by

$$ \sigma_{xy} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x - \mu_x)(y - \mu_y)\, f(x, y)\, dx\, dy \qquad (15) $$

where $\mu_x$ and $\mu_y$ denote the mean values of 𝑥 and 𝑦, respectively.
The population correlation coefficient 𝜌 is defined by

$$ \rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \qquad (16) $$

𝜌 does not depend on the 𝑥 or 𝑦 units of measurement.
−1 ≤ 𝜌 ≤ 1.
The closer 𝜌 is to +1 or −1, the stronger the linear relationship between the two variables.
The bivariate normal distribution


The bivariate normal joint density function is given by

$$ f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho \left(\frac{x - \mu_x}{\sigma_x}\right)\!\left(\frac{y - \mu_y}{\sigma_y}\right) + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 \right] \right\} \qquad (17) $$

where −∞ < 𝑥 < ∞ and −∞ < 𝑦 < ∞.
When 𝑥 and 𝑦 are statistically independent, the joint density function 𝑓(𝑥, 𝑦) must satisfy

$$ f(x, y) = f_1(x)\, f_2(y) \qquad (18) $$

where $f_1(x)$ and $f_2(y)$ denote the marginal distributions of 𝑥 and 𝑦, respectively.
Note that once independence is assumed, one has only to select appropriate distributions for 𝑥 and 𝑦 separately and then use (18) to obtain the joint distribution.
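A minimal sketch evaluating density (17) and checking the independence factorization (18) for 𝜌 = 0, assuming NumPy and SciPy are available; all parameter values are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Hypothetical parameters of a bivariate normal distribution
mu_x, mu_y = 0.0, 1.0
sigma_x, sigma_y = 1.0, 2.0

def bivariate_normal_pdf(x, y, rho):
    """Density (17), written out directly."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    coef = 1.0 / (2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2))
    return coef * np.exp(-(zx**2 - 2 * rho * zx * zy + zy**2) / (2 * (1 - rho**2)))

# Cross-check against SciPy for rho = 0.6
rho = 0.6
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
print(bivariate_normal_pdf(0.5, 0.5, rho))
print(multivariate_normal([mu_x, mu_y], cov).pdf([0.5, 0.5]))

# With rho = 0 the variables are independent and (18) holds:
# the joint density factors into the product of the normal marginals
print(bivariate_normal_pdf(0.5, 0.5, 0.0))
print(norm(mu_x, sigma_x).pdf(0.5) * norm(mu_y, sigma_y).pdf(0.5))
```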