BIVARIATE ANALYSIS

The major difference between univariate and bivariate analysis, beyond looking at more than one variable, is that the purpose goes beyond simple description: bivariate analysis examines the relationship between the two variables.

Univariate Data

1. Involving a single variable
2. Does not deal with causes or relationships
3. The major purpose of univariate analysis is to DESCRIBE
4. Central tendency: mean, mode, median
   Dispersion: range, variance, max, min, quartiles, standard deviation
   Frequency distributions
   Bar graph, histogram, pie chart, line graph, box-and-whisker plot

Bivariate Data

1. Involving two variables
2. Deals with causes or relationships
3. The major purpose of bivariate analysis is to EXPLAIN
4. Analysis of two variables simultaneously
   Correlations, comparisons
   Independent and dependent variables


A relationship simply refers to the extent to which it becomes easier to know or predict a case's value on the dependent variable if we know its value on the independent variable.


1. SCATTER DIAGRAMS
2. COVARIANCE
3. CORRELATION
4. REGRESSION

SCATTER DIAGRAMS


Scatter diagrams are most useful for variables that are closely related and have a relatively high covariance.

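UNIVARIATE: SUM OF SQUARES

$SS = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$

$\text{Variance} = \frac{SS}{n - 1}$

BIVARIATE: SUM OF PRODUCTS

$SP = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$

$\text{Covariance} = \frac{SP}{n - 1}$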

Covariance

Covariance is the joint variation of two variables about their respective means. It is sometimes called a measure of "linear dependence" between the two random variables. When the covariance is normalized, one obtains the Pearson correlation coefficient, which indicates the goodness of fit of the best possible linear function describing the relation between the variables.


Cr, Ni and V (ppm) in an Upper Pennsylvanian Shale from Kansas

        X (Cr)   Y (Ni)   V       XY (Cr × Ni)
        205      130      180     26650
        255      165      215     42075
        195      100      135     19500
        220      135      200     29700
        235      145      205     34075
Sum     1110     675      935     152000
Mean    222      135      187
S^2     570      562.5
SD      23.87    23.72

Uncorrected Sum of Products = $\sum XY$
Corrected Sum of Products = $\sum (X - \bar{X})(Y - \bar{Y})$

Sum of Products (SP) = $\sum XY - (\sum X)(\sum Y)/n$ = 152000 - (1110)(675)/5 = 2150

Covariance = SP/(n - 1) = 2150/4 = 537.5
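The covariance calculation for Cr and Ni can be checked with a few lines of Python. This is a minimal sketch using the five samples listed above; the function names `sum_of_products` and `covariance` are illustrative choices, not part of the original slides.

```python
# Cr and Ni concentrations (ppm) for the five shale samples.
cr = [205, 255, 195, 220, 235]
ni = [130, 165, 100, 135, 145]


def sum_of_products(x, y):
    """Corrected sum of products: sum(XY) - sum(X) * sum(Y) / n."""
    n = len(x)
    uncorrected = sum(a * b for a, b in zip(x, y))
    return uncorrected - sum(x) * sum(y) / n


def covariance(x, y):
    """Sample covariance: SP / (n - 1)."""
    return sum_of_products(x, y) / (len(x) - 1)


sp_cr_ni = sum_of_products(cr, ni)
cov_cr_ni = covariance(cr, ni)
print(sp_cr_ni, cov_cr_ni)  # 2150.0 537.5
```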

Covariance provides a measure of the strength of the correlation between two or more sets of random variables.

        Cr (X)   X^2      Ni (Y)   Y^2      V (Z)   Z^2      XY (Cr × Ni)   YZ (Ni × V)   XZ (Cr × V)
        205      42025    130      16900    180     32400    26650          23400         36900
        255      65025    165      27225    215     46225    42075          35475         54825
        195      38025    100      10000    135     18225    19500          13500         26325
        220      48400    135      18225    200     40000    29700          27000         44000
        235      55225    145      21025    205     42025    34075          29725         48175
Sum     1110     248700   675      93375    935     178875   152000         129100        210225

VARIANCE AND COVARIANCE (variances on the diagonal)

        Cr       Ni       V
Cr      570      537.5    663.75
Ni      537.5    562.5    718.75
V       663.75   718.75   1007.5
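The variances and covariances for all three elements can be reproduced in one pass. The sketch below assumes the same five samples and builds the matrix as a nested dictionary; the names `data`, `covariance`, and `cov_matrix` are hypothetical helpers introduced for illustration.

```python
# Concentrations (ppm) of Cr, Ni and V in the five shale samples.
data = {
    "Cr": [205, 255, 195, 220, 235],
    "Ni": [130, 165, 100, 135, 145],
    "V": [180, 215, 135, 200, 205],
}


def covariance(x, y):
    """Sample covariance via the corrected sum of products."""
    n = len(x)
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return sp / (n - 1)


names = list(data)
# The covariance of a variable with itself is its variance (the diagonal).
cov_matrix = {
    j: {k: covariance(data[j], data[k]) for k in names} for j in names
}

for j in names:
    print(j, [cov_matrix[j][k] for k in names])
```

The matrix is symmetric, so only the diagonal and one triangle need to be tabulated by hand.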

$SS_{Cr} = 248700 - (1110)^2/5 = 248700 - 246420 = 2280$

$SS_{Ni} = 93375 - (675)^2/5 = 93375 - 91125 = 2250$

Interpretation of covariance values must proceed in the same manner as an interpretation of variances. Individual values are not too meaningful because they are dependent upon the units of measurement.

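In practice, the sample correlation coefficient r is commonly calculated by the equation

$r_{jk} = \frac{\mathrm{cov}_{jk}}{s_j s_k} = \frac{SP_{jk}}{\sqrt{SS_j \, SS_k}}$

$r_{CrNi} = \frac{537.5}{\sqrt{570 \times 562.5}} = 0.949248$

        Cr          Ni
Cr      1
Ni      0.949248    1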

In order to estimate the degree of interrelation between variables in a manner not influenced by the measurement units, the correlation coefficient r is used. Correlation is the ratio of the covariance of two variables to the product of their standard deviations.

Correlation can take values between -1 and 1:

 1 is a perfect positive correlation
 0 is no correlation (the values don't seem linked at all)
-1 is a perfect negative correlation
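As a check on the definition, the ratio of the covariance to the product of the standard deviations can be computed directly for Cr and Ni. This is a sketch under the assumption that sample (n - 1) statistics are used throughout; `sample_cov` and `correlation` are illustrative names.

```python
import math

cr = [205, 255, 195, 220, 235]
ni = [130, 165, 100, 135, 145]


def sample_cov(x, y):
    """Sample covariance from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(sample_cov(x, x))
    sd_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (sd_x * sd_y)


r_cr_ni = correlation(cr, ni)
print(round(r_cr_ni, 6))  # 0.949248
```

Because the units cancel in the ratio, the result would be the same if the concentrations were expressed in percent rather than ppm.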

If r measures the linear relationship between two variables, it should be possible to compute the line of dependence between them.

Linear Regression

Calculations (X = Output in Units, Y = Persons employed)

        X     Y     X^2    XY
        1     1     1      1
        3     2     9      6
        5     3     25     15
        6     4     36     24
        5     5     25     25
Sum     20    15    96     71

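Regression line: $Y = a + bX$

Normal equations:

$\sum Y = Na + b\sum X$, that is, 15 = 5a + 20b

$\sum XY = a\sum X + b\sum X^2$, that is, 71 = 20a + 96b

Solving the two equations gives b = 0.6875 and a = 0.25, so

$Y = 0.25 + 0.6875X$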
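The two normal equations form a 2 × 2 linear system that can be solved by elimination. The sketch below assumes the five (X, Y) pairs from the example and solves for the slope first; the variable names are illustrative.

```python
# Output in units (X) and persons employed (Y).
x = [1, 3, 5, 6, 5]
y = [1, 2, 3, 4, 5]

n = len(x)
sum_x, sum_y = sum(x), sum(y)              # 20, 15
sum_xx = sum(v * v for v in x)             # 96
sum_xy = sum(a * b for a, b in zip(x, y))  # 71

# Normal equations:
#   sum_y  = n * a     + b * sum_x
#   sum_xy = a * sum_x + b * sum_xx
# Eliminating a gives the slope; back-substitution gives the intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(a, b)  # 0.25 0.6875
```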

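An alternative way of finding the regression equation is to use deviations from the respective means instead of the normal equations. With $x = X - \bar{X}$ and $y = Y - \bar{Y}$, the regression of Y on X is given by

$Y - \bar{Y} = b_{yx}(X - \bar{X})$, where $b_{yx} = \frac{\sum xy}{\sum x^2}$

$\bar{X}$ = 20/5 = 4, $\bar{Y}$ = 15/5 = 3

        X     Y     x     y     xy    x^2   y^2
        1     1     -3    -2    6     9     4
        3     2     -1    -1    1     1     1
        5     3     1     0     0     1     0
        6     4     2     1     2     4     1
        5     5     1     2     2     1     4
Sum     20    15    0     0     11    16    10

$b_{yx}$ = 11/16 = 0.6875, so Y - 3 = 0.6875(X - 4), which again gives Y = 0.25 + 0.6875X.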
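The deviation method lends itself to a short script. This is a minimal sketch assuming the same five observations; `dx` and `dy` are hypothetical names for the deviation columns x and y.

```python
x = [1, 3, 5, 6, 5]
y = [1, 2, 3, 4, 5]

n = len(x)
x_bar = sum(x) / n  # 4.0
y_bar = sum(y) / n  # 3.0

# Deviations of each observation from its mean.
dx = [v - x_bar for v in x]
dy = [v - y_bar for v in y]

sum_dxdy = sum(p * q for p, q in zip(dx, dy))  # 11.0
sum_dx2 = sum(p * p for p in dx)               # 16.0

b_yx = sum_dxdy / sum_dx2          # slope of the regression of Y on X
intercept = y_bar - b_yx * x_bar   # the line passes through the means

print(b_yx, intercept)  # 0.6875 0.25
```

Both routes give the same line because the normal equations and the deviation formula are algebraically equivalent.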

Total sum of squares (SST) of Y:

$SST = \sum (Y - \bar{Y})^2 = 10$

Sum of squares due to regression (SSR):

$SSR = \sum (\hat{Y} - \bar{Y})^2 = 7.5625$

The leftover variation can be called the sum of squares due to deviation (SSD):

$SSD = \sum (Y - \hat{Y})^2 = SST - SSR = 2.4375$

The goodness of fit of the line to the points can be defined by

$R^2 = \frac{SSR}{SST} = 7.5625 / 10 = 0.75625$
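The partition of the total variation can be verified numerically. The sketch below assumes the fitted line Y = 0.25 + 0.6875X from the example; `y_hat`, `sst`, `ssr`, and `ssd` are illustrative names.

```python
x = [1, 3, 5, 6, 5]
y = [1, 2, 3, 4, 5]
a, b = 0.25, 0.6875  # fitted intercept and slope

y_bar = sum(y) / len(y)
y_hat = [a + b * v for v in x]  # predicted Y for each X

sst = sum((v - y_bar) ** 2 for v in y)             # total variation
ssr = sum((p - y_bar) ** 2 for p in y_hat)         # explained by the line
ssd = sum((v - p) ** 2 for v, p in zip(y, y_hat))  # left over

r_squared = ssr / sst
print(sst, ssr, ssd, r_squared)
```

About 76% of the variation in Y is accounted for by the line, and SSR and SSD add up to SST.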

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.869626
R Square             0.75625
Adjusted R Square    0.675
Standard Error       0.901388
Observations         5

ANOVA
              df    SS        MS        F           Significance F
Regression    1     7.5625    7.5625    9.307692    0.055391
Residual      3     2.4375    0.8125
Total         4     10

                Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       0.25           0.987421         0.253185    0.816484    -2.89241    3.392414    -2.89241      3.392414
X Variable 1    0.6875         0.225347         3.050851    0.055391    -0.02965    1.404655    -0.02965      1.404655

RESIDUAL OUTPUT
Observation    Predicted Y    Residuals
1              0.9375         0.0625
2              2.3125         -0.3125
3              3.6875         -0.6875
4              4.375          -0.375
5              3.6875         1.3125

PROBABILITY OUTPUT
Percentile    Y
10            1
30            2
50            3
70            4
90            5
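Several of the figures in the summary output follow from the sums of squares by standard simple-regression formulas. The sketch below is an illustration under that assumption, not a reproduction of how the spreadsheet computes them; all variable names are hypothetical. P-values and confidence limits are left out because they need a t or F distribution function, which the Python standard library does not provide.

```python
import math

x = [1, 3, 5, 6, 5]
y = [1, 2, 3, 4, 5]
a, b = 0.25, 0.6875  # fitted intercept and slope

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((v - x_bar) ** 2 for v in x)  # sum of squared x deviations

y_hat = [a + b * v for v in x]
sst = sum((v - y_bar) ** 2 for v in y)
ssr = sum((p - y_bar) ** 2 for p in y_hat)
ssd = sst - ssr

mse = ssd / (n - 2)            # residual mean square (df = n - 2)
std_error = math.sqrt(mse)     # "Standard Error" of the regression
f_stat = (ssr / 1) / mse       # regression has 1 degree of freedom

se_slope = std_error / math.sqrt(sxx)
se_intercept = std_error * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_slope = b / se_slope
t_intercept = a / se_intercept

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(round(std_error, 6), round(f_stat, 6), round(adj_r_squared, 3))
print(round(se_intercept, 6), round(se_slope, 6))
print(round(t_intercept, 6), round(t_slope, 6))
```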

Thank you

V. Hanumantha Rao Director (Retd.), GSI

