
CORRELATION ANALYSIS

Aivaz Kamer-Ainur

Mirea Marioara

Ovidius University of Constanta, Faculty of Economics Sciences, Dumbrava Rosie St. 5, code 900613, E-mail: elenacondrea2003@yahoo.com

Abstract

This paper describes the main errors and limitations associated with the methods of regression and correlation analysis, which were developed specifically to study statistical relationships in data series.

Key Words: assumptions, linear regression, linear correlation, multiple regression, multiple correlation.

Introduction

Regression analysis is concerned with developing the linear regression equation by which the value of a dependent variable Y can be estimated given a value of an independent variable X.

If simple regression analysis is used, the assumptions underlying this technique should be satisfied. The assumptions required to develop the linear regression equation and to estimate the value of the dependent variable by point estimation are:

1. The relationship between the two variables is linear.

2. The values of the independent variable are set at various fixed levels, while the dependent variable is a random variable.

3. The conditional distributions of the dependent variable have equal variances.

If interval estimation or hypothesis testing is performed, the following additional assumptions are required:

1. Successive observations of the dependent variable are uncorrelated.

2. The conditional distributions of the dependent variable are normal distributions.

The scatter diagram is a graph that portrays the relationship between the two variables and can be used to check whether the data generally comply with the assumptions underlying regression analysis. An alternative graph for checking such compliance is the residual plot, which plots the residuals $e = Y - \hat{Y}$ against the fitted values $\hat{Y}$. The mathematical criterion generally used to determine the linear regression equation is the least-squares criterion, under which the sum of the squared deviations between the actual and estimated values of the dependent variable is minimized. The standard error of estimate $s_{Y.X}$ measures the variability, or scatter, of the observations about the regression line. It is used to establish prediction intervals for the dependent variable.
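The least-squares fit, the residuals $e = Y - \hat{Y}$ and the standard error of estimate can be sketched in Python as follows. This is a minimal illustration assuming the usual textbook formulas $b_1 = \sum(X-\bar{X})(Y-\bar{Y}) / \sum(X-\bar{X})^2$, $b_0 = \bar{Y} - b_1\bar{X}$ and $s_{Y.X} = \sqrt{\sum e^2/(n-2)}$; the function name `fit_simple_regression` and the data are hypothetical. The returned residuals and fitted values are the two coordinates of the residual plot.

```python
import math


def fit_simple_regression(xs, ys):
    """Least-squares fit of Y = b0 + b1*X (hypothetical helper, textbook formulas).

    Returns (b0, b1, fitted, residuals, s_yx).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                    # slope that minimizes the sum of squared deviations
    b0 = y_bar - b1 * x_bar             # the fitted line passes through (x_bar, y_bar)
    fitted = [b0 + b1 * x for x in xs]  # Y-hat for each observation
    residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]  # e = Y - Y-hat
    sse = sum(e ** 2 for e in residuals)
    s_yx = math.sqrt(sse / (n - 2))     # standard error of estimate, n - 2 degrees of freedom
    return b0, b1, fitted, residuals, s_yx


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, fitted, residuals, s_yx = fit_simple_regression(xs, ys)
print(f"b0={b0:.3f}, b1={b1:.3f}, s_yx={s_yx:.3f}")
```

Plotting `residuals` against `fitted` (with any plotting tool) gives the residual plot; a patternless horizontal band is what the linearity and equal-variance assumptions lead one to expect.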

Interval estimation of the conditional mean of the dependent variable is based on the standard error of the conditional mean, $s_{\bar{Y}.X}$. The standard error of forecast, $s_{Y(\text{next})}$, is used to construct a complete prediction interval for an individual value of the dependent variable. By complete, we mean that the uncertainty regarding the value of the conditional mean is considered in addition to the uncertainty represented by the scatter about the regression line. When the sample is relatively large, approximate prediction intervals based only on the standard error of estimate are considered acceptable. A final area of inference is interval estimation and hypothesis testing concerning the slope $\beta_1$ of the linear regression model.
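These standard errors can be sketched in Python using the standard textbook formulas, which are an assumption here since the paper does not state them: $s_{\bar{Y}.X} = s_{Y.X}\sqrt{1/n + (X_0 - \bar{X})^2/\sum(X - \bar{X})^2}$, $s_{Y(\text{next})} = \sqrt{s_{Y.X}^2 + s_{\bar{Y}.X}^2}$, and, for inference about $\beta_1$, $s_{b_1} = s_{Y.X}/\sqrt{\sum(X-\bar{X})^2}$. The helper names `standard_errors` and `interval` are hypothetical, and the critical value `t_crit` (t with $n-2$ degrees of freedom) must be supplied by the caller, for example from a t table.

```python
import math


def standard_errors(xs, s_yx, x0):
    """Return (s_mean, s_next, s_b1) at X = x0 (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    # standard error of the conditional mean; grows as x0 moves away from x_bar
    s_mean = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    # standard error of forecast: scatter about the line plus uncertainty of the conditional mean
    s_next = math.sqrt(s_yx ** 2 + s_mean ** 2)
    # standard error of the slope b1, used for tests and intervals on beta_1
    s_b1 = s_yx / math.sqrt(s_xx)
    return s_mean, s_next, s_b1


def interval(center, se, t_crit):
    """Symmetric interval center +/- t_crit * se; t_crit comes from a t table with n - 2 df."""
    return center - t_crit * se, center + t_crit * se
```

With a fitted value $\hat{Y}_0$ at $X_0$, `interval(y_hat0, s_mean, t_crit)` would give a confidence interval for the conditional mean and `interval(y_hat0, s_next, t_crit)` a prediction interval for an individual value; the latter is always wider because $s_{Y(\text{next})}$ exceeds $s_{\bar{Y}.X}$ whenever $s_{Y.X} > 0$.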

The most frequently used measure of relationship for sample data is the sample correlation coefficient $r$. The sign of the correlation coefficient indicates the nature (direct or inverse) of the relationship between the two variables, while its absolute value indicates the extent of the relationship. The coefficient of determination $r^2$ indicates the proportion of variance in the dependent variable that is statistically explained by knowledge of the independent variable (and vice versa). The null hypothesis most frequently tested in correlation analysis is that the population correlation is zero, $H_0\colon \rho = 0$. Rejection of this hypothesis leads to the conclusion that $\rho \neq 0$ and that the two variables are related.
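A minimal Python sketch of $r$, $r^2$ and the usual test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for $H_0\colon \rho = 0$, compared with the t distribution on $n-2$ degrees of freedom. The test statistic is the standard one and is assumed here, since the paper does not give it; the function names and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_for_rho_zero(r, n):
    """Test statistic for H0: rho = 0, compared with t on n - 2 degrees of freedom.

    Undefined when |r| = 1 (perfect correlation).
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(f"r={r:.2f}, r^2={r * r:.2f}, t={t_for_rho_zero(r, len(xs)):.3f}")
```

The sign of `r` carries the direction of the relationship, and `r * r` is the proportion of variance explained, which is symmetric in X and Y.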

The value of the dependent variable cannot legitimately be estimated if the value of the independent variable lies outside the range of values in the sample data that served as the basis for determining the linear regression equation. There is no statistical basis for assuming that the linear regression model applies outside the range of the sample data.
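One way to enforce this rule in code is to refuse point estimates outside the sampled range. A minimal Python sketch; `predict_within_range` is a hypothetical helper, and `b0`, `b1` are least-squares estimates assumed to have been obtained separately.

```python
def predict_within_range(b0, b1, xs, x0):
    """Point estimate b0 + b1 * x0, refusing to extrapolate beyond the sample range of X."""
    lo, hi = min(xs), max(xs)
    if not lo <= x0 <= hi:
        raise ValueError(f"x0={x0} is outside the sampled range [{lo}, {hi}]")
    return b0 + b1 * x0


print(predict_within_range(1.0, 2.0, [1, 2, 3, 4], 2.5))  # prints 6.0
```

In the multiple regression case the same check would be applied to each independent variable separately.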


If the estimate of the dependent variable is in fact a prediction of a future value, the historical data used to determine the regression equation might not represent future relationships appropriately. Unfortunately, one can sample only past data, not future data.

The standard error of estimate is not by itself a complete basis for constructing prediction intervals, because it does not take into account the uncertainty concerning the accuracy of the regression equation, and specifically of the conditional mean. The standard error of forecast is the complete measure of variability. However, when the sample is large, use of the standard error of estimate is generally considered acceptable.
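How quickly the two measures converge can be seen from the ratio $s_{Y(\text{next})}/s_{Y.X} = \sqrt{1 + 1/n + (X_0-\bar{X})^2/\sum(X-\bar{X})^2}$, which follows from the textbook formulas assumed here. The Python sketch below evaluates it for evenly spaced, hypothetical design points; the ratio approaches 1 as $n$ grows, which is why the simpler standard error of estimate becomes acceptable for large samples.

```python
import math


def forecast_inflation(xs, x0):
    """Ratio s_Y(next) / s_Y.X = sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)


for n in (5, 30, 200):
    xs = list(range(n))       # hypothetical evenly spaced design points 0, 1, ..., n - 1
    x0 = 0.75 * (n - 1)       # a point inside the sampled range, right of center
    print(n, round(forecast_inflation(xs, x0), 3))
```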

If correlation analysis is used, all the assumptions for this technique should be satisfied. These assumptions are:

1. The relationship is linear.

2. Both variables are random variables.

3. For each variable, the conditional distributions have equal variances.

4. Successive observations are uncorrelated for both variables.

5. The joint distribution is a bivariate normal distribution.

A significant correlation does not necessarily indicate causation; it may instead indicate a common linkage in a sequence of events. One type of significant correlation arises when both variables are influenced by a common cause and are therefore correlated with one another. For example, individuals with a higher level of income have both a higher level of savings and a higher level of spending. We might therefore find a positive relationship between the level of savings and the level of spending, but this does not mean that either variable causes the other. Another type of situation is one in which two related variables are separated by several steps in a cause-and-effect chain of events. An interesting example from the medical field is the following sequence of events: a warm winter, the appearance of viruses, and an outbreak of flu.

The warm climatic conditions are not themselves the cause of the flu; they are several steps removed in the cause-and-effect sequence. For many years, however, the climatic conditions themselves were thought to be the cause, and so this disease was called flu.
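The common-cause case can be illustrated with simulated data. In the Python sketch below, income drives both savings and spending through a hypothetical linear model with made-up coefficients. Savings and spending end up clearly correlated, but the partial correlation controlling for income, $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1-r_{13}^2)(1-r_{23}^2)}$, is close to zero.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_r(r12, r13, r23):
    """Correlation of variables 1 and 2 after removing the linear influence of variable 3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


rng = random.Random(42)                                    # fixed seed for reproducibility
income = [rng.gauss(50, 10) for _ in range(5000)]
# hypothetical model: income is the only link between savings and spending
savings = [0.2 * z + rng.gauss(0, 2) for z in income]
spending = [0.7 * z + rng.gauss(0, 5) for z in income]

r_sav_spend = pearson_r(savings, spending)
r_given_income = partial_r(r_sav_spend,
                           pearson_r(savings, income),
                           pearson_r(spending, income))
print(f"simple r = {r_sav_spend:.2f}, partial r given income = {r_given_income:.2f}")
```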

A significant correlation is not necessarily an important correlation. There is much confusion in the popular press regarding the meaning of "significant": it is usually implied that a relationship that is significant is thereby also important. From the statistical point of view, however, a significant correlation simply indicates that the sample evidence supports the existence of a relationship, that is, that the population correlation coefficient is different from 0. Significance is necessary to conclude that a relationship exists, but the coefficient of determination $r^2$ is more useful in judging the importance of the relationship.

Given a very large sample, a correlation of, say, $r = 0.10$ can be significantly different from 0 at $\alpha = 0.05$. Yet the coefficient of determination $r^2 = 0.01$ in this example indicates that only 1 percent of the variance of the dependent variable is statistically explained by knowledge of the independent variable.
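The arithmetic behind this example can be checked with the usual statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ (assumed here, as the paper does not give it). The sample sizes in this Python sketch are hypothetical, and 1.96 is the large-sample normal approximation to the two-sided critical value at $\alpha = 0.05$; for small $n$ an exact t-table value would be used instead.

```python
import math


def t_for_r(r, n):
    """t statistic for H0: rho = 0 with n observations (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.10
for n in (50, 400, 1000):
    t = t_for_r(r, n)
    # 1.96: large-sample (normal) two-sided critical value at alpha = 0.05
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n={n:5d}  t={t:.2f}  {verdict}  r^2={r ** 2:.2f}")
```

The verdict changes with $n$, while $r^2$ stays at 0.01 throughout.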

In a comparison of simple linear regression analysis and correlation analysis, the principal difference in assumptions is that regression analysis involves one random variable, whereas in correlation analysis both variables must be random. The sampling design for a study should therefore take into account the analysis to be performed. Regression analysis is used when the main objective is to estimate values of the dependent variable, whereas correlation analysis is used when the main objective is to measure and express the degree of relationship between the two variables. When both variables are random, either regression analysis or correlation analysis, or both, can usually be applied to the data.
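When both variables are random, $Y$ can be regressed on $X$ or $X$ on $Y$. The two least-squares slopes generally differ, but their product equals $r^2$, which ties the regression and correlation views together. A short Python check with hypothetical data; `both_slopes` is a hypothetical helper name.

```python
import math


def both_slopes(xs, ys):
    """Slopes of Y on X and of X on Y, plus the correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_yx = s_xy / s_xx                   # regression of Y on X (estimates Y)
    b_xy = s_xy / s_yy                   # regression of X on Y (estimates X)
    r = s_xy / math.sqrt(s_xx * s_yy)    # correlation, symmetric in X and Y
    return b_yx, b_xy, r


b_yx, b_xy, r = both_slopes([1, 2, 3, 4, 5], [2, 4, 5, 4, 8])
print(b_yx, b_xy, r ** 2, b_yx * b_xy)
```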

In multiple regression analysis, the value of the dependent variable is estimated on the basis of known values of two or more independent variables, while in multiple correlation analysis the extent of the relationship between the independent variables, taken as a group, and the dependent variable is measured. The principal assumptions for multiple regression analysis are:

1. The relationship can be represented by a linear model.

2. The dependent variable is a continuous random variable.

3. The variances of the conditional distributions of the dependent variable are all equal (homoscedasticity).

4. Successive observed values of the dependent variable are uncorrelated.

5. The conditional distributions of the dependent variable are all normal distributions.

In general, the assumptions associated with multiple regression and multiple correlation analysis have to be satisfied if the results are to be meaningful. As in simple analysis involving one independent variable, the assumptions of linearity and equality of conditional variances can be investigated by obtaining a residual plot based on the multiple regression model.
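For two independent variables the least-squares normal equations can be solved in closed form, which is enough to show how the residuals for the residual plot and the coefficient of multiple determination $R^2$ are obtained. A minimal Python sketch; `fit_two_predictors` and the data are hypothetical, and with more predictors one would normally use a matrix least-squares routine instead.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of Y = b0 + b1*X1 + b2*X2 via the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2   # shrinks toward 0 as X1 and X2 become collinear
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    residuals = [c - f for c, f in zip(y, fitted)]      # plot these against fitted
    s_yy = sum((c - my) ** 2 for c in y)
    r2 = 1 - sum(e ** 2 for e in residuals) / s_yy      # coefficient of multiple determination
    return (b0, b1, b2), fitted, residuals, r2


x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # exact linear relationship, for checking
(b0, b1, b2), fitted, residuals, r2 = fit_two_predictors(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6), round(r2, 6))
```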

A specific area of concern when there are several independent variables is the possible existence of multicollinearity. This term describes the situation in which two or more independent variables are highly correlated with one another. Under such conditions, the meaning of the partial regression coefficients in the multiple regression equation is unclear. Similarly, the meaning of the coefficients of partial correlation is unclear: the partial correlation for a given independent variable may even be highly negative although its simple correlation with the dependent variable is highly positive. One procedure sometimes used to handle the problem of multicollinearity is to eliminate one of two highly correlated independent variables from the analysis, recognizing that the two variables essentially measure the same factor. When correlated variables must be included in the analysis, care must be taken in ascribing practical meaning to the partial regression coefficients and to the coefficients of partial correlation. However, multicollinearity causes no special problem for inferences associated with the overall regression model, such as the F test for the significance of the regression effect, confidence intervals for the conditional mean of the dependent variable, and prediction intervals for individual values of the dependent variable. Of course, for any interval estimate the values of the independent variables should be within the ranges of values included in the sample data.
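A common diagnostic for multicollinearity, not discussed in the paper, is the variance inflation factor (VIF). With two independent variables it reduces to $\text{VIF} = 1/(1 - r_{12}^2)$, where $r_{12}$ is the correlation between them; with more predictors, $r_{12}^2$ is replaced by the $R^2$ from regressing one independent variable on all the others. Large values signal that the partial regression coefficients will be poorly determined. A Python sketch with hypothetical data and helper names:

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor for either coefficient in a two-predictor model.

    Raises ZeroDivisionError when the predictors are perfectly collinear (|r12| = 1).
    """
    r12 = pearson_r(x1, x2)
    return 1 / (1 - r12 ** 2)


x1 = [1, 2, 3, 4, 5]
print(vif_two_predictors(x1, [2, 1, 4, 3, 5]))            # moderately correlated predictors
print(vif_two_predictors(x1, [1.1, 1.9, 3.2, 3.9, 5.1]))  # nearly collinear predictors
```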

Another area of specific concern in multiple regression and multiple correlation analysis is the possibility that successive observed values of the dependent variable are correlated rather than uncorrelated. The existence of such a correlation is called autocorrelation. The assumption that successive values of the dependent variable are uncorrelated has already been identified as a principal assumption in simple regression and simple correlation analysis. However, such a correlation is easier to observe in simple analysis than in multiple analysis. Typically, autocorrelation occurs when values of the dependent variable are collected as time series values, that is, in a series of successive time periods. When successive values of the dependent variable are correlated, the point estimate of the dependent variable based on the multiple regression equation is not affected. However, the standard error associated with each partial regression coefficient $b_k$ is understated, and so is the standard error of estimate. As a result, the prediction and confidence intervals are narrower (apparently more precise) than they should be, and null hypotheses concerning the absence of a relationship are rejected too frequently. In terms of correlation analysis, the coefficients of multiple determination and multiple correlation are both overstated.
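The paper does not name a specific test, but a widely used check on regression residuals collected over time is the Durbin-Watson statistic, $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Values near 2 are consistent with uncorrelated residuals, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A Python sketch; the residuals are hypothetical and must be listed in time order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1, 1, 1, -1, -1, -1]))   # runs of the same sign: well below 2
print(durbin_watson([1, -1, 1, -1, 1, -1]))   # alternating signs: well above 2
```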

