
Uploaded by km3197

STAT 101


Correlation and Regression

Introduction to Simple Linear Regression Analysis

[Motivation]

Many studies are concerned with the analysis of the relationship between two variables. Some focus on studying the degree, type, and direction of the association. Others go beyond describing the relationship, and aim at predicting the value of one variable using the value of the other.

[Figure: collage of example variables: temperature, reliability of components, advertising costs, sales, number of hours of sleep, exam score, carbon dioxide (CO2) emissions, gross domestic product (GDP), you, your crush]

[Learning|Objectives]

By the end of this chapter, each student is expected:

To know the properties and limitations of correlation;
To find the equation of a regression line;
To clear up some misconceptions on correlation and regression;
To model the world.

[introduction]

In the last chapter, we studied how to analyze the relationship between categorical variables. Now, we will look more closely at the analysis of the relationship between two continuous variables. Specifically, we will discuss correlation analysis and simple linear regression analysis.

Correlation Analysis aims to gain an insight on the strength of the linear relationship between variables.

Regression Analysis focuses on revealing the form of the linear relationship between variables.

[correlation|analysis]

Objective: To measure the strength and direction of a linear association between two variables; to measure the covariation that is present between the two variables (i.e., how the two variables change relative to each other).

number of hours of sleep (Xi)    exam score (Yi)
3.4                              82
5.6                              89

advertising costs, in thousand pesos (Xi)    sales, in million pesos (Yi)
210                                          5.32
973                                          9.12
761                                          3.76

temperature, in degrees Celsius (Xi)    reliability of component (Yi)
45                                      0.78
32                                      0.54

[SCATTER|DIAGRAM]

Objective: To help you visualize the possible underlying linear relationship between two variables.

Example: The following data were obtained in a study of the relationship between the number of hours of sleep of a student and the score in an examination.

number of hours of sleep (Xi)    exam score (Yi)
2.75                             89.5
2.15                             86.3
4.41                             92.2
5.52                             96.5
3.21                             87.2
4.32                             87.7
2.31                             88.3
4.3                              90.3
3.71                             88.7

[SCATTER|DIAGRAM]

Using Microsoft Excel,

1. Highlight data.
2. Click Insert, then choose Scatter.

We can see from the scatter diagram that the points form an upward trend. By visual inspection, we can say that the number of hours of sleep (X) and score in the exam (Y) are possibly linearly related with each other.

[Linear|correlation|coefficient]

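A summary measure that can be used to describe the degree and direction of the linear relationship between two continuous variables is the linear correlation coefficient.

The linear correlation coefficient, denoted by $\rho$, is a measure of the strength of the linear relationship existing between two variables, X and Y, that is independent of their respective scales of measurement.

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$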

[Linear|correlation|coefficient]

The linear correlation coefficient possesses the following interesting properties:

1. A linear correlation coefficient can only assume values between -1 and 1, inclusive of end points.
o $-1 \le \rho \le 1$.
2. The sign of $\rho$ describes the direction of the linear relationship between X and Y.
o A positive $\rho$ means that the line slopes upward to the right, and so as X increases, the value of Y also increases.
o A negative $\rho$ means that it slopes downward to the right, and so as X increases, the value of Y decreases.
3. If $\rho = 0$, then there is no linear relationship between X and Y.
o A value of $\rho = 0$, however, does not mean a lack of association. It is possible to obtain a zero correlation even if the two variables are related, though their relationship is nonlinear, such as a quadratic relationship.
4. If $\rho$ is -1 or 1, then there is a perfect linear relationship between X and Y.
o All the points (x, y) fall on a straight line.
o A $\rho$ close to -1 or 1 indicates a strong linear relationship.
5. A strong linear relationship does not necessarily imply that X causes Y or Y causes X.
o It is possible that a third variable may have caused the change in both X and Y, producing the observed relationship.
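The point that a zero correlation does not rule out an association can be checked numerically. The following is a minimal sketch using made-up data (the values of `xs` and the helper name `pearson_r` are assumptions for illustration, not part of the original slides): Y is completely determined by X through a quadratic relationship, yet the correlation coefficient is zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

# Quadratic relation: Y depends on X exactly, but not linearly.
r_quadratic = pearson_r(xs, [x ** 2 for x in xs])

# Exact straight line with positive slope, for comparison.
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])

print(r_quadratic, r_linear)
```

The quadratic case gives a correlation of zero because the positive and negative deviations cancel, while the straight line gives a correlation of one.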

[ON|CORRELATION|&|CAUSALITY]

It is of interest to differentiate correlation from causality, as confusing the two is a common mistake. Correlation does not necessarily imply causation.

Criteria for Causality

1. Covariation (correlation)
3. Nonspuriousness (no alternative explanations)
4. *Specification of a mechanism

[Linear|correlation|coefficient]

A point estimator of $\rho$ is the Pearson product moment correlation coefficient, which is denoted by r.

$$r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}}$$

Just like $\rho$, when r is -1 or 1, all the collected data points fall on a straight line.

Similarly, when r is 0, the points are scattered and give no evidence of a linear relationship.

Any other value of r suggests the degree to which the points tend to be linearly related.

An alternative form for r is

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}.$$

[illustrations]

[Figure: four scatter diagrams. Positive linear correlation (r is near 1); no apparent linear correlation (r is near 0); negative linear correlation (r is near -1); quadratic relation (r is near 0).]

[EXAMPLE]

Compute the Pearson product moment correlation coefficient and interpret.

number of hours of sleep (Xi)    exam score (Yi)
2.75                             89.5
2.15                             86.3
4.41                             92.2
5.52                             96.5
3.21                             87.2
4.32                             87.7
2.31                             88.3
4.3                              90.3
3.71                             88.7

$n = 9$, $\sum_{i=1}^{9} X_i = 32.68$, $\sum_{i=1}^{9} Y_i = 806.7$, $\sum_{i=1}^{9} X_i Y_i = 2951.068$, $\sum_{i=1}^{9} X_i^2 = 128.6602$, $\sum_{i=1}^{9} Y_i^2 = 72384.83$

$$r = \frac{9(2951.068) - (32.68)(806.7)}{\sqrt{\left[9(128.6602) - 32.68^2\right]\left[9(72384.83) - 806.7^2\right]}} = 0.7845$$

The value of r = 0.7845 supports our earlier claim based on the scatter diagram that X and Y are positively linearly correlated. Being positively correlated, as the number of hours of sleep increases, the score in the examination also increases.
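The computation can be reproduced in a few lines. This is a minimal sketch in plain Python (the variable names are illustrative choices, not from the original slides) that evaluates both the computational form and the alternative deviation form of r on the sleep and exam data; the two forms should agree up to rounding.

```python
from math import sqrt

# Sleep and exam data from the example (X = hours of sleep, Y = exam score).
hours = [2.75, 2.15, 4.41, 5.52, 3.21, 4.32, 2.31, 4.30, 3.71]
score = [89.5, 86.3, 92.2, 96.5, 87.2, 87.7, 88.3, 90.3, 88.7]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(score)
sum_xy = sum(x * y for x, y in zip(hours, score))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in score)

# Computational form: needs only the five sums.
r_sums = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Alternative form: deviations from the sample means.
x_bar, y_bar = sum_x / n, sum_y / n
r_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score)) / sqrt(
    sum((x - x_bar) ** 2 for x in hours) * sum((y - y_bar) ** 2 for y in score)
)

print(round(r_sums, 4), round(r_dev, 4))
```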

[TEST|OF|HYPOTHESIS|for|rho]

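Null hypothesis (Ho)    Alternative hypothesis (Ha)    Critical region
$\rho = \rho_0$         $\rho < \rho_0$                $t < -t_{\alpha,\,n-2}$
                        $\rho > \rho_0$                $t > t_{\alpha,\,n-2}$
                        $\rho \ne \rho_0$              $|t| > t_{\alpha/2,\,n-2}$

Test statistic:

$$t = \frac{(r - \rho_0)\sqrt{n-2}}{\sqrt{1 - r^2}}$$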

Consider the Sleep-Exam example. Suppose that the linear correlation between X and Y in the past is 0.75. We want to determine if the correlation has significantly increased compared to the past. Use a 0.05 level of significance.

Ho: $\rho = 0.75$

Ha: $\rho > 0.75$

$\alpha = 0.05$

[TEST|OF|HYPOTHESIS|for|rho]

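Test Statistic:

$$t = \frac{(r - \rho_0)\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{(0.7845 - 0.75)\sqrt{9-2}}{\sqrt{1 - 0.7845^2}} = 0.147193$$

Decision: Since $t = 0.147193$ does not exceed the critical value $t_{0.05,\,7} = 1.895$, we do not reject Ho.

Conclusion: At 5% level of significance, we do not have sufficient evidence to say that the correlation has significantly increased compared to the past.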
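The test can be retraced numerically. The sketch below is only an illustration under stated assumptions: the critical value 1.895 is the upper 5% point of the t distribution with 7 degrees of freedom as read from a standard t table (it is hard-coded because the Python standard library has no t quantile function), and the variable names are illustrative.

```python
from math import sqrt

r = 0.7845      # sample correlation from the sleep and exam example
rho_0 = 0.75    # hypothesized value under Ho
n = 9           # number of observations

# Test statistic as used in the slides.
t_stat = (r - rho_0) * sqrt(n - 2) / sqrt(1 - r ** 2)

# Assumption: t_{0.05, 7} taken from a standard t table.
t_crit = 1.895

# Right-tailed test (Ha: rho > rho_0): reject Ho only if t exceeds the critical value.
reject_h0 = t_stat > t_crit

print(round(t_stat, 6), reject_h0)
```

Because the statistic falls far below the critical value, the sketch arrives at the same decision as the conclusion stated in the example.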

[APPLICATION]

Check this out! www.guessthecorrelation.com.

[SIMPLE|LINEAR|REGRESSION|ANALYSIS]

Objective: To evaluate the relative impact of a predictor on a particular outcome.

In this section, we deal with the case where one continuous variable is linearly regressed on another continuous variable.


[SIMPLE|LINEAR|REGRESSION|ANALYSIS]

The simple linear regression model is given by the equation:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $Y_i$ is the value of the response variable for the ith element;
$X_i$ is the value of the explanatory variable for the ith element;
$\beta_0$ is a regression coefficient that gives the y-intercept of the regression line;
$\beta_1$ is a regression coefficient that gives the slope of the line;
$\varepsilon_i$ is the random error term for the ith element, where the $\varepsilon_i$'s are independent, normally distributed with mean 0 and variance $\sigma^2$ (constant) for $i = 1, 2, \ldots, n$;
n is the number of elements.
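To make the model assumptions concrete, here is a small simulation sketch. All numbers in it are hypothetical (the parameter values, the range of X, the sample size, and the seed are assumptions chosen for illustration): it generates responses from a known line plus independent normal errors and then checks that the simulated errors have mean close to 0 and variance close to the chosen constant.

```python
import random

random.seed(101)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, chosen only for illustration.
beta_0, beta_1, sigma = 80.0, 2.0, 1.5
n = 10_000

# Explanatory variable values and independent normal errors with mean 0.
xs = [random.uniform(2, 6) for _ in range(n)]
errors = [random.gauss(0, sigma) for _ in range(n)]

# Responses generated by the simple linear regression model.
ys = [beta_0 + beta_1 * x + e for x, e in zip(xs, errors)]

# Sample mean and sample variance of the simulated errors.
mean_error = sum(errors) / n
var_error = sum((e - mean_error) ** 2 for e in errors) / (n - 1)

print(round(mean_error, 3), round(var_error, 3))
```

In practice the errors are never observed; the simulation can inspect them only because the parameters were fixed in advance.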

[SIMPLE|LINEAR|REGRESSION|ANALYSIS]

$$E(Y) = \beta_0 + \beta_1 X_i$$

This function is known as the regression equation, and it makes it easy to interpret the parameters $\beta_0$ and $\beta_1$.

$\beta_1$ gives the amount of change in the mean of Y (whether positive or negative, depending on the sign) for every unit increase in the value of X, hence the name slope.

Compare the regression equation with the slope-intercept form of a line:

$E(Y) = \beta_0 + \beta_1 X_i$

$y = b + mx$

[SIMPLE|LINEAR|REGRESSION|ANALYSIS]

$\varepsilon_i$

A random error term may be thought of as a representation of the effect of other factors, that is, factors apart from X that are not explicitly stated in the model but do affect the response variable to some extent.

Now, even if a response variable can be predicted adequately by using only one explanatory variable, there remains an inherent and inevitable variation present in the response variable.

Lastly, the random error term accounts for the measurement errors in recording the value of the response variable.

In short, we dump into the random error term the effects of all other factors apart from X that explain the variation that we observe in the realized values of Y.

[SIMPLE|LINEAR|REGRESSION|ANALYSIS]

The random error is the vertical gap between the ith observation and the blue line. $\varepsilon_i$ is a random variable and we will never know its realized value because $\beta_0$ and $\beta_1$ are unknown.

The $\varepsilon_i$'s are independent random variables. For any fixed value of X, these random variables are normally distributed. The mean of any $\varepsilon_i$ is 0 and its variance is $\sigma^2$. That is, we do not allow the variation in the values of the $\varepsilon_i$'s to differ for the different values of X.

[SIMPLE|LINEAR|REGRESSION|ANALYSIS]

Steps in doing Simple Linear Regression Analysis

1. Obtain the equation that best fits the data.
2. Evaluate the equation to determine the strength of the relationship for prediction and estimation.
3. Determine if the assumptions on the error terms are satisfied.
4. If the model fits the data adequately, use the equation for prediction and for describing the nature of the relationship between the variables.

The process of obtaining the equation that best fits the data requires estimating the unknown regression coefficients, $\beta_0$ and $\beta_1$.

There are several ways of deriving estimates for these regression coefficients, but we will use the method of least squares.

[METHOD|OF|LEAST|SQUARES]

In the method of least squares, the best-fitting line is selected as the one that minimizes the sum of squares of the deviations of the observed value of Y from its expected value. Thus, the least squares criterion considers the deviation:

$$\varepsilon_i = Y_i - E(Y_i) = Y_i - (\beta_0 + \beta_1 X_i)$$

and requires that our estimates for $\beta_0$ and $\beta_1$ are those values for which the sum of the squares of these deviations, $\sum \varepsilon_i^2$, is smallest. Based on this criterion, the following formulas are obtained:

$$b_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Thus, the estimated regression equation is given by $\hat{Y} = b_0 + b_1 X$.
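For readers who want to see where these formulas come from, the following is a brief sketch of the standard calculus derivation. The symbol $S$ for the sum of squared deviations is introduced here for convenience and does not appear in the original slides.

$$S(b_0, b_1) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

Setting both partial derivatives to zero gives

$$\frac{\partial S}{\partial b_0} = -2\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0, \qquad \frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0,$$

which rearrange into the two normal equations

$$\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i, \qquad \sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2.$$

Dividing the first equation by $n$ yields $b_0 = \bar{y} - b_1 \bar{x}$. Substituting this into the second equation and multiplying through by $n$ gives

$$n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i = b_1 \left[ n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 \right],$$

and solving for $b_1$ recovers the slope formula.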

Chapter 10 Correlation and regression Introduction to Simple Linear Regression Analysis

[EXAMPLE]

Find the estimated regression equation of the data on the number of hours of sleep (X) and score in an examination (Y). Interpret the coefficients. Predict the score of the student if his hours of sleep is 5. Lastly, compute the coefficient of determination and interpret.

Recall: $n = 9$, $\sum_{i=1}^{9} X_i = 32.68$, $\sum_{i=1}^{9} Y_i = 806.7$, $\sum_{i=1}^{9} X_i Y_i = 2951.068$, $\sum_{i=1}^{9} X_i^2 = 128.6602$, $\sum_{i=1}^{9} Y_i^2 = 72384.83$

$$b_1 = \frac{9(2951.068) - (32.68)(806.7)}{9(128.6602) - 32.68^2} = 2.1861$$

$$b_0 = \frac{806.7}{9} - 2.1861\left(\frac{32.68}{9}\right) = 81.6954$$

Thus, the estimated regression equation is: score = 81.6954 + 2.1861(hours of sleep).
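The same estimates can be obtained directly from the raw data. The following is a minimal sketch in plain Python (variable names are illustrative and not from the original slides) that applies the least squares formulas for the slope and intercept to the sleep and exam data.

```python
# Sleep and exam data from the example (X = hours of sleep, Y = exam score).
hours = [2.75, 2.15, 4.41, 5.52, 3.21, 4.32, 2.31, 4.30, 3.71]
score = [89.5, 86.3, 92.2, 96.5, 87.2, 87.7, 88.3, 90.3, 88.7]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(score)
sum_xy = sum(x * y for x, y in zip(hours, score))
sum_x2 = sum(x * x for x in hours)

# Least squares slope: (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Least squares intercept: mean of Y minus slope times mean of X.
b0 = sum_y / n - b1 * (sum_x / n)

print(round(b0, 4), round(b1, 4))
```

Small differences in the last decimal place relative to the hand computation can arise because the slides round the slope before computing the intercept.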

[EXAMPLE]


Interpretation:

For every unit increase in the student's number of hours of sleep, there is a 2.19 unit increase in the mean score in the examination.

When the student has no sleep (that is, X = 0), the mean score in the examination is 81.70.

The predicted score of the student having 5 hours of sleep is given by:

score = 81.6954 + 2.1861(5) = 92.6259

[graphical|representation]

[PREDICTING|THE|VALUE|OF|Y]

The estimated regression equation is appropriate only for the relevant range of X. This includes only the values of X used in developing the regression model. Hence, when predicting Y for a given value of X, one may interpolate only within the relevant range of the X values. On the other hand, extrapolation to predict Y for values of X outside the relevant range can result in a serious prediction error.
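One way to respect the relevant range in practice is to have the prediction routine report whether the requested X lies inside it. The sketch below is an illustration only: the function name `predict_score` and the choice to return a flag are assumptions, while the coefficients and the range limits (2.15 and 5.52 hours, the smallest and largest X in the sleep and exam data) come from the worked example.

```python
# Estimates from the worked example: score = 81.6954 + 2.1861 * hours.
B0, B1 = 81.6954, 2.1861

# Relevant range: smallest and largest hours of sleep in the data.
X_MIN, X_MAX = 2.15, 5.52

def predict_score(hours_of_sleep):
    """Return the predicted score and whether X is within the relevant range."""
    within_range = X_MIN <= hours_of_sleep <= X_MAX
    return B0 + B1 * hours_of_sleep, within_range

# Interpolation: 5 hours lies inside the observed range.
pred_5, ok_5 = predict_score(5)

# Extrapolation: 8 hours lies outside, so the prediction should not be trusted.
pred_8, ok_8 = predict_score(8)

print(round(pred_5, 4), ok_5)
print(round(pred_8, 4), ok_8)
```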

[COEFFICIENT|of|determination]

The coefficient of determination, denoted by $R^2$, is defined as the proportion of the variability in the observed values of the response variable that can be explained by the explanatory variable through their linear relationship.

The Pearson correlation coefficient between two variables X and Y may be used in simple linear regression analysis as a descriptive statistic to measure the strength of the linear relationship between two variables.

However, a more meaningful descriptive statistic that may be used to assess the goodness-of-fit of the linear regression model is obtained by squaring the Pearson correlation, r.

This value is expressed in terms of percentage so that we may interpret the value to be the percentage of variability in the response variable that is explained by the explanatory variable through the model.

Although the term "explained" may seem to imply causality, we clarify that the relationship between the variables need not be causal.

$0 \le R^2 \le 1$.

If a model has perfect predictability, then $R^2 = 1$.

If a model has no predictive capability, then $R^2 = 0$.

[EXAMPLE]


$R^2 = r^2 = (0.7845)^2 = 0.6154$

Interpretation: 61.54% of the variability in the examination score can be explained by the number of hours of sleep of the student through the model.
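The coefficient of determination can also be computed from its definition as a proportion of explained variability. The sketch below (variable names are illustrative assumptions) fits the least squares line to the sleep and exam data, computes one minus the ratio of the residual sum of squares to the total sum of squares, and compares the result with the squared Pearson correlation; in simple linear regression the two should coincide.

```python
from math import sqrt

# Sleep and exam data from the example (X = hours of sleep, Y = exam score).
hours = [2.75, 2.15, 4.41, 5.52, 3.21, 4.32, 2.31, 4.30, 3.71]
score = [89.5, 86.3, 92.2, 96.5, 87.2, 87.7, 88.3, 90.3, 88.7]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))
sxx = sum((x - x_bar) ** 2 for x in hours)
sst = sum((y - y_bar) ** 2 for y in score)  # total variability in Y

# Least squares estimates (deviation form of the same formulas).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual (unexplained) sum of squares around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, score))

r_squared = 1 - sse / sst      # proportion of variability explained
r = sxy / sqrt(sxx * sst)      # Pearson correlation

print(round(r_squared, 4), round(r ** 2, 4))
```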


[EXercise]

Suppose a researcher wishes to investigate the relationship between the achieved grade-point index (GPI) and the starting salary of recent graduates majoring in business. A random sample of 30 recent graduates majoring in business is drawn, and the data pertaining to the GPI and starting salary (in thousands of dollars) are recorded for each individual in the following table:

Individual No.   GPI (X)   Starting Salary (Y)   Individual No.   GPI (X)   Starting Salary (Y)
1                2.7       17.0                  16               3.0       17.4
2                3.1       17.7                  17               2.6       17.3
3                3.0       18.6                  18               3.3       18.1
4                3.3       20.5                  19               2.9       18.0
5                3.1       19.1                  20               2.4       16.2
6                2.4       16.4                  21               2.8       17.5
7                2.9       19.3                  22               3.7       21.3
8                2.1       14.5                  23               3.1       17.2
9                2.6       15.7                  24               2.8       17.0
10               3.2       18.6                  25               3.5       19.6
11               3.0       19.5                  26               2.7       16.6
12               2.2       15.0                  27               2.6       15.0
13               2.8       18.0                  28               3.2       18.4
14               3.2       20.0                  29               2.9       17.3
15               2.9       19.0                  30               3.0       18.5

1. Construct a scatter diagram for the given dataset. What can you say about the relationship of GPI and starting salary based on your visual inspection?
2. Compute and interpret the correlation coefficient.
3. Find the equation of the regression line. Interpret the significant coefficients (at 10% level of significance).
4. Find an estimate for the starting salary if the individual's GPI is 2.5.
5. Compute the coefficient of determination. What can you say about the model's goodness-of-fit?
