Simple Regression Model: Conference Paper

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/329611627
Simple regression model
Conference Paper · May 2014
2 authors:
Mercedes Orús-Lacort, Independent researcher
Christophe Jouis, Université de la Sorbonne Nouvelle Paris 3 & EHESS & CAMS-CNRS
All content following this page was uploaded by Mercedes Orús-Lacort on 13 December 2018.

Simple linear regression
1. What is the purpose of the simple linear regression?
Sometimes we have two quantitative variables that may be related, and the question we want to answer is: can we predict the value of one of them from the known values of the other?
To study this, we follow these steps:
- Draw a graph showing the data of both variables. This graph is called a "scatter plot".
- Calculate the Pearson correlation coefficient.
- Calculate a formula that allows us to predict the value of one of the variables from the other. This formula is called the "regression line".
- Check whether the regression line can be considered valid. To do so, we carry out a hypothesis test and calculate a ratio called the "coefficient of goodness of fit" (also called R-squared, or coefficient of determination).
Let us first look at scatter plots.
Suppose we want to predict the Benefits of a company from its Spending on Advertising.
We will call the variable Benefits (the one we want to predict) Y and the variable Advertising X.
The variable Y is called the dependent variable and the variable X is called the independent variable.
The values of the two variables are represented as points in the scatter plot, and we may find situations like the ones described below:
First situation:
In this case you may observe that:
- The points are close together: this means that there is a strong relationship between the two variables.
- The cloud of points rises to the right: this means that the two variables are directly (positively) related, i.e. when spending on Advertising increases, the Benefits also increase.
Second situation:
In this case you may observe that:
- The points are not very close together: this means that the relationship between the two variables is not strong. We can still calculate the regression line, but it will not fit the data very well.
- The cloud of points rises to the right: this means that the two variables are directly (positively) related, i.e. when spending on Advertising increases, the Benefits also increase.
Third situation:
In this case you may observe that:
- The points are very dispersed: this means that there is no relationship between the two variables, and it would not make sense to calculate a regression model.
Fourth situation:
In this case you may observe that:
- The points are close together: this means that there is a strong relationship between the two variables.
- The cloud of points falls to the right: this means that the two variables are inversely (negatively) related, i.e. when spending on Advertising increases, the Benefits decrease.
2. Calculation of the Pearson correlation coefficient
If we have data from two random variables that we think may be related, the way to confirm whether a linear relationship exists is to calculate the Pearson correlation coefficient rxy. The value of this coefficient is always between -1 and 1.
To calculate it, we use the following formula:

$$
r_{xy} = \frac{S_{XY}}{S_X S_Y}
= \frac{\frac{1}{n-1}\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n-1}\sum (x_i-\bar{x})^2}\,\sqrt{\frac{1}{n-1}\sum (y_i-\bar{y})^2}}
= \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2}\,\sqrt{\sum (y_i-\bar{y})^2}}
$$

$$
r_{xy} = \frac{\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}}{\sqrt{\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2}\,\sqrt{\sum y_i^2 - 2\bar{y}\sum y_i + n\bar{y}^2}}
$$

If rxy is close to 1 ⇒ X and Y are directly (positively) correlated.
If rxy is close to -1 ⇒ X and Y are inversely (negatively) correlated.
If rxy is close to 0 ⇒ X and Y are not linearly correlated.
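The formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pearson_r` and the data values are assumptions made up for the example, not taken from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r_xy = S_XY / (S_X * S_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The 1/(n-1) factors cancel between numerator and denominator,
    # so only the sums of products and squares are needed.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the paper): advertising spending and benefits.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
benefits = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(advertising, benefits)
print(round(r, 3))
```

For these made-up values the coefficient is positive and moderately high, which corresponds to a cloud of points that rises to the right. Note that the function divides by zero if either variable is constant.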
Important: The sign (positive or negative) of this coefficient depends on the orientation of the scatter plot. If the cloud of points rises to the right, the sign of the coefficient is positive; if it falls to the right, the sign of the coefficient is negative; and if the points are dispersed, the coefficient has a value close to 0. That is to say:
First situation:
In this case, rxy has a positive sign and its value is close to 1, e.g. rxy = 0.976.
Second situation:
In this case, rxy has a positive sign and its value is not as close to 1, e.g. rxy = 0.676.
Third situation:
In this case, rxy may have a positive or negative sign and its value is closer to 0 than to 1 or -1, e.g. rxy = 0.215 or rxy = -0.215.
Fourth situation:
In this case, rxy has a negative sign and its value is close to -1, e.g. rxy = -0.915.
3. Calculation of the simple linear regression model
It makes sense to compute it when the correlation coefficient is close to 1 or -1.
Using the regression line we can predict the value of one of the variables from the other.
The variable whose value we are going to predict (say Y) is called the dependent variable, and the other variable (say X) is called the independent variable.
We intend, therefore, to find a formula of the type Y = a + b·X that allows us to predict the value of Y from the value of X, and that fits the cloud of points in the scatter plot as closely as possible.
For example, for the four situations we have seen above, we could have:
[Figures: scatter plots of the first, second, third and fourth situations, each with its regression line.]
Calculation of the values of "a" and "b"
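"b" is called the slope of the line, and the formula to calculate it is:

$$
b = \frac{S_{XY}}{S_X^2}
= \frac{\frac{1}{n-1}\sum (x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n-1}\sum (x_i-\bar{x})^2}
= \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}
= \frac{\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}}{\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2}
$$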
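If we already know the value of rxy, we can also calculate the slope as follows, since b = S_XY / S_X^2 and rxy = S_XY / (S_X · S_Y):

$$
b = r_{xy}\,\frac{S_Y}{S_X}
$$

Once "b" has been calculated, "a", called the y-intercept, is calculated as follows:

$$
a = \bar{y} - b\,\bar{x}
$$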
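As a rough illustration of the formulas for "b" and "a", the sketch below fits a line to a small made-up data set and uses it for a prediction. The names `fit_line` and `predict` and all data values are assumptions introduced for the example.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) of the regression line Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: S_XY / S_X^2 (the 1/(n-1) factors cancel)
    a = y_bar - b * x_bar    # y-intercept: a = y_bar - b * x_bar
    return a, b

def predict(a, b, x):
    """Predicted value of Y for a given X."""
    return a + b * x

# Hypothetical data (not from the paper): advertising spending and benefits.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
benefits = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = fit_line(advertising, benefits)

# Cross-check with the alternative formula b = r_xy * S_Y / S_X.
n = len(advertising)
x_bar = sum(advertising) / n
y_bar = sum(benefits) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in advertising) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in benefits) / (n - 1))
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, benefits)) / (n - 1)
b_from_r = (s_xy / (s_x * s_y)) * s_y / s_x

print(a, b, predict(a, b, 6.0))
```

For this made-up data the slope is 0.6 and the intercept 2.2 (up to floating-point rounding), and both ways of computing the slope agree.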
4. Hypothesis test for the slope
To decide whether the regression model can be considered valid, we must carry out the following hypothesis test:
H0: β = 0
Ha: β ≠ 0
where β represents the slope of the regression line.
To carry out this test, we calculate the test statistic, which follows a Student's t distribution with n - 2 degrees of freedom, using the following formula:

$$
t = \frac{b}{S_b} = \frac{b}{\sqrt{\dfrac{\frac{1}{n-2}\sum_{i=1}^{n} (y_i - a - b x_i)^2}{\sum_{i=1}^{n} (x_i-\bar{x})^2}}}
$$

where:
- b is the slope of the regression line.
- S_b is the standard error of the slope.
Note that if we are given only the totals of the sums, and not the individual values of the variables X and Y, we can calculate the standard error as shown below:

$$
S_b = \sqrt{\frac{\frac{1}{n-2}\sum_{i=1}^{n} (y_i - a - b x_i)^2}{\sum_{i=1}^{n} (x_i-\bar{x})^2}}
= \sqrt{\frac{\frac{1}{n-2}\left(\sum_{i=1}^{n} y_i^2 + n a^2 + b^2 \sum_{i=1}^{n} x_i^2 - 2a \sum_{i=1}^{n} y_i - 2b \sum_{i=1}^{n} x_i y_i + 2ab \sum_{i=1}^{n} x_i\right)}{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2}}
$$
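The two ways of obtaining the standard error of the slope (from the individual values and from the totals of the sums) can be compared with a short Python sketch. The function names and the data are assumptions made up for this illustration; they do not come from the paper.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b, s_b, t) computed from the individual data values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt((sse / (n - 2)) / sxx)
    return b, s_b, b / s_b

def slope_std_error_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy, a, b):
    """Standard error of the slope when only the totals of the sums are known."""
    x_bar = sum_x / n
    # Expansion of sum (y_i - a - b*x_i)^2 in terms of the totals.
    sse = (sum_yy + n * a ** 2 + b ** 2 * sum_xx
           - 2 * a * sum_y - 2 * b * sum_xy + 2 * a * b * sum_x)
    # Expansion of sum (x_i - x_bar)^2 in terms of the totals.
    sxx = sum_xx - 2 * x_bar * sum_x + n * x_bar ** 2
    return math.sqrt((sse / (n - 2)) / sxx)

# Hypothetical data (not from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b, s_b, t = slope_t_statistic(xs, ys)

# For this data the fitted intercept is a = y_bar - b * x_bar.
a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
s_b_sums = slope_std_error_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xx=sum(x * x for x in xs),
    sum_yy=sum(y * y for y in ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
    a=a,
    b=b,
)
print(round(s_b, 4), round(s_b_sums, 4), round(t, 4))
```

Both routes give the same standard error up to floating-point rounding. With sums of large values, the expanded form can lose precision, so the direct form is usually preferable when the individual values are available.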

Then we make a decision, in one of the following ways:
- Through the acceptance and rejection regions of the null hypothesis:
We look up in the t table the critical values $t_{n-2,\,\alpha/2}$ and $-t_{n-2,\,\alpha/2}$, where α is the significance level. If the test statistic t falls between the two critical values, H0 is accepted; otherwise, H0 is rejected.
- Calculating the P value:
P value = 2·P($t_{n-2}$ > |t test|)
Therefore:
P value > α ⇒ Accept H0
P value < α ⇒ Reject H0 and accept the alternative
- Calculating the confidence interval for the slope of the regression line:
$b \pm t_{n-2,\,\alpha/2} \cdot$ Standard error of the slope
So, if 0 falls within the interval, the null hypothesis is accepted.
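The decision by critical values and by confidence interval can be sketched as follows. The function name `slope_decision` and the numbers are assumptions for illustration only: the slope and its standard error are made-up values for a sample of n = 5 points, and 3.182 is the tabulated two-sided critical value of Student's t for 3 degrees of freedom and α = 0.05.

```python
import math

def slope_decision(b, s_b, t_crit):
    """Return (t, confidence_interval, reject_h0) for H0: beta = 0.

    t_crit is the critical value t_{n-2, alpha/2}, looked up in a t table.
    """
    t = b / s_b
    margin = t_crit * s_b
    interval = (b - margin, b + margin)
    # Rejecting H0 when |t| exceeds the critical value is equivalent to
    # 0 lying outside the confidence interval for the slope.
    reject_h0 = abs(t) > t_crit
    return t, interval, reject_h0

# Hypothetical values (not from the paper): n = 5, so n - 2 = 3 degrees of freedom.
b = 0.6
s_b = math.sqrt(0.08)
t_crit = 3.182  # t table value for 3 degrees of freedom, alpha = 0.05 (two-sided)

t, interval, reject_h0 = slope_decision(b, s_b, t_crit)
zero_inside = interval[0] <= 0.0 <= interval[1]
print(round(t, 3), [round(v, 3) for v in interval], reject_h0, zero_inside)
```

In this made-up case the test statistic is smaller than the critical value and 0 lies inside the interval, so H0 is not rejected. The P value is not computed here because the Python standard library has no Student's t distribution; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), n - 2)` should give it.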
5. Calculation of the coefficient R²
Another way to see whether the model fits well or not is to calculate the coefficient R squared, also called the coefficient of determination or coefficient of goodness of fit. To calculate it, we use the following formula:

$$
R^2 = r_{xy}^2
$$

This ratio takes values between 0 and 1, so that:
If R² is close to 0 ⇒ the model does not fit well
If R² is close to 1 ⇒ the model fits well
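A minimal Python sketch of this calculation is given below, with made-up data and a hypothetical function name `r_squared`. As a cross-check it also computes 1 - SSE/SST from the fitted line; that this equals the squared correlation in simple linear regression with an intercept is a standard result, not something stated in the paper.

```python
def r_squared(xs, ys):
    """Return (r2_from_r, r2_from_fit) for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares (SST)

    # R^2 as the square of the Pearson correlation coefficient.
    r2_from_r = sxy ** 2 / (sxx * syy)

    # R^2 as 1 - SSE/SST, using the fitted line Y = a + b*X.
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    r2_from_fit = 1.0 - sse / syy
    return r2_from_r, r2_from_fit

# Hypothetical data (not from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r2_from_r, r2_from_fit = r_squared(xs, ys)
print(round(r2_from_r, 3), round(r2_from_fit, 3))
```

For this made-up data both values are 0.6 up to rounding, which would indicate a moderate fit.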