Regression is a statistical technique that helps quantify the relationship between interrelated economic variables. The first step is to estimate the coefficient of the independent variable; the second is to measure the reliability of the estimated coefficient. This requires formulating a hypothesis, and based on the hypothesis we can create a function.
If a manager wants to determine the relationship between the firm’s advertisement expenditure and its sales revenue, he or she can test the hypothesis that higher advertising expenditure leads to higher sales. The manager collects data on advertising expenditure and sales revenue over a specific period of time. The hypothesis can then be translated into a mathematical function:
Y = A + Bx
where Y is sales, x is advertisement expenditure, and A and B are constants.
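As a rough illustration of how the coefficients can be estimated, here is a minimal sketch in Python; the advertising and sales figures are made up for the example, and ordinary least squares (via numpy.polyfit) is assumed as the fitting method.

import numpy as np

# Hypothetical data: advertising expenditure (x) and sales revenue (Y),
# both in thousands, for eight periods -- illustrative values only.
x = np.array([2, 3, 5, 7, 9, 11, 13, 15], dtype=float)
Y = np.array([58, 71, 87, 104, 119, 135, 154, 170], dtype=float)

# np.polyfit with degree 1 returns the slope (B) and intercept (A)
# of the least-squares line Y = A + B*x.
B, A = np.polyfit(x, Y, deg=1)
print(f"Estimated intercept A = {A:.2f}, slope B = {B:.2f}")

# Predicted sales for an advertising spend of 10 (same units as x).
print("Predicted sales at x = 10:", A + B * 10)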
Once the hypothesis has been translated into a function, the aim is to estimate the relationship between the dependent and independent variables. The value of the dependent variable is of greatest interest to the researcher and depends on the values of the other variables. The independent variables are used to explain the variation in the dependent variable. Regression can be classified into two types −
Simple regression − One independent variable
Multiple regression − Several independent variables
Simple Regression:
Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:
One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
Because the other terms are used less frequently today, we'll use the "predictor" and "response"
terms to refer to the variables encountered in this course. The other terms are mentioned only to
make you aware of them should you encounter them. Simple linear regression gets its adjective
"simple," because it concerns the study of only one predictor variable. In contrast, multiple linear
regression, which we study later in this course, gets its adjective "multiple," because it concerns
the study of two or more predictor variables.
As you may remember, the relationship between degrees Fahrenheit and degrees Celsius is
known to be:
F = (9/5)C + 32
That is, if you know the temperature in degrees Celsius, you can use this equation to determine
the temperature in degrees Fahrenheit exactly.
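For example, 25 degrees Celsius converts to F = (9/5)(25) + 32 = 77 degrees Fahrenheit, with no error or uncertainty in the answer.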
Here are some examples of other deterministic relationships that students from previous
semesters have shared:
Circumference = π × diameter
Hooke's Law: Y = α + βX, where Y = amount of stretch in a spring, and X = applied weight.
Boyle's Law: For a constant temperature, P = α/V, where P = pressure, α = constant for each gas,
and V = volume of gas.
For each of these deterministic relationships, the equation exactly describes the relationship
between the two variables. This course does not examine deterministic relationships. Instead, we
are interested in statistical relationships, in which the relationship between the variables is not
perfect.
Here is an example of a statistical relationship. The response variable y is the mortality due to
skin cancer (number of deaths per 10 million people) and the predictor variable x is the latitude
(degrees North) at the center of each of 49 states in the U.S. (skincancer.txt) (The data were
compiled in the 1950s, so Alaska and Hawaii were not yet states, and Washington, D.C. is
included in the data set even though it is not technically a state.)
R² = RSS / TSS = 1 - ESS / TSS
where TSS is the total sum of squares, RSS is the regression (explained) sum of squares, and ESS is the error (residual) sum of squares.
R² measures the proportion of the total deviation of Y from its mean that is explained by the regression model. The closer R² is to unity, the greater the explanatory power of the regression equation. An R² close to 0 indicates that the regression equation has very little explanatory power.
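Continuing the made-up advertising example from above, here is a minimal sketch of how TSS, RSS, ESS, and R² relate for a least-squares fit; the two forms of the formula give the same value.

import numpy as np

# Hypothetical data reused from the earlier sketch.
x = np.array([2, 3, 5, 7, 9, 11, 13, 15], dtype=float)
Y = np.array([58, 71, 87, 104, 119, 135, 154, 170], dtype=float)

B, A = np.polyfit(x, Y, deg=1)          # least-squares fit Y = A + B*x
Y_hat = A + B * x                        # fitted values

TSS = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
RSS = np.sum((Y_hat - Y.mean()) ** 2)    # regression (explained) sum of squares
ESS = np.sum((Y - Y_hat) ** 2)           # error (residual) sum of squares

print("R^2 via RSS/TSS   :", RSS / TSS)
print("R^2 via 1 - ESS/TSS:", 1 - ESS / TSS)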
5 Types of Regression and their properties
Linear and Logistic regressions are usually the first modeling algorithms
that people learn for Machine Learning and Data Science. Both are great
since they’re easy to use and interpret. However, their inherent simplicity
also comes with a few drawbacks and in many cases, they’re not really
the best choice of regression model. There are in fact several different
types of regressions, each with their own strengths and weaknesses.
Linear Regression:
Linear regression fits a straight line of the form Y = A + Bx, estimating the coefficients so that the line lies as close as possible to the observed data; it is the model discussed in the sections above.
Polynomial Regression:
We can give some variables exponents and leave others without, and we can also select the exact exponent we want for each variable. However, selecting the exact exponent of each variable naturally requires some knowledge of how the data relates to the output. See the illustration below for a visual comparison of linear vs polynomial regression.
(Figure: linear vs polynomial regression fitted to data with a non-linear relationship.)
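A minimal sketch of this comparison on made-up data with a curved relationship, using numpy.polyfit for both the straight-line (degree 1) and quadratic (degree 2) fits; the R² values show how much better the polynomial captures the curvature.

import numpy as np

rng = np.random.default_rng(0)

# Made-up data with a curved (quadratic) relationship plus noise.
x = np.linspace(-3, 3, 60)
y = 0.5 * x**2 + x + rng.normal(scale=0.5, size=x.size)

linear = np.poly1d(np.polyfit(x, y, deg=1))   # straight-line fit
poly = np.poly1d(np.polyfit(x, y, deg=2))     # quadratic fit

# Compare how well each model explains the data via R^2.
def r_squared(y, y_hat):
    ess = np.sum((y - y_hat) ** 2)            # error (residual) sum of squares
    tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
    return 1 - ess / tss

print("R^2, linear    :", r_squared(y, linear(x)))
print("R^2, polynomial:", r_squared(y, poly(x)))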
Each sample from the population generates its own estimate of the intercept and slope. To test whether an estimated coefficient is statistically different from zero, the following methods can be used −
Two-tailed test −
Null Hypothesis: H0: b = 0
Alternative Hypothesis: Ha: b ≠ 0
One-tailed test −
Null Hypothesis: H0: b ≤ 0 (or b ≥ 0)
Alternative Hypothesis: Ha: b > 0 (or b < 0)
Statistic Test −
t = (b − E(b)) / SEb
where
b = estimated coefficient
E(b) = 0 under the null hypothesis
SEb = standard error of the coefficient
The value of t depends on the degrees of freedom, whether the test is one- or two-tailed, and the level of significance. A t-table can be used to determine the critical value of t. The t-value is then compared with the critical value: reject the null hypothesis if the absolute value of the test statistic is greater than or equal to the critical t-value, and do not reject the null hypothesis if the absolute value of the test statistic is less than the critical t-value.
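As a rough sketch of this procedure on the made-up advertising data from earlier, the following computes the t statistic for the estimated slope and compares it with the two-tailed critical value at the 5% significance level; scipy.stats is assumed here only as a convenient substitute for a t-table.

import numpy as np
from scipy import stats

# Hypothetical advertising (x) and sales (Y) data from the earlier sketch.
x = np.array([2, 3, 5, 7, 9, 11, 13, 15], dtype=float)
Y = np.array([58, 71, 87, 104, 119, 135, 154, 170], dtype=float)

n = len(x)
B, A = np.polyfit(x, Y, deg=1)           # estimated slope and intercept
residuals = Y - (A + B * x)

# Standard error of the slope coefficient.
s2 = np.sum(residuals**2) / (n - 2)       # residual variance
SE_b = np.sqrt(s2 / np.sum((x - x.mean())**2))

# t statistic under the null hypothesis H0: b = 0.
t_stat = (B - 0) / SE_b

# Two-tailed critical value at the 5% level with n - 2 degrees of freedom.
t_crit = stats.t.ppf(1 - 0.025, df=n - 2)

print(f"t = {t_stat:.2f}, critical value = {t_crit:.2f}")
if abs(t_stat) >= t_crit:
    print("Reject H0: the coefficient is statistically significant.")
else:
    print("Do not reject H0.")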