Correlation and Causation

Many studies and surveys consider data on more than one variable. For example, suppose
a study finds that, over the years, the prices of burgers and fries have both increased. Does this
mean that an increase in the price of burgers causes an increase in the price of fries? To answer
questions like this, we need to understand the difference between correlation and causation.

Correlation means there is a relationship or pattern between the values of two
variables. A scatterplot displays data about two variables as a set of points in the xy-plane and
is a useful tool for determining if there is a correlation between the variables.

Causation means that one event causes another event to occur. Causation can only be
determined from an appropriately designed experiment. In such experiments, similar groups
receive different treatments, and the outcomes of each group are studied. We can only conclude
that a treatment causes an effect if the groups have noticeably different outcomes.

Correlation
Bivariate Data
✔ Bivariate data is paired data. It is usually represented by an independent variable and a
dependent variable.

Scatter Diagram
✔ Plotted on a rectangular coordinate system
✔ Shows two quantitative variables: one is called independent (X) and the other is
called dependent (Y)
✔ It can be used to determine whether a linear (straight-line) correlation exists between two
variables.

Types of Scatter Diagram


1. Positive (+) Correlation (As x increases, y tends to increase)

2. Negative (-) Correlation (As x increases, y tends to decrease)


3. No Correlation

Correlation Analysis
✔ The term correlation is used to describe the relationship between two variables.
✔ Correlation quantifies the degree and direction of the relationship between two variables.
✔ A correlation coefficient (r) tells how much one variable tends to change when the
other one does. When r is zero, there is no linear relationship. When r is positive, one
variable tends to go up as the other one goes up. When r is negative, one variable tends to
go up as the other one goes down.

Pearson Product-Moment Correlation Coefficient (r)


✔ It measures the nature and strength of the relationship between two quantitative variables.
✔ The value of r ranges from -1 to +1.
✔ The magnitude of r denotes the strength of the association, and its sign denotes the direction, as illustrated by the diagram.

Referring to our table of qualitative interpretation of r, we can conclude that the heights and weights of
the 20 students in the sample have a very high positive correlation.
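
As an illustration of how r can be computed, here is a minimal Python sketch. It assumes the common deviation-from-the-mean form of the formula, r = Sxy / sqrt(Sxx * Syy), which these notes do not spell out. The function name pearson_r and the five height/weight pairs are hypothetical, made up for this example. They are not the 20-student sample referred to above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only (not the sample from the notes)
heights_cm = [150, 155, 160, 165, 170]
weights_kg = [50, 54, 59, 63, 68]
r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.4f}")  # positive and close to +1 for this made-up data
```

A value of r close to +1 here matches the description above: as height increases, weight tends to increase.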

Linear Regression
Regression is a descriptive statistical technique for finding the best-fitting straight line
between two variables. The regression line is the line drawn through the points on a scatter plot
to summarize the relationship between the variables being studied.
Calculation of a Regression Line
In statistics, we can calculate a regression line for two variables if their correlation is “very
strong” and their scatter plot shows a linear pattern. A regression line is a single line that best
fits the data (in terms of having the smallest overall distance from the line to the points). This
technique for finding the best-fitting line is called simple linear regression analysis using the
least-squares method.
The Least-Squares Regression Line
The least-squares regression line for a set of bivariate data is the line that minimizes the
sum of the squares of the vertical deviations from each data point to the line.
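
The definition can be made concrete with a short Python sketch. It assumes the standard closed-form solution for the line y = a + bx, namely b = Sxy / Sxx and a = mean(y) - b * mean(x), which these notes do not state. The function names and the data points are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical deviations from each data point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
# Any other line, e.g. one with a slightly different slope, gives a larger sum
other = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(f"y = {intercept:.2f} + {slope:.2f}x, SSR = {best:.2f} (other line: {other:.2f})")
```

Comparing the fitted line against a slightly tilted line shows what "minimizes" means here: any other choice of slope or intercept gives a larger sum of squared vertical deviations.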