
Basic terms

• Independent variables: the variables you control or choose directly.
• Dependent variables: the outcomes you measure; they cannot be controlled directly.

Experiment 1: You want to figure out which brand of microwave popcorn pops the most kernels so you can
get the most value for your money. You test different brands of popcorn to see which bag pops the most
popcorn kernels.
• Independent Variable: Brand of popcorn bag (it is the independent variable because you decide which
brands to test)
• Dependent Variable: Number of kernels popped (it is the dependent variable because it is what you
measure for each popcorn brand)

Experiment 2: You want to see which type of fertilizer helps plants grow fastest, so you give a different type
of fertilizer to each plant and measure how tall each one grows.
• Independent Variable: Type of fertilizer given to the plant
• Dependent Variable: Plant height

What is Regression Analysis?
• A predictive modelling technique
• Used for analysing data
• Investigates the relationship between a dependent variable and one or more independent
variables
Example: the relationship between a student's exam performance and the number of lectures
the student attends.
Why do we use Regression Analysis?

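• Suppose you want to predict, based on current data, whether a particular person will be
affected by cancer in the future.
• The data will contain several parameters (attributes).
• You need to find out which parameters play a vital role in identifying cancer.
• Regression analysis indicates whether the relationship between the dependent variable and
each independent variable is significant.
• It also indicates the strength of the impact of each independent variable on the dependent
variable.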
Linear Regression
• The dependent variable is continuous, the independent variable(s) can
be continuous or discrete, and the regression line is linear.
• Linear regression establishes a relationship between the dependent
variable (Y) and one or more independent variables (X) using a
best-fit straight line.

Last year, five randomly selected students took a math aptitude test before they began
their statistics course. The Statistics Department has three questions.

• What linear regression equation best predicts statistics performance, based on math
aptitude scores?

• If a student made an 80 on the aptitude test, what grade would we expect her to make
in statistics?

• How well does the regression equation fit the data?


Data for the 5 selected students:

Student   xi    yi    A = (xi - x̄)   B = (yi - ȳ)
1         95    85     17               8
2         85    95      7              18
3         80    70      2              -7
4         70    65     -8             -12
5         60    70    -18              -7
Sum       390   385
Mean      78    77

xi = score on the aptitude test, yi = statistics grade, x̄ and ȳ = their means,
A = xi - x̄, B = yi - ȳ.

Student   xi    yi    A² = (xi - x̄)²   B² = (yi - ȳ)²   A·B = (xi - x̄)(yi - ȳ)
1         95    85    289                64                136
2         85    95     49               324                126
3         80    70      4                49                -14
4         70    65     64               144                 96
5         60    70    324                49                126
Sum       390   385   730               630                470
Mean      78    77

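The column sums in the tables can be checked with a short script. This is a minimal sketch in plain Python; the function name `deviation_sums` is a hypothetical helper introduced here for illustration and is not part of the original slides.

```python
# Aptitude scores (x) and statistics grades (y) for the five students.
x = [95, 85, 80, 70, 60]
y = [85, 95, 70, 65, 70]


def deviation_sums(x, y):
    """Return (sum of A^2, sum of B^2, sum of A*B),
    where A = xi - mean(x) and B = yi - mean(y)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    a = [xi - x_bar for xi in x]  # deviations of x from its mean
    b = [yi - y_bar for yi in y]  # deviations of y from its mean
    sum_a2 = sum(ai * ai for ai in a)
    sum_b2 = sum(bi * bi for bi in b)
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    return sum_a2, sum_b2, sum_ab


sum_a2, sum_b2, sum_ab = deviation_sums(x, y)
print(sum_a2, sum_b2, sum_ab)  # 730.0 630.0 470.0
```

The first and third sums (730 and 470) are the ones needed to compute the slope of the regression line.
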
The regression equation is a linear equation of the form: ŷ = b0 + b1x. To conduct a regression
analysis, we need to solve for b0 and b1.

First, we solve for the regression coefficient (b1), which is the slope of the line:

b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]
b1 = 470 / 730
b1 = 0.644

Once we know the value of the regression coefficient (b1),
we can solve for the intercept (b0):

b0 = ȳ - b1 * x̄
b0 = 77 - (0.644)(78)
b0 = 26.768

Therefore, the regression equation is: ŷ = 26.768 + 0.644x.

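The hand calculation can be reproduced in code. The sketch below is one possible implementation of the two formulas in plain Python; the name `fit_line` is an assumption made for illustration. Note that the code keeps the slope unrounded, so its intercept (about 26.781) differs slightly from the 26.768 obtained by hand, where b1 is rounded to 0.644 before it is used.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x. Returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope (regression coefficient)
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_line([95, 85, 80, 70, 60], [85, 95, 70, 65, 70])
print(round(b1, 3), round(b0, 3))  # 0.644 26.781
```
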
The dependent variable is the student's statistics grade. If a
student made an 80 on the aptitude test, the estimated
statistics grade (ŷ) would be:
ŷ = b0 + b1x
ŷ = 26.768 + 0.644 * 80
ŷ = 26.768 + 51.52 = 78.288

The aptitude test scores used to create the regression equation ranged from 60 to 95. Therefore,
only use values inside that range to estimate statistics grades. Using values outside that range
(less than 60 or greater than 95) is extrapolation, and the resulting estimates may be unreliable.
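
The prediction step and the range restriction can be combined in one small function. This is a sketch only: the function name `predict_grade` and the choice to raise an error for out-of-range scores are assumptions made for illustration. The default coefficients are the rounded hand-calculated values from the text.

```python
def predict_grade(score, b0=26.768, b1=0.644, lowest=60, highest=95):
    """Estimate the statistics grade from an aptitude score.

    Refuses to extrapolate outside the range of aptitude scores
    (60 to 95) that were used to fit the regression equation.
    """
    if not lowest <= score <= highest:
        raise ValueError(
            f"score {score} is outside the fitted range {lowest} to {highest}"
        )
    return b0 + b1 * score


print(round(predict_grade(80), 3))  # 78.288
```

Calling `predict_grade(50)` raises a `ValueError` rather than returning an estimate, which makes the range restriction explicit.
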
