
Lesson 3. Linear Regression

Commission on Higher Education


in partnership with University of Negros Occidental-Recoletos
College of Information Technology
SPECIAL TRAINING ON BUSINESS ANALYTICS
The best way to predict the future is to study the
past, or prognosticate.

- Robert Kiyosaki, Businessman


What is Bordeaux Wine?

https://www.youtube.com/watch?v=cRCWYj5f6z0
Bordeaux Wine
• Large differences in price and quality between years, although the wine is produced in a similar way
• Meant to be aged, so it is hard to tell if a wine will be good when it comes on the market
• Expert tasters predict which ones will be good
• Can analytics be used to come up with a different system for judging wine?
Bordeaux Wine (cont.)
• In March 1990, Orley Ashenfelter, a Princeton economics professor, claimed he could predict wine quality without tasting the wine.
• Ashenfelter used a method called linear regression
• It predicts an outcome variable, or dependent variable
• It predicts using a set of independent variables
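To make the dependent/independent vocabulary concrete, here is a minimal sketch in R. The data frame `toy` and its columns `x` and `y` are hypothetical values invented for illustration; they are not taken from the wine data.

```r
# Hypothetical toy data, invented for illustration (not from wine.csv)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),              # independent variable
  y = c(5.1, 7.9, 11.2, 13.8, 17.0)  # dependent variable (outcome)
)

# Fit y = b0 + b1 * x by ordinary least squares
toyModel <- lm(y ~ x, data = toy)
coef(toyModel)                       # estimated intercept (b0) and slope (b1)

# Predict the outcome for a new value of the independent variable
predict(toyModel, newdata = data.frame(x = 6))
```

The same pattern, `lm(outcome ~ predictors, data = ...)`, is what the wine models later in this lesson use.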
Bordeaux Wine (cont.)
Dependent variable: typical price in 1990-1991
wine auctions (approximates quality)

Independent variables:
• Age (older wines are more expensive)
• Weather (Average Growing Season Temperature,
Harvest Rain, and Winter Rain)
Getting Started
Go to schoology.com > DAY THREE and
download wine.csv, then load it into RStudio
using the following command:

wine <- read.csv(file.choose())


str(wine)      # view data structure
summary(wine)  # view statistical summary
Getting Started (cont.)
model1 <- lm(Price ~ AGST, data=wine)
summary(model1)
Getting Started (cont.)
model2 <- lm(Price ~ AGST +
HarvestRain, data=wine)
summary(model2)
Getting Started (cont.)
model3 <- lm(Price ~ AGST +
HarvestRain + WinterRain + Age +
FrancePop, data=wine)
summary(model3)
Getting Started (cont.)
model4 <- lm(Price ~ AGST +
HarvestRain + WinterRain + Age,
data=wine)
summary(model4)
Correlation
A measure of the linear relationship between variables

+1 perfect positive linear relationship
 0 no linear relationship
-1 perfect negative linear relationship
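The three reference values can be checked with a small sketch. The vectors below are hypothetical and chosen only so that the correlations come out exactly; `pearson` is an illustrative helper written for this example, and `cor()` is the built-in R function used later in this lesson.

```r
# Hypothetical vectors, invented for illustration
x      <- c(1, 2, 3, 4, 5)
y_up   <- c(2, 4, 6, 8, 10)   # rises exactly in step with x
y_down <- c(10, 8, 6, 4, 2)   # falls exactly in step with x
y_bowl <- c(4, 1, 0, 1, 4)    # depends on x, but not linearly

# Pearson correlation written out from its definition
pearson <- function(a, b) {
  sum((a - mean(a)) * (b - mean(b))) /
    sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))
}

pearson(x, y_up)     # perfect positive linear relationship
pearson(x, y_down)   # perfect negative linear relationship
pearson(x, y_bowl)   # no linear relationship

# The helper agrees with the built-in function
all.equal(pearson(x, y_up), cor(x, y_up))
```

Note that `y_bowl` is fully determined by `x`, yet its correlation with `x` is zero: correlation measures only the linear part of a relationship.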
Correlation (cont.)
cor(wine$WinterRain, wine$Price)
[1] 0.1366505

cor(wine$Age, wine$FrancePop)
[1] -0.9944851

cor(wine) #all variables


Correlation (cont.)
model5 <- lm(Price ~ AGST +
HarvestRain + WinterRain,
data=wine)
Predictive Ability
• Our wine model had a value of R² = 0.83
• This tells us our accuracy on the data that we used to build the model
• But how well does the model perform on new data?
• Bordeaux wine buyers profit from being
able to predict the quality of a wine years
before it matures
Predictive Ability (cont.)
Go back to schoology.com and download
wine_test.csv, then load it into RStudio using the
following command:

wineTest <- read.csv(file.choose())


str(wineTest)

predictTest <- predict(model4, newdata=wineTest)
predictTest
Predictive Ability (cont.)
R_SQUARED = 1 – (SSE/SST)

SSE <- sum((wineTest$Price - predictTest)^2)
SST <- sum((wineTest$Price - mean(wine$Price))^2)
1 - SSE/SST
[1] 0.7944278
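The calculation above can be wrapped in a small helper, sketched here under stated assumptions: the function name `outOfSampleR2` and the numbers in the usage lines are hypothetical, invented for illustration. As in the commands above, the baseline for SST is the mean of the training data, not of the test data.

```r
# Illustrative helper (hypothetical name): R-squared on new data
outOfSampleR2 <- function(actual, predicted, trainMean) {
  sse <- sum((actual - predicted)^2)   # error of the model's predictions
  sst <- sum((actual - trainMean)^2)   # error of always predicting the training mean
  1 - sse / sst
}

# Hypothetical numbers, invented for illustration
actual <- c(3, 5)
outOfSampleR2(actual, predicted = c(3, 5), trainMean = 4)  # perfect predictions
outOfSampleR2(actual, predicted = c(4, 4), trainMean = 4)  # same as the baseline
outOfSampleR2(actual, predicted = c(6, 2), trainMean = 4)  # worse than the baseline
```

The last call returns a negative value. Unlike the R² reported on the data used to build the model, the R² on new data can fall below zero when the model predicts worse than the baseline.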
Predictive Ability (cont.)
Variables                                        Model R²   Test R²
AGST                                             0.44       0.79
AGST, HarvestRain                                0.71       -0.08
AGST, HarvestRain, Age                           0.79       0.53
AGST, HarvestRain, Age, WinterRain               0.83       0.79
AGST, HarvestRain, Age, WinterRain, FrancePop    0.83       0.76
END
Thank You for Listening!
