
Data Analytics

(BE-2015 Pattern)
Unit III-
Association Rules and
Regression
Syllabus
Advanced Analytical Theory and Methods:
Association Rules- Overview, a-priori algorithm,
evaluation of candidate rules, case study-
transactions in grocery store, validation and
testing, diagnostics.

Regression - linear, logistic, reasons to choose and cautions, additional regression models.
Association Rules
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Example: Transactions in a Grocery Store

Validation and Testing

Diagnostics
Overview
Association rules method
• Unsupervised learning method
• Descriptive (not predictive) method
• Used to find hidden relationships in data
• The relationships are represented as rules
Questions association rules might answer
• Which products tend to be purchased together?
• What products do similar customers tend to buy?
Overview
• Example – general logic of association rules
Overview
Rules have the form X -> Y
• When X is observed, Y is also observed

Itemset
• Collection of items or entities
• k-itemset = {item 1, item 2,…,item k}
• Examples
• Items purchased in one transaction
• Set of hyperlinks clicked by a user in one session
Association Rules
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Example: Transactions in a Grocery Store

Validation and Testing

Diagnostics
Definition: Association Rule
Let D be a database of transactions, e.g.:

Transaction ID | Items
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

• Let I be the set of items that appear in the database, e.g., I = {A, B, C, D, E, F}
• A rule is defined by X -> Y, where X ⊆ I, Y ⊆ I, and X ∩ Y = ∅
Definition: Association Rule
• Association Rule: an implication expression of the form X -> Y, where X and Y are non-overlapping itemsets
  • Example: {Milk, Diaper} -> {Beer}

TID | Items
1   | Bread, Milk
2   | Bread, Diaper, Beer, Eggs
3   | Milk, Diaper, Beer, Coke
4   | Bread, Milk, Diaper, Beer
5   | Bread, Milk, Diaper, Coke

• Rule Evaluation Metrics
  • Support (s): the number of transactions that contain both X and Y out of the total number of transactions
  • Confidence (c): the number of transactions that contain both X and Y out of the number of transactions that contain X

• Example: {Milk, Diaper} -> {Beer}
    s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
    c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 = 0.67
Rule Measures: Support and Confidence
Find all the rules X -> Y with minimum confidence and support
– support, s: probability that a transaction contains {X ∪ Y}
– confidence, c: conditional probability that a transaction having X also contains Y

TID | Items
100 | A, B, C
200 | A, C
300 | A, D
400 | B, E, F

With minimum support 50% and minimum confidence 50%, we have
  A -> C (50%, 66.6%)
  C -> A (50%, 100%)
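These support and confidence values can be verified in R with the arules package (used later in the grocery case study). The snippet below is only a minimal sketch; the toy transaction list simply encodes the four transactions above.

> library(arules)
> toy <- list(c("A","B","C"), c("A","C"), c("A","D"), c("B","E","F"))
> trans <- as(toy, "transactions")
> rules <- apriori(trans, parameter=list(support=0.5, confidence=0.5, target="rules"))
> inspect(rules)   # includes A -> C (support 0.5, confidence 0.667) and C -> A (support 0.5, confidence 1.0)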
Overview – Apriori Algorithm
• Apriori is the most fundamental algorithm
• Given itemset L, support of L is the percent of
transactions that contain L
• Frequent itemset – items appear together “often
enough”
• Minimum support defines “often enough” (% transactions)
• If an itemset is frequent, then any subset is frequent
Overview – Apriori Algorithm
• If {B,C,D} frequent, then all subsets frequent
The Apriori Algorithm
• Join Step: Ck is generated by joining Lk-1 with itself
• Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
• Pseudo-code:
    Ck: candidate itemsets of size k
    Lk: frequent itemsets of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;
The Apriori Algorithm — Example 1
MinSupp = 0.5 (i.e., 50%, or 2 transactions)

Database D:
TID | Items
100 | 1, 3, 4
200 | 2, 3, 5
300 | 1, 2, 3, 5
400 | 2, 5

Scan D -> C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
Prune  -> L1: {1}:2, {2}:3, {3}:3, {5}:3

Generate C2 from L1: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
Scan D -> C2 counts: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
Prune  -> L2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2

Generate C3 from L2: {2,3,5}
Scan D -> L3: {2,3,5}:2
The Apriori Algorithm — Example 2

TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
The Apriori Algorithm — Example 2
minsup = 3/5 = 0.6

Items (1-itemsets):
Item   | Count
Bread  | 4
Coke   | 2
Milk   | 4
Beer   | 3
Diaper | 4
Eggs   | 1

Pairs (2-itemsets) — no need to generate candidates involving Coke or Eggs:
Itemset         | Count
{Bread, Milk}   | 3
{Bread, Beer}   | 2
{Bread, Diaper} | 3
{Milk, Beer}    | 2
{Milk, Diaper}  | 3
{Beer, Diaper}  | 3

Triplets (3-itemsets):
Itemset               | Count
{Bread, Milk, Diaper} | 3
Association Rules
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Example: Transactions in a Grocery Store

Validation and Testing

Diagnostics
Evaluation of Candidate Rules
• Frequent itemsets from the previous
section can form candidate rules such
as X implies Y (X → Y).
• This section discusses how measures
such as confidence, lift, and leverage
can help evaluate the appropriateness
of these candidate rules
Evaluation of Candidate Rules
Confidence: says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}

    Confidence(X ⇒ Y) = Support(X ∧ Y) / Support(X)

Lift: says how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is

    Lift(X ⇒ Y) = Support(X ∧ Y) / (Support(X) · Support(Y))

Leverage: same as lift, but instead of using a ratio, leverage uses the difference

    Leverage(X ⇒ Y) = Support(X ∧ Y) - Support(X) · Support(Y)
Evaluation of Candidate Rules
Confidence
• Confidence measures the certainty of a rule
• Mathematically, confidence is the percent of transactions that contain both X and Y out of all the transactions that contain X

    Confidence(X ⇒ Y) = Support(X ∧ Y) / Support(X)

• Minimum confidence – a predefined threshold a rule must meet
• Problem with confidence
  • Given a rule X -> Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y
  • It cannot tell whether a rule reflects a true implication
Evaluation of Candidate Rules
Lift
• Lift measures how much more often X and Y occur together than expected if they were statistically independent

    Lift(X ⇒ Y) = Support(X ∧ Y) / (Support(X) · Support(Y))

• Lift = 1 if X and Y are statistically independent
• Lift > 1 indicates the degree of usefulness of the rule
• Example – in 1,000 transactions:
  • If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk -> eggs) = 0.3/(0.5*0.4) = 1.5
  • If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk -> bread) = 0.4/(0.5*0.4) = 2.0
• Therefore, it can be concluded that milk and bread have a stronger association than milk and eggs
Evaluation of Candidate Rules
Leverage
• Leverage measures the difference between the probability of X and Y appearing together and what would be expected if they were statistically independent

    Leverage(X ⇒ Y) = Support(X ∧ Y) - Support(X) · Support(Y)

• Leverage = 0 if X and Y are statistically independent
• Leverage > 0 indicates the degree of usefulness of the rule
• Example – in 1,000 transactions:
  • If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk -> eggs) = 0.3 - 0.5*0.4 = 0.1
  • If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage(milk -> bread) = 0.4 - 0.5*0.4 = 0.2
• This again confirms that milk and bread have a stronger association than milk and eggs
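As a quick sanity check, the lift and leverage numbers from the two examples above can be reproduced in base R; the snippet is only a sketch of the arithmetic using the stated counts.

> n <- 1000
> s_milk <- 500/n; s_eggs <- 400/n; s_bread <- 400/n
> s_milk_eggs <- 300/n; s_milk_bread <- 400/n
> s_milk_eggs / (s_milk * s_eggs)      # lift(milk -> eggs)  = 1.5
> s_milk_bread / (s_milk * s_bread)    # lift(milk -> bread) = 2.0
> s_milk_eggs - s_milk * s_eggs        # leverage(milk -> eggs)  = 0.1
> s_milk_bread - s_milk * s_bread      # leverage(milk -> bread) = 0.2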
Applications of Association Rules

The term market basket analysis refers to a specific


implementation of association rules
• For better merchandising – products to include/exclude from inventory
each month
• Placement of products
• Cross-selling
• Promotional programs—multiple product purchase incentives managed
through a loyalty card program

Association rules also used for


• Recommender systems – Amazon, Netflix
• Clickstream analysis from web usage log files
• Website visitors to page X click on links A,B,C more than on links D,E,F
Association Rules
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Example: Transactions in a Grocery Store

Validation and Testing

Diagnostics

Example: Grocery Store Transactions


1 The Groceries Dataset
Packages -> Install -> arules, arulesViz   # install via the RStudio Packages menu; the next line then
> install.packages(c("arules", "arulesViz"))   # appears on the console (no need to type it yourself)
> library('arules')
> library('arulesViz')
> data(Groceries)
> summary(Groceries) # indicates 9835 rows

Class of dataset Groceries is transactions, containing 3 slots


1. transactionInfo # data frame with vectors having length of transactions
2. itemInfo # data frame storing item labels
3. data # binary evidence matrix of labels in transactions

> Groceries@itemInfo[1:10,]
> apply(Groceries@data[,10:20],2,function(r)
paste(Groceries@itemInfo[r,"labels"],collapse=", "))
Example: Grocery Store Transactions
2 Frequent Itemset Generation
To illustrate the Apriori algorithm, the code below does each iteration separately.
Assume a minimum support threshold of 0.02 (0.02 × 9835 ≈ 197 transactions); in total, 122 frequent itemsets are found.

First, get itemsets of length 1


> itemsets<-
apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent
itemsets"))
> summary(itemsets) # found 59 itemsets
> inspect(head(sort(itemsets,by="support"),10)) # lists top 10
Second, get itemsets of length 2
> itemsets<-
apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent
itemsets"))
> summary(itemsets) # found 61 itemsets
> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Third, get itemsets of length 3


> itemsets<-
apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent
itemsets"))
> summary(itemsets) # found 2 itemsets
> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Example: Grocery Store Transactions
3 Rule Generation and Visualization

The Apriori algorithm will now generate rules.


Set minimum support threshold to 0.001 (allows more
rules, presumably for the scatterplot) and minimum
confidence threshold to 0.6 to generate 2,918 rules.
> rules <- apriori(Groceries,parameter =list (support=0.001, confidence=0.6, target="rules"))
> summary(rules) # finds 2918 rules
> plot(rules) # displays scatterplot

The scatterplot shows that the highest lift occurs at a low


support and a low confidence.
Example: Grocery Store Transactions
3 Rule Generation and Visualization
Example: Grocery Store Transactions
3 Rule Generation and Visualization

Get scatterplot matrix to compare the support, confidence, and lift of


the 2918 rules

> plot(rules@quality) # displays scatterplot matrix

Lift is proportional to confidence with several linear groupings.


Note that Lift = Confidence/Support(Y), so when support of Y remains
the same, lift is proportional to confidence and the slope of the linear
trend is the reciprocal of Support(Y).
Example: Grocery Store Transactions
3 Rule Generation and Visualization
Example: Grocery Store Transactions
3 Rule Generation and Visualization

Compute 1/Support(Y), which is the slope:

> slope <- sort(round(rules@quality$lift/rules@quality$confidence, 2))

Display the number of times each slope appears in dataset


> unlist(lapply(split(slope,f=slope),length))

Display the top 10 rules sorted by lift


> inspect(head(sort(rules,by="lift"),10))

Rule {Instant food products, soda} -> {hamburger meat}


has the highest lift of 19 (page 154)
Example: Grocery Store Transactions
3 Rule Generation and Visualization

Find the rules with confidence above 0.9


> confidentRules<-rules[quality(rules)$confidence>0.9]
> confidentRules # set of 127 rules

Plot a matrix-based visualization of the LHS v RHS of rules


> plot(confidentRules, method="matrix", measure=c("lift","confidence"),
    control=list(reorder=TRUE))

The legend on the right is a color matrix indicating the lift and the
confidence to which each square in the main matrix corresponds
Example: Grocery Store Transactions
3 Rule Generation and Visualization
Example: Grocery Store Transactions
3 Rule Generation and Visualization

Visualize the top 5 rules with the highest lift.


> highLiftRules<-head(sort(rules,by="lift"),5)
> plot(highLiftRules,method="graph",control=list(type="items"))

In the graph, the arrow always points from an item on the LHS
to an item on the RHS.
For example, the arrows that connect ham, processed cheese, and white bread
suggest the rule

{ham, processed cheese} -> {white bread}

Size of circle indicates support and shade represents lift


Example: Grocery Store Transactions
3 Rule Generation and Visualization
Association Rules
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Example: Transactions in a Grocery Store

Validation and Testing

Diagnostics
Validation and Testing

• The frequent and high-confidence itemsets are found using pre-specified minimum support and minimum confidence levels
• Measures like lift and/or leverage then ensure that
interesting rules are identified rather than coincidental ones
• However, some of the remaining rules may be considered
subjectively uninteresting because they don’t yield
unexpected profitable actions
• E.g., rules like {paper} -> {pencil} are not interesting/meaningful
• Incorporating subjective knowledge requires domain experts
• Good rules provide valuable insights for institutions to
improve their business operations
Association Rules
Overview

Apriori Algorithm

Evaluation of Candidate Rules

Example: Transactions in a Grocery Store

Validation and Testing

Diagnostics
Diagnostics
• Although the Apriori algorithm is easy to understand and implement, some of
the rules generated are uninteresting or practically useless.
• Additionally, some of the rules may be generated due to coincidental
relationships between the variables.
• Measures like confidence, lift, and leverage should be used along with human
insights to address this problem
• Another problem with association rules is that, in Phases 3 and 4 of the Data Analytics Lifecycle, the team must specify the minimum support prior to the model execution, which may lead to too many or too few rules.
• In related research, a variant of the algorithm can use a predefined target
range for the number of rules so that the algorithm can adjust the minimum
support accordingly.
• The algorithm requires a scan of the entire database to obtain the result. Accordingly, as the database grows, each run takes more time to compute.
Diagnostics- Approaches to
improve Apriori’s efficiency
Partitioning:
• Any itemset that is potentially frequent in a transaction database must be
frequent in at least one of the partitions of the transaction database.

Sampling:
• This extracts a subset of the data with a lower support threshold and uses the
subset to perform association rule mining.

Transaction reduction:
• A transaction that does not contain frequent k-itemsets is useless in subsequent
scans and therefore can be ignored.

Hash-based itemset counting:


• If the corresponding hashing bucket count of a k-itemset is below a certain
threshold, the k-itemset cannot be frequent.

Dynamic itemset counting:


• Only add new candidate itemsets when all of their subsets are estimated to be
frequent.
Syllabus
Advanced Analytical Theory and Methods:
Association Rules- Overview, a-priori algorithm,
evaluation of candidate rules, case study-
transactions in grocery store, validation and
testing, diagnostics.

Regression - linear, logistic, reasons to choose and cautions, additional regression models.
Regression

Linear Regression

Logistic Regression

Reasons to Choose and Cautions

Additional Regression Models


Regression
• Regression analysis attempts to explain the influence that
input (independent) variables have on the outcome
(dependent) variable
• Questions regression might answer
• What is a person’s expected income?

• What is probability an applicant will default on a loan?

• Regression can find the input variables having the greatest


statistical influence on the outcome
• E.g., if reading level at age 10 predicts students' later success, then try to improve early-age reading levels
Linear Regression
• Models the relationship between several input variables
and a continuous outcome variable
• Assumption is that the relationship is linear

• Various transformations can be used to achieve a linear


relationship
• Linear regression models are probabilistic

• Involves randomness and uncertainty


• Not deterministic like Ohm’s Law (V=IR)
Use Cases (Applications)
Real estate example
• Predict residential home prices
• Possible inputs – living area, #bathrooms,
#bedrooms, lot size, property taxes
Demand forecasting example
• Restaurant predicts quantity of food needed
• Possible inputs – weather, day of week, etc.

Medical example
• Analyze effect of proposed radiation treatment
• Possible inputs – radiation treatment duration, frequency
Linear Equations
A straight line has the form Y = mX + b
• m = slope = change in Y / change in X
• b = Y-intercept
Linear Regression Model
Relationship between variables is a linear function:

    Yi = β0 + β1·Xi + εi

where β0 is the Y-intercept, β1 is the slope, and εi is the random error term.
Y is the dependent (response) variable (e.g., income); X is the independent (explanatory) variable (e.g., age).
Model Description
For one input variable and one outcome variable:

    y = β0 + β1·x + ε

For more than one input variable (p input variables):

    y = β0 + β1·x1 + β2·x2 + ... + βp·xp + ε
Model Description
Example
• Predict a person's annual income as a function of age and education:

    Income = β0 + β1·Age + β2·Education + ε

• The βj terms represent the unknown parameters to be estimated
• There is considerable variation in income levels for a group of people with identical ages and years of education. This variation is represented by the error term ε in the model.
• Ordinary Least Squares (OLS) is a common technique to estimate the parameters
Model Description
Example
• With OLS, the parameter estimates are the values that minimize the sum of squared residuals:

    min over β0, β1, ..., βp of  Σi [ yi - (β0 + β1·xi1 + ... + βp·xip) ]²
Model Description
Example

With OLS, the objective is to find the line through these points that
minimizes the sum of the squares of the difference between each
point and the line in the vertical direction

• The vertical lines represent the distance between each observed y value and the line
Model Description
With Normally Distributed Errors
• Making additional assumptions on the error term provides further capabilities
• It is common to assume the error term ε is a normally distributed random variable
  • with mean equal to zero and constant variance
• Thus, the linear regression model is expressed as

    Y = β0 + β1·X1 + ... + βp·Xp + ε,   where ε ~ N(0, σ²)
Model Description
With Normally Distributed Errors
• With this assumption, the expected value E(y) of the linear regression model is

    E(y) = β0 + β1·x1 + ... + βp·xp

• and the variance is

    V(y) = σ²

• Thus, for a given set of input values (x1, ..., xp), y is normally distributed with mean β0 + β1·x1 + ... + βp·xp and variance σ²
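A small simulation sketch in R illustrates this setup; the coefficient values (β0 = 5, β1 = 2) and the error standard deviation are assumptions chosen only for illustration, not values from the text.

> set.seed(1)
> x <- runif(100, 0, 10)
> eps <- rnorm(100, mean=0, sd=2)   # normally distributed errors: mean 0, constant variance
> y <- 5 + 2*x + eps                # assumed beta0 = 5 and beta1 = 2, for illustration only
> fit <- lm(y ~ x)
> summary(fit)$coefficients         # the estimates should be close to 5 and 2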


Model Description
With Normally Distributed Errors
• The following figure illustrates the regression model with one input variable: the normality assumption on the error terms and its effect on the outcome variable Y for a given value of X.

• E.g., for X = 8, E(Y) ≈ 20, but the observed values vary roughly between 15 and 25


Model Description
Example in R
Be sure to get the publisher's R downloads:
http://www.wiley.com/WileyCDA/WileyTitle/productCd-111887613X.html

> income_input = as.data.frame(read.csv("c:/data/income.csv"))
> income_input[1:10,]
> summary(income_input)

> library(lattice)
> splom(~income_input[c(2:5)], groups=NULL, data=income_input,
axis.line.tck=0, axis.text.alpha=0)
Model Description – Example in R

• A strong positive linear trend is observed for Income as a function of Age.
• For Education, a slight positive trend may exist.
• Lastly, there is no observed effect on Income based on Gender.
Model Description
Categorical Variables
• In the example in R, Gender is a binary variable

• Variables like Gender are categorical variables in contrast


to numeric variables where numeric differences are
meaningful
Model Description
R Functions

• confint() function – confidence intervals on the model parameters
• predict() function with interval="confidence" – confidence interval on the expected outcome
• predict() function with interval="prediction" – prediction interval (upper/lower bounds) on a particular outcome
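A minimal sketch of these functions, assuming a model fitted on the income example; the model formula, object name, and new-data values are illustrative assumptions rather than code from the text.

> results <- lm(Income ~ Age + Education, data=income_input)   # assumed model from the earlier example
> confint(results, level=0.95)                                 # confidence intervals on the parameters
> newdata <- data.frame(Age=41, Education=12)                  # a hypothetical new observation
> predict(results, newdata, interval="confidence")             # confidence interval on the expected outcome
> predict(results, newdata, interval="prediction")             # prediction interval on a particular outcome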

Diagnostics
Evaluating the Linearity Assumption
• A major assumption in linear regression modeling is that
the relationship between the input and output variables is
linear
• The most fundamental way to evaluate this is to plot the outcome variable against each input variable
• If the relationship between Age and Income is represented
as illustrated in Figure in next slide, a linear model would
not apply. In such a case, it is often useful to do any of the
following:
• Transform the outcome variable.
• Transform the input variables.
• Add extra input variables or terms to the regression model.
Diagnostics

Evaluating the Linearity Assumption

Income as a quadratic function of Age


• Common transformations include taking square roots or the
logarithm of the variables.
• Another option is to create a new input variable such as the age
squared and add it to the linear regression model to fit a quadratic
relationship between an input variable and the outcome.
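For instance, a hedged sketch of fitting such a quadratic term with lm(), reusing the income example's variable names (the exact call is an assumption, not code from the text):

> results_quad <- lm(Income ~ Age + I(Age^2) + Education, data=income_input)   # adds an Age-squared term
> summary(results_quad)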

Diagnostics
Evaluating the Residuals
• Residuals are the difference between the observed outcome
variables and the fitted value based on the OLS parameter
estimates.
• The lm() function in R automatically calculates and stores the fitted values and the residuals, in the fitted.values and residuals components of its output

Diagnostics
Evaluating the Residuals
• The residual plots are useful for confirming that the residuals
were centered on zero and have a constant variance

• Nonlinear trend in residuals
• Residuals not centered on zero
Diagnostics
Evaluating the Residuals
• The residual plots are useful for confirming that the residuals
were centered on zero and have a constant variance

• Residuals not centered on zero
• Variance not constant

Diagnostics
Evaluating the Normality Assumption
• From the histogram, it is seen that the residuals are
centered on zero and appear to be symmetric about zero, as
one would expect for a normally distributed random
variable.

Residuals centered on
zero and appear
normally distributed

Diagnostics
Evaluating the Normality Assumption
• Another option is to examine a Q-Q plot, comparing the observed data against the quantiles (Q) of the assumed distribution

> qqnorm(results2$residuals)
> qqline(results2$residuals)

Diagnostics
Evaluating the Normality Assumption

Normally
distributed
residuals

Non-normally
distributed
residuals

Diagnostics
N-Fold Cross-Validation
• To prevent overfitting, a common practice splits the
dataset into training and test sets, develops the model on
the training set and evaluates it on the test set
• If the quantity of data is insufficient for this, an N-fold cross-validation technique can be used
  • The dataset is randomly split into N datasets of equal size
  • The model is trained on N-1 of the sets and tested on the remaining one
  • The process is repeated N times
  • The N model errors are averaged over the N folds
• Note: if N equals the size of the dataset, this is the leave-one-out procedure (see the R sketch below)
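A minimal R sketch of N-fold cross-validation for a linear model; the fold count, model formula, and error metric (mean squared error) are assumptions for illustration, reusing the income example's data frame.

> N <- 5
> folds <- sample(rep(1:N, length.out=nrow(income_input)))   # randomly assign each row to one of N folds
> cv_errors <- sapply(1:N, function(i) {
    fit <- lm(Income ~ Age + Education, data=income_input[folds != i, ])  # train on N-1 folds
    pred <- predict(fit, newdata=income_input[folds == i, ])              # predict on the held-out fold
    mean((income_input$Income[folds == i] - pred)^2)                      # mean squared error for this fold
  })
> mean(cv_errors)   # average model error over the N folds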

Diagnostics
Other Diagnostic Considerations
• The model might be improved by including additional
input variables
• However, the adjusted R2 applies a penalty as the number of
parameters increases
• Residual plots should be examined for outliers
• Points markedly different from the majority of points
• They result from bad data, data processing errors, or actual rare
occurrences
• Finally, the magnitude and signs of the estimated
parameters should be examined to see if they make
sense
Regression

Linear Regression

Logistic Regression

Reasons to Choose and Cautions

Additional Regression Models



Logistic Regression
Introduction
• In linear regression modeling, the outcome
variable is continuous – e.g., income ~ age and
education
• In logistic regression, the outcome variable is categorical, for example two-valued outcomes such as
  • true/false
  • pass/fail
  • yes/no

Logistic Regression
Use Cases
Medical
• Probability of a patient’s successful response to a specific medical
treatment – input could include age, weight, etc.

Finance
• Probability an applicant defaults on a loan

Marketing
• Probability a wireless customer switches carriers (churns)

Engineering
• Probability a mechanical part malfunctions or fails

Logistic Regression
Model Description
• Logistic regression is based on the logistic function f(y):

    f(y) = e^y / (1 + e^y) = 1 / (1 + e^(-y))

• As y -> infinity, f(y) -> 1; and as y -> -infinity, f(y) -> 0
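A quick R sketch of the logistic function showing this behavior (the plotting range is an arbitrary choice):

> f <- function(y) 1 / (1 + exp(-y))    # the logistic (sigmoid) function
> y <- seq(-6, 6, by=0.1)
> plot(y, f(y), type="l", ylab="f(y)")  # S-shaped curve rising from near 0 to near 1
> f(0)                                  # 0.5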



Logistic Regression
Model Description
• With the range of f(y) as (0,1), the logistic function
models the probability of an outcome occurring

In contrast to linear regression, the values of y


are not directly observed; only the values of
f(y) in terms of success or failure are observed.

Logistic Regression
Model Description: customer churn example

• A wireless telecom company estimates


probability of a customer churning (switching
companies)
• Variables collected for each customer: age (years),
married (y/n), duration as customer (years), churned
contacts (count), churned (true/false)
• After analyzing the data and fitting a logistic regression model, age and churned contacts were selected as the best predictor variables

Logistic Regression
Model Description: customer churn example

Diagnostics
Model Description: customer churn example
> head(churn_input) # Churned = 1 if cust churned
> sum(churn_input$Churned) # 1743/8000 churned
• Use the Generalized Linear Model function glm()
> Churn_logistic1 <- glm(Churned~Age+Married+Cust_years+Churned_contacts,
    data=churn_input, family=binomial(link="logit"))
> summary(Churn_logistic1)   # Age + Churned_contacts are the best predictors
> Churn_logistic3 <- glm(Churned~Age+Churned_contacts,
    data=churn_input, family=binomial(link="logit"))
> summary(Churn_logistic3)   # Age + Churned_contacts

Diagnostics
Deviance and the Pseudo-R2

• In statistics, deviance is a goodness-of-fit statistic for a statistical model (it describes how well the model fits a set of observations)
• In logistic regression, deviance = -2 log L
  • where L is the maximized value of the likelihood function used to obtain the parameter estimates
• Two deviance values are provided
  • Null deviance = deviance based on only the y-intercept term
  • Residual deviance = deviance based on all the parameters
• The pseudo-R2 measures how well the fitted model explains the data
  • A value near 1 indicates a good fit (a computation sketch follows)
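A sketch of computing a pseudo-R2 from the fitted churn model, using the common definition 1 - (residual deviance / null deviance); the object name follows the earlier glm() code.

> pseudo_R2 <- 1 - Churn_logistic3$deviance / Churn_logistic3$null.deviance
> pseudo_R2   # a value near 1 would indicate a good fit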

Diagnostics
Receiver Operating Characteristic (ROC) Curve
• Logistic regression is often used as a classifier
• In the churn example, a customer is classified as Churn if the model predicts a high probability of churning
  • A probability threshold of 0.5 is often used, although other thresholds can be chosen
• For two classes, C (Churn) and nC (notChurn), we have
• True Positive: predict C, when actually C
• True Negative: predict nC, when actually nC
• False Positive: predict C, when actually nC
• False Negative: predict nC, when actually C

Diagnostics
Receiver Operating Characteristic (ROC) Curve

• The Receiver Operating Characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR)
  • TPR = TP / (TP + FN); FPR = FP / (FP + TN)
Diagnostics
Receiver Operating Characteristic (ROC) Curve

> library(ROCR)
> Pred = predict(Churn_logistic3, type="response")
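Continuing from this, a sketch of building and plotting the ROC curve with the ROCR package; the AUC step is an added illustration, and churn_input$Churned is assumed to hold the actual 0/1 outcomes as in the earlier code.

> predObj <- prediction(Pred, churn_input$Churned)   # predicted probabilities vs. actual outcomes
> rocObj <- performance(predObj, measure="tpr", x.measure="fpr")
> plot(rocObj)                                        # ROC curve: TPR against FPR
> performance(predObj, measure="auc")@y.values[[1]]   # area under the ROC curve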

Diagnostics
Receiver Operating Characteristic (ROC) Curve

Diagnostics
Histogram of the Probabilities

It is interesting to visualize the counts of the customers who churned and who did not churn against the estimated churn probability.
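A minimal sketch of such a plot, splitting the estimated probabilities by the actual outcome; it reuses the Pred vector computed earlier, and the colors and bin count are arbitrary choices.

> hist(Pred[churn_input$Churned == 1], breaks=20, col=rgb(1, 0, 0, 0.5),
    xlab="Estimated churn probability", main="Churned vs. did not churn")
> hist(Pred[churn_input$Churned == 0], breaks=20, col=rgb(0, 0, 1, 0.5), add=TRUE)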
Regression

Linear Regression

Logistic Regression

Reasons to Choose and Cautions

Additional Regression Models



Reasons to Choose and Cautions

Linear regression – outcome variable continuous

Logistic regression – outcome variable categorical

Both models assume a linear additive function of the input variables
• If this is not true, the models perform poorly
• In linear regression, the further assumption of normally distributed error
terms is important for many statistical inferences

Although a set of input variables may be a good predictor of


an output variable, “correlation does not imply causation”
Regression

Linear Regression

Logistic Regression

Reasons to Choose and Cautions

Additional Regression Models



Additional Regression Models


• Multicollinearity is the condition when several input
variables are highly correlated
• This can lead to inappropriately large coefficients
• To mitigate this problem
• Ridge regression applies a penalty based on the size of
the coefficients
• Lasso regression applies a penalty proportional to the
sum of the absolute values of the coefficients
• Multinomial logistic regression – used for a more-than-two-
state categorical outcome variable
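A hedged sketch of the ridge and lasso approaches above using the glmnet package (glmnet is an assumption; it is not mentioned in the text), reusing the income example's variables; alpha = 0 gives ridge and alpha = 1 gives lasso.

> library(glmnet)
> x <- model.matrix(Income ~ Age + Education + Gender, data=income_input)[, -1]
> y <- income_input$Income
> ridge_fit <- cv.glmnet(x, y, alpha=0)   # alpha = 0: ridge penalty (squared size of coefficients)
> lasso_fit <- cv.glmnet(x, y, alpha=1)   # alpha = 1: lasso penalty (absolute values of coefficients)
> coef(ridge_fit, s="lambda.min")
> coef(lasso_fit, s="lambda.min")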
References
• http://www.csis.pace.edu/~ctappert/cs816-15fall/slides/
• http://srmnotes.weebly.com/it1110-data-science--big-data.html
• http://www.csis.pace.edu/~ctappert/cs816-15fall/books/2015DataScience&BigDataAnalytics.pdf
