
Chapter

An Introduction to
Logistic Regression

Shirin Aslani, Summer 2021

1
Business Analytics – Dr. Shirin Aslani – GSME, Sharif University of Technology
Modeling the Expert

• Claims data include information at the patient
encounter level regarding diagnoses, treatments,
drugs, and billed and paid amounts

• Can we assess healthcare quality using claims
data?

• Why is healthcare quality assessment
important?

Modeling the Expert

• Healthcare Quality Assessment
• Need to assess quality for proper medical interventions
• Good quality care educates patients and controls costs
• No single set of guidelines for defining quality of
healthcare
• Health professionals are experts in quality of care
assessment
• Experts are limited by memory and time
• Healthcare Quality Assessment
• Expert physicians can evaluate quality by examining a
patient’s records
• This process is time consuming and inefficient
• Physicians cannot assess quality for millions of patients

Modeling the Expert

• Can we develop analytical tools that replicate
expert assessment on a large scale?

• Learn from expert human judgment
• Develop a model, interpret results, and adjust the model

• Make predictions/evaluations on a large scale

• Healthcare Quality Assessment
• Let’s identify poor healthcare quality using analytics

Claims Data

• Electronically available
• Standardized
• Not 100% accurate
• Under-reporting is
common
• Claims for hospital visits
can be vague

Claims Sample

• Large health insurance claims database
• Randomly selected 131 diabetes patients
• Ages range from 35 to 55
• Costs $10,000 – $20,000
• September 1, 2003 – August 31, 2005
• Expert physician reviewed claims and wrote
descriptive notes:
• “Ongoing use of narcotics”
• “Only on Avandia, not a good first choice drug”
• “Had regular visits, mammogram, and immunizations”
• “Was given home testing supplies”
• Expert assessment: rated quality on a two-point
scale (poor/good)
• “I’d say care was poor – poorly treated diabetes”
• “No eye care, but overall I’d say high quality”

Variable Extraction

• Dependent Variable
• Quality of care
• Independent Variables
• Ongoing use of narcotics
• Only on Avandia, not a good first choice drug
• Had regular visits, mammogram, and immunizations
• Was given home testing supplies
• Diabetes treatment
• Patient demographics
• Healthcare utilization
• Providers
• Claims
• Prescriptions

Logistic Regression

• Predicts the probability of poor care
• Denote dependent variable “PoorCare” by y
• P(y = 1)
• P(y = 0) = 1 - P(y = 1)
• Independent variables
• Uses the Logistic Response Function

• Nonlinear transformation of linear regression
equation to produce number between 0 and 1
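
The formula for the Logistic Response Function is not reproduced in the text above. A minimal sketch of the standard form, assuming coefficients b0, b1, ..., bk (names and values here are hypothetical, chosen only for illustration):

```python
import math

def logistic_response(coefficients, features):
    """Return P(y = 1) for one observation.

    coefficients: [b0, b1, ..., bk], with b0 the intercept.
    features:     [x1, ..., xk], the independent variables.
    """
    # Linear regression part: b0 + b1*x1 + ... + bk*xk
    linear = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], features))
    # Nonlinear transformation that maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and feature values, for illustration only.
p_poor = logistic_response([-1.5, 0.8, 0.3], [2.0, 1.0])  # P(y = 1)
p_good = 1.0 - p_poor                                      # P(y = 0)
```

Whatever the value of the linear part, the result stays strictly between 0 and 1, which is what lets it be read as a probability.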

The Logistic Function

• We can instead talk about Odds (like in
gambling): Odds = P(y = 1) / P(y = 0)

• Odds > 1 if y = 1 is more likely
• Odds < 1 if y = 0 is more likely
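
A small sketch of the conversion between probability and odds, assuming the usual definition of odds as P(y = 1) divided by P(y = 0). The function names are illustrative, not part of the course material:

```python
def odds(p):
    """Odds in favor of y = 1, given p = P(y = 1) with 0 <= p < 1."""
    return p / (1.0 - p)

def probability(odds_value):
    """Inverse conversion: recover P(y = 1) from the odds."""
    return odds_value / (1.0 + odds_value)

likely = odds(0.75)    # y = 1 is more likely, so the odds exceed 1
unlikely = odds(0.2)   # y = 0 is more likely, so the odds fall below 1
even = odds(0.5)       # both outcomes equally likely, so the odds equal 1
```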
The Logistic Function

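• It turns out that log(Odds) = β0 + β1x1 + β2x2 + ... + βkxk,
where β0, ..., βk are the model coefficients and x1, ..., xk the
independent variables

• This is called the “Logit” and looks like linear
regression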
• The bigger the Logit is, the bigger P(y=1)
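
One way to see where the Logit comes from, assuming the standard logistic response function with coefficients \beta_0, ..., \beta_k (the symbols are an assumption, since the formulas are not reproduced in the text):

```latex
P(y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\;\Longrightarrow\;
\text{Odds} = \frac{P(y = 1)}{1 - P(y = 1)} = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}
\;\Longrightarrow\;
\log(\text{Odds}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```

Because the exponential is increasing, a larger Logit means larger odds and therefore a larger P(y = 1).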

Threshold Value

• The outcome of a logistic regression model is a
probability

• To turn this probability into a binary prediction (poor or
good care), we can use a threshold value t
• If P(PoorCare = 1) ≥ t, predict poor quality
• If P(PoorCare = 1) < t, predict good quality

• What value should we pick for t?
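
The threshold rule can be sketched in a few lines. The probabilities below are made up for illustration, and `classify` is a hypothetical helper name:

```python
def classify(probabilities, t):
    """Predict 1 (poor care) when P(PoorCare = 1) >= t, otherwise 0 (good care)."""
    return [1 if p >= t else 0 for p in probabilities]

# Hypothetical predicted probabilities for five patients.
probs = [0.10, 0.35, 0.50, 0.72, 0.90]

strict = classify(probs, 0.7)    # large t: poor care is predicted rarely
lenient = classify(probs, 0.3)   # small t: poor care is predicted often
```

The same probabilities yield different predictions depending on t, which is why the choice of threshold matters.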

Threshold Value

• Often selected based on which errors are “better”
• If t is large, predict poor care rarely
• More errors where we say good care, but it is actually
poor care
• Detects patients who are receiving the worst care
• If t is small, predict good care rarely
• More errors where we say poor care, but it is actually
good care
• Detects all patients who might be receiving poor care
• With no preference between the errors, select t =
0.5
• Predicts the more likely outcome

Threshold Value

• Compare actual outcomes to predicted outcomes
using a confusion matrix (classification matrix)
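
A minimal sketch of a confusion matrix and the measures usually derived from it (accuracy, sensitivity, specificity). The labels are invented for illustration, with 1 standing for poor care:

```python
def confusion_matrix(actual, predicted):
    """Count outcomes for binary labels, where 1 = poor care and 0 = good care."""
    pairs = list(zip(actual, predicted))
    return {
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),  # true negatives
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),  # false positives
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),  # false negatives
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),  # true positives
    }

def summarize(cm):
    """Derive common accuracy measures from the four counts."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),  # true positive rate
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),  # true negative rate
    }

# Hypothetical actual and predicted labels for eight patients.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]

cm = confusion_matrix(actual, predicted)
metrics = summarize(cm)
```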

Receiver Operating Characteristic (ROC) Curve

• Captures all
thresholds
simultaneously
• High threshold
• High specificity
• Low sensitivity
• Low threshold
• Low specificity
• High sensitivity
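
To make "captures all thresholds simultaneously" concrete, here is a sketch that computes one ROC point per threshold, assuming the usual axes (false positive rate = 1 - specificity, true positive rate = sensitivity). The data are invented:

```python
def roc_points(actual, probabilities):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    # Every distinct predicted probability serves as a threshold, plus one
    # above all of them so the curve starts at (0, 0).
    thresholds = [float("inf")] + sorted(set(probabilities), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for a, p in zip(actual, probabilities) if a == 1 and p >= t)
        fp = sum(1 for a, p in zip(actual, probabilities) if a == 0 and p >= t)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical outcomes (1 = positive) and predicted probabilities.
actual = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
curve = roc_points(actual, probs)
```

The first point (highest threshold) has low sensitivity and high specificity; the last point (lowest threshold) has the reverse.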

Selecting a Threshold using ROC

• Choose the threshold
with the best
trade-off between
• the cost of failing to
detect positives
• the cost of raising false
alarms
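
One simple way to formalize this trade-off, offered as an assumption rather than the method used in the course, is to assign a cost to each error type and pick the candidate threshold with the lowest total cost. All numbers below are hypothetical:

```python
def best_threshold(actual, probabilities, cost_fn, cost_fp, candidates):
    """Return the candidate threshold with the lowest total misclassification cost."""
    def total_cost(t):
        # Missed positives: actual 1 but predicted probability below t
        fn = sum(1 for a, p in zip(actual, probabilities) if a == 1 and p < t)
        # False alarms: actual 0 but predicted probability at or above t
        fp = sum(1 for a, p in zip(actual, probabilities) if a == 0 and p >= t)
        return cost_fn * fn + cost_fp * fp
    return min(candidates, key=total_cost)

actual = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
candidates = [0.2, 0.5, 0.8]

# Missing a positive is assumed five times as costly as a false alarm.
t_cautious = best_threshold(actual, probs, cost_fn=5, cost_fp=1, candidates=candidates)
# A false alarm is assumed five times as costly as a missed positive.
t_strict = best_threshold(actual, probs, cost_fn=1, cost_fp=5, candidates=candidates)
```

When missed positives are expensive the low threshold wins, and when false alarms are expensive the high threshold wins.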

Note

• Multicollinearity could be a problem
• Do the coefficients make sense?
• Check correlations

• Measures of accuracy

Confusion Matrix

Making Prediction

• Just like in linear regression, we want to make
predictions on a test set to compute out-of-sample
metrics
• If we use a threshold value of 0.3, we get the
following confusion matrix

Area Under the ROC Curve (AUC)

• Take the area under the
ROC curve
• Interpretation
• Given a random positive
and negative, proportion of
the time you guess which
is which correctly
• Less affected by sample
balance than accuracy
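
The pairwise interpretation can be computed directly. This sketch counts, over every (positive, negative) pair, how often the positive receives the higher predicted probability; counting ties as half is a common convention and an assumption here. The data are invented:

```python
def auc_by_pairs(actual, probabilities):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie in predicted probability counts as half a correct guess.
    """
    pos = [p for a, p in zip(actual, probabilities) if a == 1]
    neg = [p for a, p in zip(actual, probabilities) if a == 0]
    correct = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                correct += 1.0
            elif p_pos == p_neg:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Hypothetical outcomes (1 = positive) and predicted probabilities.
actual = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
auc = auc_by_pairs(actual, probs)
```

In this example 8 of the 9 pairs are ranked correctly, so the AUC is 8/9.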

Area Under the ROC Curve (AUC)

• What is a good AUC?
• Maximum of 1 (perfect
prediction)
• Baseline of 0.5 (just
guessing); a lower value means ranking worse than chance

Conclusion

• An expert-trained model can accurately identify
diabetics receiving low-quality care
• Out-of-sample accuracy of 78%
• Identifies most patients receiving poor care
• In practice, the probabilities returned by the
logistic regression model can be used to prioritize
patients for intervention
• Electronic medical records could be used in the
future
• While humans can accurately analyze small
amounts of information, models allow larger
scalability
• Models do not replace expert judgment
• Models can integrate assessments of many experts into
one final unbiased and unemotional prediction

Logistic Regression

• Imbalanced Data
• Weighted Logistic Regression
• Resampling the data-set
• Generating synthetic samples
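
Two of these remedies can be sketched briefly. The class-weight formula n / (number of classes × class count) is a widely used heuristic and an assumption here, as is random oversampling with replacement as the resampling scheme. Synthetic sample generation is not shown:

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n / (number_of_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def oversample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced data: eight negatives and two positives.
labels = [0] * 8 + [1] * 2
rows = list(range(10))   # stand-ins for feature vectors

weights = balanced_class_weights(labels)
new_rows, new_labels = oversample_minority(rows, labels)
```

Weighting leaves the data unchanged and makes errors on the rare class count more during fitting, while resampling changes the data set itself.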

The Framingham Heart Study

Franklin Delano Roosevelt

• President of the United States, 1933-1945
• Longest-serving president
• Led country through Great Depression
• Commander in Chief of U.S. military in World War II
• Died while president, April 12, 1945

The Framingham Heart Study

Franklin Delano Roosevelt’s Blood Pressure

• Before presidency, blood pressure of 140/100
• Healthy blood pressure is less than 120/80
• Today, 140/100 is already considered high blood pressure
• One year before death, 210/120
• Today, this is called Hypertensive Crisis, and emergency
care is needed
• FDR’s personal physician:
“A moderate degree of arteriosclerosis, although no more
than normal for a man of his age”
• Two months before death: 260/150
• Day of death: 300/190
The Framingham Heart Study

• In the late 1940s, the U.S. Government set out to better
understand cardiovascular disease (CVD)
• Plan: track large cohort of initially healthy
patients over time
• City of Framingham, MA selected as site for study
• Appropriate size
• Stable population
• Cooperative doctors and residents
• 1948: beginning of Framingham Heart Study

The Framingham Heart Study

• 5,209 patients aged 30-59 enrolled
• Patients given questionnaire and exam every 2
years
• Physical characteristics
• Behavioral characteristics
• Test results
• Exams and questions expanded over time
• We will build models using the Framingham data
to predict and prevent heart disease

Coronary Heart Disease (CHD)

• We will predict 10-year risk of CHD
• Subject of important 1998 paper, introducing the
Framingham Risk Score
• CHD is a disease of the blood vessels supplying
the heart
• Heart disease has been the leading cause of
death worldwide since 1921
• 7.3 million people died from CHD in 2008
• Since 1950, age-adjusted death rates have declined
60%

Analytics to Prevent Heart Disease

Risk Factors

• Risk factors are variables that increase the
chances of a disease
• Term coined by William Kannel and Roy Dawber
from the Framingham Heart Study

• We will investigate risk factors collected in the
first data collection for the study
• Anonymized version of original data
• Demographic risk factors
• male: sex of patient
• age: age in years at first examination
• education: Some high school (1), high school/GED
(2), some college/vocational school (3), college (4)
Risk Factors

• Behavioral risk factors
• currentSmoker, cigsPerDay: Smoking behavior
• Medical history risk factors
• BPmeds: On blood pressure medication at time of first
examination
• prevalentStroke: Previously had a stroke
• prevalentHyp: Currently hypertensive
• diabetes: Currently has diabetes

Risk Factors

• Risk factors from first examination
• totChol: Total cholesterol (mg/dL)
• sysBP: Systolic blood pressure
• diaBP: Diastolic blood pressure
• BMI: Body Mass Index, weight (kg) / height (m)^2
• heartRate: Heart rate (beats/minute)
• glucose: Blood glucose level (mg/dL)

An Analytical Approach

• Randomly split patients into training and testing
sets
• Use logistic regression on training set to predict
whether or not a patient experienced CHD within
10 years of first examination
• Evaluate predictive power on test set
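
The random split in the first step can be sketched as follows. The 65% training fraction and the seed are assumptions chosen for illustration; the slides do not state the values used:

```python
import random

def train_test_split(rows, train_fraction=0.65, seed=1):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(rows)        # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

patient_ids = list(range(100))   # stand-ins for patient records
train, test = train_test_split(patient_ids)
```

Fixing the seed makes the split repeatable, so results on the test set can be reproduced.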

Model Strength

• Model rarely predicts 10-year CHD risk above
50%
• Accuracy very near a baseline of always predicting no
CHD
• Model can differentiate low-risk from high-risk
patients (AUC = 0.74)
• Some significant variables suggest interventions
• Smoking
• Cholesterol
• Systolic blood pressure
• Glucose
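
The comparison against the baseline of always predicting no CHD can be sketched like this. The outcomes and probabilities are invented, so the resulting numbers are illustrative and unrelated to the Framingham results:

```python
from collections import Counter

def baseline_accuracy(actual):
    """Accuracy of always predicting the most common outcome."""
    counts = Counter(actual)
    return max(counts.values()) / len(actual)

def accuracy(actual, probabilities, t=0.5):
    """Accuracy of threshold-based predictions."""
    predicted = [1 if p >= t else 0 for p in probabilities]
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

# Hypothetical test set: 1 = CHD within 10 years, 0 = no CHD.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.08, 0.20, 0.15, 0.30, 0.12, 0.25, 0.45, 0.60]

base = baseline_accuracy(actual)
model = accuracy(actual, probs)
```

When the outcome is rare and few predicted risks exceed the threshold, accuracy alone says little, which is why a ranking measure such as AUC is reported alongside it.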

Risk Model Validation

• So far, we have used internal validation
• Train with some patients, test with others
• Weakness: unclear if model generalizes to other
populations
• Framingham cohort white, middle class
• Important to test on other populations
• Framingham Risk Model tested on diverse cohorts

• Cohort studies collecting same risk factors


Risk Model Validation – Black Men

• Validation Plan
• Predict CHD risk for each patient using FHS model
• Compare to actual outcomes for each risk decile
• 1,428 black men in ARIC study
• Similar clinical
characteristics, except higher
diabetes rate
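
The validation plan of comparing predictions with outcomes by risk decile can be sketched as below. The cohort is synthetic and the function name is hypothetical:

```python
def calibration_by_decile(actual, predicted_risk):
    """Compare mean predicted risk with the observed event rate in each risk decile."""
    # Sort patients from lowest to highest predicted risk.
    ranked = sorted(zip(predicted_risk, actual))
    n = len(ranked)
    table = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        if not group:
            continue  # fewer than ten patients: some deciles may be empty
        mean_pred = sum(r for r, _ in group) / len(group)
        observed = sum(a for _, a in group) / len(group)
        table.append((d + 1, mean_pred, observed))
    return table

# Synthetic cohort of 20 patients with risks 0.025, 0.075, ..., 0.975.
risk = [(i + 0.5) / 20 for i in range(20)]
outcome = [1 if i % 4 == 3 or i >= 16 else 0 for i in range(20)]
table = calibration_by_decile(outcome, risk)
```

Each row holds the decile number, the mean predicted risk, and the observed event rate; large gaps between the last two indicate that the model is poorly calibrated for that population.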

Risk Model Validation – Japanese American Men

• Validation Plan
• Predict CHD risk for each patient using FHS model
• Compare to actual outcomes for each risk decile
• 2,755 Japanese American men in HHS
• Lower CHD rate

Recalibrated Model

• Recalibration adjusts model to new population
• Changes predicted risk, but does not reorder
predictions
• More accurate risk
estimates
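
One simple form of recalibration, assumed here for illustration and not necessarily the method used for the Framingham model, shifts every prediction by a single constant on the logit scale so that the mean predicted risk matches the event rate of the new population:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def recalibrate(risks, observed_rate):
    """Shift every prediction by one constant on the logit scale.

    The constant is found by bisection so that the mean recalibrated risk
    matches the event rate observed in the new population.
    """
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        mean_risk = sum(inv_logit(logit(r) + mid) for r in risks) / len(risks)
        if mean_risk < observed_rate:
            lo = mid   # shift is too small, move up
        else:
            hi = mid   # shift is too large, move down
    shift = (lo + hi) / 2.0
    return [inv_logit(logit(r) + shift) for r in risks]

# Hypothetical predicted risks and a lower event rate in the new population.
original = [0.05, 0.10, 0.20, 0.40]
adjusted = recalibrate(original, observed_rate=0.10)
```

Because the same shift is applied to everyone and the transformation is increasing, the risk values change but the ordering of patients does not.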

Interventions

• In FDR’s time, hypertension drugs too toxic for
practical use
• In 1950s, the diuretic chlorothiazide was
developed
• Framingham Heart Study gave Ed Freis the
evidence needed to argue for testing effects of BP
drugs
• Veterans Administration (VA) Trial: randomized,
double blind clinical trial
• Found decreased risk of CHD
• Now, >$1B market for diuretics worldwide

Interventions

• Despite Framingham results, early cholesterol
drugs too toxic for practical use
• In 1970s, first statins were developed
• Study of 4,444 patients with CHD: statins cause
37% risk reduction of second heart attack
• Study of 6,595 men with high cholesterol: statins
cause 32% risk reduction of CVD deaths
• Now, > $20B market for statins worldwide

Drivers’ Decisions in Ride-Hailing Platforms

• Two-sided platforms bring together two distinct
but interdependent groups of customers. They
create value as intermediaries by connecting
these groups

Ride-Hailing Platforms

• Ride-hailing platforms are online or application-based
platforms that allow users to hire a
personal driver.
• They connect private-hire vehicle drivers with
platform users who need a ride.
• Ride-hailing platforms have three components:
• driver app (for drivers to offer services and
communicate with their customers),
• rider app (for customers to book and track their
journeys and select vehicle types),
• dispatch system (a system that connects the driver and
the customer via their mobile phones).

Uber, the first one

• In 2009, Uber was founded as UberCab by Garrett Camp and Travis
Kalanick.
• After Camp and his friends spent $800 hiring a private driver, he
wanted to find a way to reduce the cost of direct transportation.
• He realized that sharing the cost with people could make it
affordable, and his idea morphed into Uber.
• Following a beta launch in May 2010, Uber's services and mobile
app officially launched in San Francisco in 2011. Originally, the
application only allowed users to hail a black luxury car and the
price was 1.5 times that of a taxi.
• In 2011, the company changed its name from UberCab to Uber
after complaints from San Francisco taxicab operators.
• In April 2012, Uber launched a service in Chicago where users
were able to request a regular taxi or an Uber driver via its mobile
app.
• By early 2013, the service was operating in 35 cities.

Dispatching System

• One key factor in ride-hailing platforms:
• Dispatching rate in each zone and each time period

• Process
• The rider checks the price for her request
• If the price is lower than her willingness to pay, she asks
for a driver
• A dispatching algorithm starts matching riders and drivers
• In some applications, the driver can choose to accept or reject
the request
• In these apps, the dispatching algorithm has a limited time
to find a suitable driver
• The app starts offering the request to a prioritized list of
drivers until it gets an acceptance or the time is over.
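
The offer loop in the last step can be sketched as follows. This assumes offers are made one at a time and that each offer takes a fixed number of seconds; both are simplifications, and all names and values are hypothetical:

```python
def dispatch(prioritized_drivers, driver_accepts, seconds_per_offer, time_limit):
    """Offer a request to drivers in priority order.

    Returns the first driver who accepts, or None if the time limit is reached
    or every driver rejects.
    """
    elapsed = 0
    for driver in prioritized_drivers:
        if elapsed + seconds_per_offer > time_limit:
            break  # no time left for another offer
        elapsed += seconds_per_offer
        if driver_accepts(driver):
            return driver
    return None

# Hypothetical drivers and their decisions, for illustration only.
decisions = {"driver_a": False, "driver_b": False, "driver_c": True}
drivers = ["driver_a", "driver_b", "driver_c"]

matched = dispatch(drivers, decisions.get, seconds_per_offer=15, time_limit=60)
timed_out = dispatch(drivers, decisions.get, seconds_per_offer=15, time_limit=30)
```

With a 60-second limit the third driver is reached and accepts; with a 30-second limit the time runs out after two rejections and the request goes unmatched.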
Dispatching System

• High dispatching rate
• Higher satisfaction for riders
• Higher satisfaction for drivers
• Not necessarily higher revenue for the platform
• An indicator for a good balance in the network

Drivers’ decision making

• Drivers’ acceptance/rejection decisions affect the performance of the
dispatching algorithm
• Higher rate
• Shorter waiting time for rider

Drivers’ decision making

• Dependent Variable
• Drivers’ acceptance/rejection

• Independent variables
• Weekday/weekend
• Time (peak/off-peak/night)
• Zone (city center/ business area/ suburb)
• Price/distance
• Net commission
• Driver’s same day revenue till the request offer
• Driver’s same month revenue till the request offer
• Number of requests offered to the driver before this request
• Origin-destination distance
• Driver’s distance to the origin
• Driver’s waiting time from last ride
• Age
• Sex
• If the driver is native

Results

• AUC: 84%
• Accuracy: 75%
