An Introduction to Logistic Regression
Business Analytics– Dr. Shirin Aslani – GSME, Sharif University of Technology
Modeling the Expert
Claims Data
• Electronically available
• Standardized
• Not 100% accurate
• Under-reporting is common
• Claims for hospital visits can be vague
Claims Sample
• Dependent Variable
• Quality of care
• Independent Variables
• Ongoing use of narcotics
• Only on Avandia, not a good first-choice drug
• Had regular visits, mammogram, and immunizations
• Was given home testing supplies
• Diabetes treatment
• Patient demographics
• Healthcare utilization
• Providers
• Claims
• Prescriptions
Logistic Regression
The Logistic Function
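As a minimal sketch of the logistic function named in this slide title, the snippet below maps a linear combination of independent variables to a probability between 0 and 1. The function names and all coefficient values are illustrative assumptions, not values from the lecture.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps a real number z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(coefficients, intercept, features):
    """P(y = 1) for one observation under a logistic regression model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return logistic(z)

# Hypothetical model with two independent variables (all numbers are made up).
p = predict_probability(coefficients=[0.8, -0.5], intercept=-1.0, features=[2.0, 1.0])
```

A positive coefficient raises the predicted probability as its variable increases, and a negative coefficient lowers it.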
Threshold Value
Receiver Operating Characteristic (ROC) Curve
• Captures all thresholds simultaneously
• High threshold: high specificity, low sensitivity
• Low threshold: low specificity, high sensitivity
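The trade-off in the bullets above can be sketched by sweeping a few thresholds and recording sensitivity and specificity at each one. The outcomes and probabilities below are made up for illustration, and the rule "predict 1 when the probability is at least the threshold" is an assumption about how the threshold is applied.

```python
def sensitivity_specificity(y_true, probs, threshold):
    """Predict 1 when the probability is >= threshold, then score the result."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up outcomes (1 = positive) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

# Each threshold gives one point on the ROC curve:
# x = false positive rate (1 - specificity), y = true positive rate (sensitivity).
roc_points = []
for t in [0.2, 0.5, 0.8]:
    sens, spec = sensitivity_specificity(y_true, probs, t)
    roc_points.append((t, 1 - spec, sens))
```

On this toy data, raising the threshold moves the point toward the lower-left corner of the ROC plot: fewer false alarms, but fewer detected positives.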
Selecting a Threshold using ROC
• Choose the threshold that gives the best trade-off between
• the cost of failing to detect positives
• the cost of raising false alarms
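One way to make this trade-off concrete is to assign a cost to each missed positive and each false alarm, then pick the candidate threshold with the lowest total cost. This is a sketch under assumed costs. The data, the candidate thresholds, and the function names are hypothetical.

```python
def total_cost(y_true, probs, threshold, cost_fn, cost_fp):
    """Total cost of missed positives (FN) and false alarms (FP) at a threshold."""
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return cost_fn * fn + cost_fp * fp

def best_threshold(y_true, probs, candidates, cost_fn, cost_fp):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, probs, t, cost_fn, cost_fp))

# Made-up outcomes (1 = positive) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
candidates = [0.2, 0.5, 0.8]

# Assumption: a missed positive costs 10 times as much as a false alarm.
t_cautious = best_threshold(y_true, probs, candidates, cost_fn=10, cost_fp=1)
# Assumption: a false alarm costs 10 times as much as a missed positive.
t_strict = best_threshold(y_true, probs, candidates, cost_fn=1, cost_fp=10)
```

When missed positives dominate the cost, the low threshold wins. When false alarms dominate, the high threshold wins.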
Note
• Measures of accuracy
Confusion Matrix
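A confusion matrix cross-tabulates actual outcomes against predicted ones, and the usual measures of accuracy are read off its four cells. The sketch below uses made-up labels, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false negatives and positives for binary labels (0 or 1)."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts

def accuracy_measures(c):
    """Overall accuracy, sensitivity (true positive rate), and specificity."""
    total = c["TN"] + c["FP"] + c["FN"] + c["TP"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total,
        "sensitivity": c["TP"] / (c["TP"] + c["FN"]),
        "specificity": c["TN"] / (c["TN"] + c["FP"]),
    }

# Made-up actual outcomes and predictions.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred)
scores = accuracy_measures(cm)
```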
Making Predictions
Area Under the ROC Curve (AUC)
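AUC can be read as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition, counting ties as half. The data is made up, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(y_true, probs):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [p for y, p in zip(y_true, probs) if y == 1]
    negatives = [p for y, p in zip(y_true, probs) if y == 0]
    wins = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Made-up outcomes (1 = positive) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
score = auc(y_true, probs)
```

A model that ranks every positive above every negative scores 1.0, while a model that gives every case the same probability scores 0.5.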
Conclusion
• Imbalanced Data
• Weighted Logistic Regression
• Resampling the dataset
• Generating synthetic samples
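Two of the remedies listed above can be sketched in a few lines: class weights for a weighted fit, and random oversampling as one form of resampling. The weight formula n_samples / (n_classes * class_count) is a common heuristic and an assumption here, not necessarily the lecture's choice. Synthetic sample generation is not shown.

```python
import random
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, c in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - c):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

# Made-up imbalanced data: nine negatives and one positive.
y = [0] * 9 + [1]
X = [[i] for i in range(10)]
weights = balanced_class_weights(y)
X_bal, y_bal = random_oversample(X, y)
```

Resampling is normally applied to the training set only, so that the test set still reflects the real class balance.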
The Framingham Heart Study
Coronary Heart Disease (CHD)
Analytics to Prevent Heart Disease
Risk Factors
An Analytical Approach
Model Strength
Risk Model Validation
• Validation Plan
• Predict CHD risk for each patient using FHS model
• Compare to actual outcomes for each risk decile
• 1,428 black men in ARIC study
• Similar clinical characteristics, except higher diabetes rate
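The validation plan above can be sketched as a table with one row per risk decile: sort patients by predicted risk, split them into ten equal-sized groups, and compare the mean predicted risk with the observed event rate in each group. The data below is made up and does not come from the FHS or ARIC cohorts. The function name is hypothetical.

```python
def risk_decile_table(predicted_risk, outcomes, n_groups=10):
    """Rows of (group number, group size, mean predicted risk, observed event rate)."""
    pairs = sorted(zip(predicted_risk, outcomes))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        # Integer boundaries keep the group sizes as equal as possible.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed_rate = sum(o for _, o in group) / len(group)
        table.append((g + 1, len(group), mean_predicted, observed_rate))
    return table

# Made-up cohort of 20 patients: predicted risk and actual outcome (1 = event).
predicted = [(i + 1) / 40 for i in range(20)]
outcomes = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1]
table = risk_decile_table(predicted, outcomes)
```

If the model transfers well to the new population, predicted and observed rates are close in every decile. A consistent gap in one direction suggests the model needs recalibration.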
Risk Model Validation – Japanese American Men
• Validation Plan
• Predict CHD risk for each patient using FHS model
• Compare to actual outcomes for each risk decile
• 2,755 Japanese American men in HHS
• Lower CHD rate
Recalibrated Model
Interventions
Drivers' Decisions in Ride-Hailing Platforms
Ride-Hailing Platforms
Uber, the first one
Dispatching System
• Process
• The rider checks the price for her request
• If the price is lower than her willingness to pay, she requests a driver
• A dispatching algorithm starts matching riders and drivers
• In some applications, the driver can choose to accept or reject the request
• In these apps, the dispatching algorithm has a limited time to find a suitable driver
• The app offers the request to drivers on a prioritized list until one accepts or the time runs out
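The offer loop described in the last bullets can be sketched as follows. This assumes offers are made one driver at a time and that each offer uses a fixed amount of the time budget. Both are simplifications, and every name here (`dispatch`, `decide`, `seconds_per_offer`) is hypothetical rather than taken from any real platform.

```python
def dispatch(request, prioritized_drivers, decide, time_limit, seconds_per_offer):
    """Offer the request in priority order until a driver accepts or time runs out.

    `decide(driver, request)` returns True if the driver accepts.
    Returns the accepting driver, or None if nobody accepts in time.
    """
    elapsed = 0
    for driver in prioritized_drivers:
        if elapsed + seconds_per_offer > time_limit:
            break  # not enough time left for another offer
        elapsed += seconds_per_offer
        if decide(driver, request):
            return driver
    return None

# Toy example: only the third driver on the list would accept.
drivers = ["driver_1", "driver_2", "driver_3"]
accepts_third = lambda driver, request: driver == "driver_3"

matched = dispatch("ride_42", drivers, accepts_third, time_limit=30, seconds_per_offer=10)
timed_out = dispatch("ride_42", drivers, accepts_third, time_limit=25, seconds_per_offer=10)
```

With a 30-second budget the third driver is reached and matched. With 25 seconds the request expires after two rejections, which is why predicting each driver's acceptance matters for ordering the list.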
Drivers' Decision Making
• Dependent Variable
• Drivers’ acceptance/rejection
• Independent Variables
• Weekday/weekend
• Time (peak/off-peak/night)
• Zone (city center/business area/suburb)
• Price/distance
• Net commission
• Driver’s same-day revenue up to the request offer
• Driver’s same-month revenue up to the request offer
• Number of requests offered to the driver before this request
• Origin-destination distance
• Driver’s distance to the origin
• Driver’s waiting time since the last ride
• Age
• Sex
• Whether the driver is native
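Before a logistic regression can use variables like these, categorical ones such as time and zone are typically turned into 0/1 dummy variables, with one level left out as the reference. The sketch below encodes a subset of the variables above. All field names, category labels, and units are hypothetical, and the lecture does not state how its data was encoded.

```python
# Hypothetical category levels; the first level of each is the reference.
CATEGORIES = {
    "time": ["peak", "off-peak", "night"],
    "zone": ["city center", "business area", "suburb"],
}
# Hypothetical numeric fields (units are assumptions).
NUMERIC = ["price_per_km", "pickup_distance_km", "revenue_today"]

def encode_offer(offer):
    """Turn one request offer (a dict) into a numeric feature vector."""
    features = [1.0 if offer["weekend"] else 0.0]
    for name, levels in CATEGORIES.items():
        # One dummy per non-reference level; all zeros means the reference level.
        features += [1.0 if offer[name] == level else 0.0 for level in levels[1:]]
    features += [float(offer[name]) for name in NUMERIC]
    return features

# Made-up offer; the dependent variable would be 1 if accepted, 0 if rejected.
offer = {
    "weekend": False, "time": "night", "zone": "suburb",
    "price_per_km": 1.5, "pickup_distance_km": 0.8, "revenue_today": 40,
}
x = encode_offer(offer)
```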
Results
• AUC: 84%
• Accuracy: 75%