
n

6
g hl i
u
m
a
Tea h McL ala
t
Hea h Miriy
s
l
Aka Petril yanto
t
n
Rya elle Se
h
Mic
u
Ya S

K353 Final Project
Predictive Analytics of US Obesity

Data Overview
Predicting Obesity Rate in the US through:

Fast Food Expenditure per State
Poverty Rate
Median Income
Grocery Spending per Capita
Ethnicity
Food Tax per State

Data Overview

Continuous variables:

Population

Per Capita:
Grocery & Convenience Stores
Fast Food Restaurants & Expenditures
Gyms
Restaurants & Expenditures

Per State:
Food Tax
Prices of Milk over Soda

Continuous Socioeconomic Variables:

% of Major Race Types
Median Income
Poverty Rate

Flag Socioeconomic Variables:

Poverty
Metro areas

Data Understanding & Preparation

Transformation: Counts Per Capita
Removed Anomalies
Eliminated Outliers
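The preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which outlier rule was used, so the 1.5 x IQR rule below is an assumption, and the county records, the field names, and the helper names `per_thousand` and `drop_iqr_outliers` are hypothetical.

```python
from statistics import quantiles

def per_thousand(count, population):
    """Convert a raw count into a rate per 1,000 residents."""
    return 1000 * count / population

def drop_iqr_outliers(rows, key, k=1.5):
    """Keep rows whose `key` value lies within k * IQR of the quartiles.

    ASSUMPTION: the IQR rule stands in for the unspecified outlier method.
    """
    values = [row[key] for row in rows]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows if low <= row[key] <= high]

# Hypothetical county records (made-up numbers, not the project's data).
counties = [
    {"name": "A", "population": 50_000, "gyms": 5},
    {"name": "B", "population": 100_000, "gyms": 12},
    {"name": "C", "population": 200_000, "gyms": 22},
    {"name": "D", "population": 100_000, "gyms": 9},
    {"name": "E", "population": 10_000, "gyms": 1},
    {"name": "F", "population": 100_000, "gyms": 13},
    {"name": "G", "population": 1_000, "gyms": 2},  # tiny county, extreme rate
]

# Transformation: raw counts become per-thousand rates.
for county in counties:
    county["gyms_pt"] = per_thousand(county["gyms"], county["population"])

# Outlier elimination on the transformed variable.
cleaned = drop_iqr_outliers(counties, "gyms_pt")
print([county["name"] for county in cleaned])
```

Scaling counts by population first matters here: county G has the fewest gyms but the most extreme rate, which only shows up after the transformation.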

Modeling
Supervised Techniques:
Multiple Linear Regression Analysis
Decision Tree
Unsupervised Techniques:
Agglomerative Cluster Analysis
K-Means Clustering

Supervised Technique
Multiple Regression Analysis

Analyze important predictors
Estimate numerical target variable (Adult Obesity Rates)

Regression Equation:
y = 98.526 - 0.009(ExpPCRest) - 0.000(MedInc) + 0.129(%Black)
    + 0.255(%18&under) - 1.568(RestPT) - 0.808(%Asian)
    + 0.156(SodaTaxRetail) + 3.435(PriceMilkOverSoda) - 4.834(GymsPT)
    - 0.069(%Hispanic) + 0.270(%Native) - 0.003(ExpPCFastFood)
    + 0.023(%White) + e_i
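As a rough illustration of how the regression equation produces an estimate, the sketch below plugs predictor values into it. It assumes that every term whose operator did not survive in the source slide is negative (only the "+" signs are visible there). The function name `predict_obesity_rate` is hypothetical, and the error term is left out because it is not known for a new observation.

```python
# Coefficients copied from the regression equation.
# ASSUMPTION: terms with no visible operator in the source are negative.
INTERCEPT = 98.526
COEFFICIENTS = {
    "ExpPCRest": -0.009,
    "MedInc": -0.000,          # reported as .000 after rounding
    "%Black": 0.129,
    "%18&under": 0.255,
    "RestPT": -1.568,
    "%Asian": -0.808,
    "SodaTaxRetail": 0.156,
    "PriceMilkOverSoda": 3.435,
    "GymsPT": -4.834,
    "%Hispanic": -0.069,
    "%Native": 0.270,
    "ExpPCFastFood": -0.003,
    "%White": 0.023,
}

def predict_obesity_rate(features):
    """Return the fitted adult obesity rate for one observation.

    `features` maps predictor names to values; missing predictors count as 0.
    """
    return INTERCEPT + sum(
        coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )

# With these signs, one more gym per thousand residents lowers the estimate
# by 4.834 points, all other predictors held fixed.
baseline = predict_obesity_rate({})
with_gym = predict_obesity_rate({"GymsPT": 1.0})
print(round(baseline - with_gym, 3))
```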

Evaluation
1. Binned Scatterplot
2. Goodness-of-fit
3. Residuals

Supervised Technique
Decision Tree

Use predictors to predict Obesity Rate
Find conditions that lead to a high Obesity Rate (>= 35%)

Use CART method with predictors:

Expenditures per capita on fast food
Adult diabetes rate
Fast food restaurants per thousand
Restaurants per thousand
Expenditures per capita on restaurants

Rules:
1. If Adult Diabetes Rate <= 13.85% -> 0
2. If Adult Diabetes Rate > 13.85% AND Restaurants per Thousand <= 0.659 -> 1
3. If Adult Diabetes Rate > 13.85% AND Restaurants per Thousand > 0.659 -> 0
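The three rules translate directly into a small function. This is a sketch of the rule set as written on the slide, not the fitted CART model itself. The function and parameter names are hypothetical, and the output 1 is read as "obesity rate >= 35%", following the slide's stated target.

```python
def predict_high_obesity(diabetes_rate_pct, restaurants_per_thousand):
    """Apply the slide's three decision rules.

    Returns 1 when the rules flag a high obesity rate (>= 35%), else 0.
    """
    if diabetes_rate_pct <= 13.85:
        return 0  # Rule 1
    if restaurants_per_thousand <= 0.659:
        return 1  # Rule 2
    return 0      # Rule 3

# Hypothetical observations, one per rule.
print(predict_high_obesity(10.0, 0.5))   # Rule 1 applies
print(predict_high_obesity(15.0, 0.5))   # Rule 2 applies
print(predict_high_obesity(15.0, 0.9))   # Rule 3 applies
```

Only one path leads to a positive prediction: a high diabetes rate combined with few restaurants per thousand residents.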

Evaluation: Error & Sensitivity

                   Training    Testing
True negatives     1,294       1,398
False positives    (missing)   14
False negatives    121         168
True positives     48          62

Training
Accuracy: 91%
Sensitivity: 28%
Specificity: 99%
Error: 10%
Testing
Accuracy: 89%
Sensitivity: 27%
Specificity: 99%
Error: 11%
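The testing percentages can be recomputed from the four testing counts. The sketch below assumes the counts 1,398, 14, 168 and 62 are, in that order, true negatives, false positives, false negatives and true positives. That assignment is an assumption, chosen because it agrees with the reported figures. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute standard metrics from the cells of a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,      # share of correct predictions
        "sensitivity": tp / (tp + fn),      # share of actual positives found
        "specificity": tn / (tn + fp),      # share of actual negatives found
        "error": (fp + fn) / total,         # share of wrong predictions
    }

# ASSUMPTION: cell order TN, FP, FN, TP for the testing partition.
testing = classification_metrics(tn=1398, fp=14, fn=168, tp=62)
for name, value in testing.items():
    print(f"{name}: {value:.0%}")
# accuracy: 89%
# sensitivity: 27%
# specificity: 99%
# error: 11%
```

The gap between specificity and sensitivity shows that the tree almost never flags a low-obesity case by mistake, but it misses roughly three out of four high-obesity cases.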

Unsupervised Technique
Agglomerative Cluster Analysis

Clusters from different variable combinations:
Children Poverty Rate, Poverty Rate & Adult Obesity Rate
Gyms Per Thousand & Adult Obesity Rate

Two-Step Clustering

Evaluation
Predictive accuracy
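For readers unfamiliar with agglomerative clustering, here is a minimal pure-Python sketch of the idea: start with one cluster per observation and repeatedly merge the two closest clusters. Average linkage and Euclidean distance are assumptions, since the slides do not name the linkage or distance used. The (poverty rate, obesity rate) pairs are made up for illustration, and the function name `agglomerative` is hypothetical.

```python
from itertools import combinations
from math import dist

def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain.

    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    return clusters

# Hypothetical (poverty rate %, adult obesity rate %) pairs.
observations = [
    (10.0, 25.0), (11.0, 26.0), (12.0, 25.5),
    (25.0, 35.0), (26.0, 36.0), (24.0, 34.5),
]
groups = agglomerative(observations, n_clusters=2)
print(sorted(sorted(group) for group in groups))
```

In practice the variables would usually be standardized first, so that one variable with a larger numeric range does not dominate the distances.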

Unsupervised Technique
K-Means Clustering

Create 4 clusters from the data
Analyze any region similarities: Northeast, South, Midwest, West

Evaluation
Predictive accuracy
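The k-means step with 4 clusters can be illustrated with a small pure-Python version of Lloyd's algorithm. This is a sketch under assumptions, not the project's actual procedure: the first k points are used as the initial centroids to keep the run deterministic, the (gyms per thousand, obesity rate) pairs are made up, and the function name `k_means` is hypothetical.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Lloyd's algorithm. ASSUMPTION: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (gyms per thousand, adult obesity rate %) pairs.
observations = [
    (0.05, 34.0), (0.10, 30.0), (0.15, 26.0), (0.20, 22.0),
    (0.06, 34.5), (0.11, 30.5), (0.16, 26.5), (0.21, 22.5),
]
labels, centroids = k_means(observations, k=4)
print(labels)
```

Once each observation has a cluster label, the labels can be cross-tabulated against region (Northeast, South, Midwest, West) to look for the regional similarities the slide mentions.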

[Figure: Obesity Rates by cluster (Clusters 1-4)]

[Figure: Gyms Per Thousand by cluster (Clusters 1-4)]

Deployment
Business / real world applications

Find out the causes of obesity & how it can be predicted

Guide obese and overweight people to eliminate contributing factors and improve their health

Alert the government to the importance of health education for people with low income

Questions?