Data
The Big Organics data set contains 13 variables and 111,115 observations.
Goals
Explore the data.
Create a customer segmentation based predominantly on demographics.
Use predictive models to identify the customers most likely to buy organic products.
Hands-on Workshop: SAS Visual Analytics - CASE: Big Organics
Click Data and select the BIGORGANICS data source in the TUNDATA directory. Click OK.
Exploration
Our target variable, Organics Purchase Indicator, is classified as a Measure, but it is more convenient to
treat it as a (binary) Category. Change its classification to Category.
To start, check the distribution of Gender: drop the data item Gender onto your page.
Select Organics Purchase Indicator from Data – Measures and drop it at the bottom of your visualization
window (you will see + Auto Chart); a new chart is created for our target variable.
Hands-on Exercise 1:
1) Create charts to explore Loyalty Status, Geographic Region, Affluence Grade, Age, and Loyalty Card
Tenure. Explore other variables as you wish.
2) On Data, inspect the measure details and quickly verify which data items have missing
values.
NOTE: By default, the Linear Regression, Logistic Regression, and Generalized Linear Model objects drop
every observation that has a missing value in any variable assigned to a role. In some cases, however, the
fact that a value is missing is itself relevant information. Selecting the Informative Missingness property
handles missing values automatically: for measure variables, missing values are imputed with the observed
mean and an indicator variable is created; for category variables, missing values are treated as a distinct
level.
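To make the note concrete, here is a minimal sketch in plain Python of what the two treatments amount to. It is an illustration only, not SAS Visual Analytics' internal implementation; the function names, the sample values, and the "(Missing)" label are hypothetical.

```python
from statistics import mean

def informative_missingness(values):
    """Mean-impute a measure and return a 0/1 missing-value indicator."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # mean of the observed (non-missing) values
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

def missing_as_level(values, label="(Missing)"):
    """Treat a missing category value as its own level (label is hypothetical)."""
    return [label if v is None else v for v in values]

# Made-up sample values, used only for illustration
ages = [40, None, 60, 50]
age_imputed, age_missing = informative_missingness(ages)
gender_levels = missing_as_level(["F", None, "M", "U"])
```

The indicator variable is what keeps the information that a value was missing available to the model after the gap has been filled.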
To avoid discarding observations that contain missing values, and to handle cases where the automatic
process would not be appropriate, we can treat the missing values ourselves. This process is usually called
imputation.
First, Gender. On Data, click ADD and select Add custom category. Name the new categorical variable
Gender_IMP (for imputed), base it on Gender, replace the missing values with “U” (already
used for unknown), and set Remaining Values to Show as is.
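The logic of this custom category can be sketched in plain Python as follows. This is a hedged illustration of the mapping, not what the product runs; the function name and sample values are hypothetical, and treating blank strings as missing is an assumption.

```python
def impute_gender(value):
    """Map a missing Gender to "U"; show every other value as is."""
    # Assumption: both None and blank strings count as missing
    if value is None or value.strip() == "":
        return "U"
    return value

# Made-up sample values, used only for illustration
genders = ["F", "M", None, "", "U"]
gender_imp = [impute_gender(g) for g in genders]
```

Because “U” already exists as a level, the imputed rows merge into the existing unknown group instead of creating a new one.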
Now we need to impute the measures. Let’s treat Loyalty Card Tenure first: add a calculated item called
LC Tenure_IMP that replaces the missing values of Loyalty Card Tenure with 0. Check its distribution.
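As a rough sketch of what this calculated item does, the plain-Python snippet below replaces missing tenure values with 0 and then tallies the result. The function name and sample values are hypothetical, and this is not the SAS Visual Analytics expression syntax.

```python
from collections import Counter

def impute_tenure(value):
    """Replace a missing Loyalty Card Tenure with 0; keep other values."""
    return 0 if value is None else value

# Made-up sample values, used only for illustration
tenure = [5, None, 12, None, 5]
lc_tenure_imp = [impute_tenure(t) for t in tenure]

# Tally the imputed values to inspect the distribution
distribution = Counter(lc_tenure_imp)
```

When checking the distribution, expect the imputed rows to pile up at 0, since every missing value lands on that single point.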
Hands-on Exercise 2: