
MARKETING SEGMENTATION

FOR
JACK DOWSON
INVESTMENT CORPORATION
(JDIC)
Intermediate Python Assignment
Batch 5 Amsterdam - Team 4
FSDA Jan 2023

BY: DEWI REKNO


Content
Analytical Objectives

Data Preparation, Cleaning, and Communication

Exploratory Data Analysis

Cluster Analysis and Interpretation

Recommendation

See the data here


A. Analytical Objectives

Jack Dowson Investment Corp. (JDIC) is one of the most popular investment platforms in
Indonesia. For next month's campaign, it has tasked the Data Analyst team with identifying
which thematic campaigns to recommend to the marketing team.

Tasks:
Create a segmentation for this thematic campaign

Give a recommendation on the theme of each campaign



B. Data Preparation, Cleaning, and
Communication
Data Preparation
To make sure the code runs smoothly, load the basic libraries pandas, numpy,
seaborn, and matplotlib, then read the daily_user_transaction and users files.
These imports unlock all of the commands that follow.
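The setup described above can be sketched as follows. The file names and CSV extension are assumptions; tiny synthetic stand-in frames are used here so the snippet runs on its own.

```python
# Import the four libraries the assignment relies on; the later snippets
# assume these are available.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# In the real notebook the two files are loaded with pd.read_csv, e.g.:
#   daily = pd.read_csv("daily_user_transaction.csv")  # file name/extension assumed
#   users = pd.read_csv("users.csv")                   # file name/extension assumed
# Tiny synthetic stand-ins are used here instead:
daily = pd.DataFrame({"user_id": [1, 2], "total_invested_amount": [100.0, 250.0]})
users = pd.DataFrame({"user_id": [1, 2], "referral_code_used": ["ABC", None]})
print(daily.shape, users.shape)
```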



Data Cleaning

For data cleaning in this case, both on daily_user_transaction and users, the steps
are:

Check Data Type

Data Manipulation

Removing Duplicates and Check Value

Cleaning Data

Handling Outliers



Batch: daily_user_transaction
Check Data Types
First make a duplicate/copy of the original data, so the original is preserved if any of
the later code changes it.

Use df.info() to see the data types: this dataset has 158,811 rows and 17 columns. Use
df.isnull().sum() to see the condition of each column, i.e. which columns contain blanks
and need more treatment.
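A minimal sketch of these checks, using a small synthetic stand-in for daily_user_transaction (the real file has 158,811 rows and 17 columns):

```python
import pandas as pd

# Synthetic stand-in for daily_user_transaction.
raw = pd.DataFrame({"user_id": [1, 2, 2],
                    "total_invested_amount": [100.0, None, 250.0]})

df = raw.copy()            # work on a copy so the original data is preserved
df.info()                  # data type and non-null count per column
print(df.isnull().sum())   # which columns still contain blanks
```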



Batch: daily_user_transaction
Data Manipulation

Fill the blank columns and rows, then use df.info() again to recheck that no missing
values remain. After converting the data types, run df.info() once more to confirm each
column has the type we wanted.
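One way the filling and type conversion could look; the column names below are assumptions based on the slides:

```python
import pandas as pd

# Stand-in frame; the column names are assumptions.
df = pd.DataFrame({"transaction_date": ["2021-08-01", "2021-08-02"],
                   "total_invested_amount": [100.0, None]})

df["total_invested_amount"] = df["total_invested_amount"].fillna(0)  # fill blanks
df["transaction_date"] = pd.to_datetime(df["transaction_date"])      # fix the dtype
df.info()   # recheck: every column non-null and with the wanted type
```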



Batch: daily_user_transaction
Removing Duplicates and Check Value
Use df.duplicated().sum() to show how many duplicate rows the data contains.

Use df.drop_duplicates() to drop them.

From the number of duplicates removed, it can be inferred that the raw data quality is
not very good.
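A short sketch of the duplicate check and removal, on a stand-in frame with one fully duplicated row:

```python
import pandas as pd

# Stand-in frame with one fully duplicated row.
df = pd.DataFrame({"user_id": [1, 2, 2], "amount": [100, 250, 250]})

print(df.duplicated().sum())   # number of duplicate rows found
df = df.drop_duplicates()      # keep only the first occurrence of each row
print(len(df))                 # row count after cleaning
```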



Batch: daily_user_transaction
Handling Outliers
We chose:
total_investment_amount, saham_invested_amount, pasar_uang_invested_amount,
pendapatan_tetap_invested_amount, and campuran_invested_amount to check and handle
outliers, because these columns are considered strong enough to represent the result.

The calculation for each column shows that some outliers remain, which keeps the
distribution from being tight.
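The slides do not show the exact method, but a common way to flag outliers like this is the 1.5 x IQR rule, sketched here on one of the chosen columns with synthetic values:

```python
import pandas as pd

# One of the five chosen columns, with an obvious outlier at the end.
s = pd.Series([100, 110, 120, 130, 10_000], name="total_investment_amount")

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the usual IQR fences
outliers = s[(s < lower) | (s > upper)]
print(len(outliers))   # rows flagged as outliers
```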



Batch: users
Check Data Types
First make a duplicate/copy of the original data, so the original is preserved if any of
the later code changes it.

Use df.info() to see the data types: this dataset has 14,712 rows and 11 columns. Use
df.isnull().sum() to see the condition of each column, i.e. which columns contain blanks
and need more treatment.



Batch: users
Data Manipulation
Because this data has one column that is not fully filled, referral_code_used, our task
is to fill its blank rows with "not used referral".

Use df.info() again to recheck the data after filling the blanks and changing the data
types, to make sure no missing values remain.



Batch: users
Removing Duplicates and Check Value
Because the duplicate count is "0", it can be inferred that there are no duplicates.

This is what the cleaned data looks like.



Batch: users
Handling Outliers
We chose:
end_of_month_investment, total_buy_amount, and total_sell_amount to check and handle
outliers, because these columns are considered strong enough to represent the result.

Because end_of_month_investment and total_buy_amount have few outliers, their
distributions look much tidier than the results on daily_user_transaction. The
total_sell_amount column shows the opposite, because its quantile result is negative;
this happens because the company pays money out to the customers.



Check The Data Again for Merger

Before we merge the data, make sure the data types are already the same, especially
user_id and the datetime columns.
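The type alignment and merge could look like this; the join key, column names, and values are illustrative stand-ins:

```python
import pandas as pd

daily = pd.DataFrame({"user_id": ["1", "2"],
                      "date": ["2021-08-01", "2021-08-02"],
                      "amount": [100, 250]})
users = pd.DataFrame({"user_id": [1, 2], "user_age": [25, 31]})

# Align the key dtypes first: user_id as the same integer type, date as datetime.
daily["user_id"] = daily["user_id"].astype("int64")
daily["date"] = pd.to_datetime(daily["date"])

merged = daily.merge(users, on="user_id", how="left")
print(merged.shape)   # the column count confirms the merge worked
```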



To make sure the data has merged, look at the bottom of the output: it shows
27 columns, which means the merge succeeded.



D. Exploratory Data Analysis
First, make a copy of the data merged from daily_user_transaction and users,
then check that the merged data exists.



Descriptive Statistics

Desc. Information about Numeric Variables

Desc. Information about String Variables

Desc. Information about Date Type Variables

Number of Users

Referral Used Status



Desc. Information about Numeric Variable
By inserting this code for the numeric
variables, we get the information
below.
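The descriptive statistics for all three variable kinds can be sketched with df.describe(); the frame below is a small stand-in, and the column names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"user_age": [25, 31, 40],
                   "user_gender": ["male", "male", "female"],
                   "registration_date": pd.to_datetime(
                       ["2021-08-04", "2021-08-15", "2021-09-30"])})

print(df.describe())                    # numeric columns: count, mean, std, ...
print(df.describe(include=[object]))    # string columns: count, unique, top, freq
print(df["registration_date"].min(),
      df["registration_date"].max())    # date range of a date type variable
```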



Desc. Information about String Variable
By inserting this code for the string variables,
we get the information below.

Insight:
* We have 8,277 users
* Most common user_gender = male
* Most common user_occupation = pelajar (student)
* Most common user_income_range = Kurang dari 10 juta (less than 10 million)
* Most users did not use a referral code
* Most common user_income_source = gaji (salary)



Desc. Information about Date Type Variable
By inserting this code for the date type
variables, we get the information below.

Insight:
The dates run from 2021-08-01 00:11:14 to 2021-09-28 13:20:00

Insight:
The dates run from 2021-08-04 00:00:00 to 2021-09-30 00:00:00



Number of Users

Insight:
After both datasets were cleaned and merged, the total number of users is 8,277.

Referral Used Status

By inserting this code for the referral_code_used variable, we get the information
shown. Served on a pie chart diagram, it looks like this:

Insight:
* Most users did not use a referral code, 64.30% of the total
* The other 35.70% used a referral code
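A sketch of the pie chart; the per-category counts below are reconstructed from the 8,277 total and the 64.30%/35.70% split stated above:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Reconstructed counts: 8,277 users, 64.30% without a referral code.
referral = pd.Series(["not used referral"] * 5322 + ["used referral"] * 2955)
counts = referral.value_counts()

counts.plot.pie(autopct="%.2f%%")   # wedge labels: 64.30% / 35.70%
plt.ylabel("")
plt.savefig("referral_pie.png")
```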
E. Segmentation
Data Preparation

Libraries

Preparing Each Data

Check Data Distribution

Cluster

Segmentation

Merge Cluster Results with the Dataset

Visualize Cluster



Data Preparation

Make a duplicate/copy of the merged data, so the original is preserved if any of
the later code changes it.



Libraries
For clustering, we have to import KMeans so the instructions ahead will work.

Preparing Each Data

In this case, we want to look at total_investment_amount based on age.
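The import and feature selection can be sketched as follows, on a small stand-in for the merged data:

```python
import pandas as pd
from sklearn.cluster import KMeans   # needed by every clustering step below

# Stand-in for the merged data; only the two segmentation features are kept.
merged = pd.DataFrame({"user_age": [22, 25, 40, 43, 60, 62],
                       "total_investment_amount": [100, 120, 500, 520, 900, 950]})

X = merged[["user_age", "total_investment_amount"]]
print(X.shape)
```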



Check Data Distribution

01 No Scale
02 Standard Scaler



No Scale, Standard Scaler, Robust Scaler, and MinMax Scaler

The distribution of total_invested_amount based on user_age is the same
in all 4 scaling models.
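The three scalers compared above can be tried side by side; scaling only shifts and stretches the axes, which is why the plotted shape stays the same. Synthetic values stand in for the real features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler

# Synthetic (age, amount) pairs standing in for the real features.
X = np.array([[22, 100], [40, 500], [60, 900]], dtype=float)

# Each scaler transforms the same data; the relative ordering of points,
# and hence the shape of the distribution, is unchanged.
for scaler in (StandardScaler(), RobustScaler(), MinMaxScaler()):
    Xs = scaler.fit_transform(X)
    print(type(scaler).__name__, Xs.min(axis=0), Xs.max(axis=0))
```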



Cluster

01
Elbow Method

From the figure, the best cluster candidates are cluster 4 and cluster 3
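The elbow method can be sketched like this: fit KMeans for a range of k and watch where the drop in inertia flattens. Random data stands in for the scaled features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))   # synthetic data standing in for the scaled features

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
print(inertias)   # the "elbow" is where the drop in inertia flattens out
```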



Cluster

02
Silhouette Analysis
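Silhouette analysis scores each candidate k; higher is better. A minimal sketch on two synthetic, well-separated blobs:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated blobs so the scores are easy to read.
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```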



Segmentation

Use the Standard Scaler (ss) for segmentation.

We chose 4 clusters for segmentation because, based on the figure,
the 4-cluster result is better than the others.



Segmentation

Use the Standard Scaler (ss) for segmentation.

Besides 4 clusters being a good choice, the figure shows that
3 clusters is not too bad either.



Merge Cluster Results with the Dataset

For now we call the clusters by their numbers; later we will create
interesting names for them.

Visualize Cluster

Descriptive statistics of both the x and y axes for each cluster.
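The merge-labels-then-profile-then-plot flow can be sketched as below; the data is a synthetic stand-in with three visible groups, and the feature names are taken from the slides:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # headless backend
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Stand-in data with three visible groups.
df = pd.DataFrame({"user_age": [22, 24, 45, 47, 61, 63],
                   "total_invested_amount": [100.0, 120.0, 500.0,
                                             520.0, 900.0, 950.0]})

# Attach each row's cluster label back onto the dataset...
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df)

# ...profile both axes per cluster...
print(df.groupby("cluster")[["user_age", "total_invested_amount"]].describe())

# ...and draw the scatter plot, colored by cluster.
df.plot.scatter(x="user_age", y="total_invested_amount",
                c="cluster", colormap="viridis")
plt.savefig("clusters.png")
```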



Visualize Cluster

Use this code to check what can be analyzed.

Cluster Interpretation

Total Invested Amount

User Age

End of Month Invested Amount

Total Buy Amount

User Occupation

User Income Range


Total Investment Amount
User Age



End of Month Invested Amount
Total Buy Amount



User Occupation
User Income Range



Recommendation for Segmentation
JDIC can use the six-dimension cluster interpretation for segmenting the market; it is
also useful for promotions aimed at each segment. The six dimensions are: total invested
amount, user age, end of month invested amount, total buy amount, user occupation, and
user income range.

Based on those six dimensions, JDIC can see the average income or total buy amount by
the mode of user age, user occupation, and user income range. Those six segments also
help target the promotion for each cluster.



Thank You
Let's Get Contact!

Dewi Rekno

dewirekno140@gmail.com

+62 857-0420-6698
