
MARKETING SEGMENTATION

FOR
JACK DOWSON
INVESTMENT CORPORATION
(JDIC)
Intermediate Python Assignment
Batch 5 Amsterdam - Team 4
FSDA Jan 2023

BY: DEWI REKNO


Content
Analytical Objectives

Data Preparation, Cleaning, and Communication

Exploratory Data Analysis

Cluster Analysis and Interpretation

Recommendation

See the data here


A. Analytical Objectives

Jack Dowson Investment Corp. (JDIC) is one of the most popular investment platforms in
Indonesia. For next month's campaign, it has tasked the Data Analyst team with identifying
which thematic campaigns to recommend to the marketing team.

Tasks:
Create a segmentation for this thematic campaign

Give a recommendation on the theme of each campaign



B. Data Preparation, Cleaning, and
Communication
Data Preparation
To make sure the code runs smoothly, load the basic libraries pandas, numpy,
seaborn, and matplotlib, then read the daily_user_transaction and users files.
These imports unlock all of the commands that follow.
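The setup described above can be sketched as follows. The file names and CSV extension are assumptions; tiny synthetic stand-in frames are used here so the snippet runs on its own.

```python
# Import the four libraries the assignment relies on; the later snippets
# assume these are available.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# In the real notebook the two files are loaded with pd.read_csv, e.g.:
#   daily = pd.read_csv("daily_user_transaction.csv")  # file name/extension assumed
#   users = pd.read_csv("users.csv")                   # file name/extension assumed
# Tiny synthetic stand-ins are used here instead:
daily = pd.DataFrame({"user_id": [1, 2], "total_invested_amount": [100.0, 250.0]})
users = pd.DataFrame({"user_id": [1, 2], "referral_code_used": ["ABC", None]})
print(daily.shape, users.shape)
```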



Data Cleaning

For data cleaning in this case, both on daily_user_transaction and users, the steps
are:

Check Data Type

Data Manipulation

Removing Duplicates and Check Value

Cleaning Data

Handling Outliers



Batch: daily_user_transaction
Check Data Types
First make a duplicate/copy of the original data, so the original is preserved if any of
the later code changes it.

Use df.info() to see the data types: this dataset has 158,811 rows and 17 columns. Use
df.isnull().sum() to see the condition of each column, i.e. which columns contain blanks
and need more treatment.
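A minimal sketch of these checks, using a small synthetic stand-in for daily_user_transaction (the real file has 158,811 rows and 17 columns):

```python
import pandas as pd

# Synthetic stand-in for daily_user_transaction.
raw = pd.DataFrame({"user_id": [1, 2, 2],
                    "total_invested_amount": [100.0, None, 250.0]})

df = raw.copy()            # work on a copy so the original data is preserved
df.info()                  # data type and non-null count per column
print(df.isnull().sum())   # which columns still contain blanks
```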



Batch: daily_user_transaction
Data Manipulation

Fill the blank columns and rows, then use df.info() again to recheck that no missing
values remain. After converting the data types, run df.info() once more to confirm each
column has the type we wanted.
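One way the filling and type conversion could look; the column names below are assumptions based on the slides:

```python
import pandas as pd

# Stand-in frame; the column names are assumptions.
df = pd.DataFrame({"transaction_date": ["2021-08-01", "2021-08-02"],
                   "total_invested_amount": [100.0, None]})

df["total_invested_amount"] = df["total_invested_amount"].fillna(0)  # fill blanks
df["transaction_date"] = pd.to_datetime(df["transaction_date"])      # fix the dtype
df.info()   # recheck: every column non-null and with the wanted type
```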



Batch: daily_user_transaction
Removing Duplicates and Check Value
Use df.duplicated().sum() to show how many duplicate rows the data contains.

Use df.drop_duplicates() to drop them.

From the number of duplicates removed, it can be inferred that the raw data quality is
not very good.
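A short sketch of the duplicate check and removal, on a stand-in frame with one fully duplicated row:

```python
import pandas as pd

# Stand-in frame with one fully duplicated row.
df = pd.DataFrame({"user_id": [1, 2, 2], "amount": [100, 250, 250]})

print(df.duplicated().sum())   # number of duplicate rows found
df = df.drop_duplicates()      # keep only the first occurrence of each row
print(len(df))                 # row count after cleaning
```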



Batch: daily_user_transaction
Handling Outliers
We chose:
total_investment_amount, saham_invested_amount, pasar_uang_invested_amount,
pendapatan_tetap_invested_amount, and campuran_invested_amount to check and handle
outliers, because these columns are considered strong enough to represent the result.

The calculation for each column shows that some outliers remain, which keeps the
distribution from being tight.
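The slides do not show the exact method, but a common way to flag outliers like this is the 1.5 x IQR rule, sketched here on one of the chosen columns with synthetic values:

```python
import pandas as pd

# One of the five chosen columns, with an obvious outlier at the end.
s = pd.Series([100, 110, 120, 130, 10_000], name="total_investment_amount")

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the usual IQR fences
outliers = s[(s < lower) | (s > upper)]
print(len(outliers))   # rows flagged as outliers
```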



Batch: users
Check Data Types
First make a duplicate/copy of the original data, so the original is preserved if any of
the later code changes it.

Use df.info() to see the data types: this dataset has 14,712 rows and 11 columns. Use
df.isnull().sum() to see the condition of each column, i.e. which columns contain blanks
and need more treatment.



Batch: users
Data Manipulation
Because this data has one column that is not fully filled, referral_code_used, our task
is to fill its blank rows with "not used referral".

Use df.info() again to recheck the data after filling the blanks and changing the data
types, to make sure no missing values remain.



Batch: users
Removing Duplicates and Check Value
Because the duplicate count is "0", it can be inferred that there are no duplicates.

This is what the cleaned data looks like.



Batch: users
Handling Outliers
We chose:
end_of_month_investment, total_buy_amount, and total_sell_amount to check and handle
outliers, because these columns are considered strong enough to represent the result.

Because end_of_month_investment and total_buy_amount have few outliers, their
distributions look much tidier than the results on daily_user_transaction. The
total_sell_amount column shows the opposite, because its quantile result is negative;
this happens because the company pays money out to the customers.



Check The Data Again for Merger

Before we merge the data, make sure the data types are already the same, especially
user_id and the datetime columns.
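The type alignment and merge could look like this; the join key, column names, and values are illustrative stand-ins:

```python
import pandas as pd

daily = pd.DataFrame({"user_id": ["1", "2"],
                      "date": ["2021-08-01", "2021-08-02"],
                      "amount": [100, 250]})
users = pd.DataFrame({"user_id": [1, 2], "user_age": [25, 31]})

# Align the key dtypes first: user_id as the same integer type, date as datetime.
daily["user_id"] = daily["user_id"].astype("int64")
daily["date"] = pd.to_datetime(daily["date"])

merged = daily.merge(users, on="user_id", how="left")
print(merged.shape)   # the column count confirms the merge worked
```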



To make sure the data has merged, look at the bottom of the output: it shows
27 columns, which means the merge succeeded.



D. Exploratory Data Analysis
First, make a copy of the data merged from daily_user_transaction and users,
then check that the merged data exists.



Descriptive Statistics

Desc. Information about Numeric Variables

Desc. Information about String Variables

Desc. Information about Date Type Variables

Number of Users

Referral Used Status



Desc. Information about Numeric Variable
By inserting this code for the numeric
variables, we get the information
below.
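The descriptive statistics for all three variable kinds can be sketched with df.describe(); the frame below is a small stand-in, and the column names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"user_age": [25, 31, 40],
                   "user_gender": ["male", "male", "female"],
                   "registration_date": pd.to_datetime(
                       ["2021-08-04", "2021-08-15", "2021-09-30"])})

print(df.describe())                    # numeric columns: count, mean, std, ...
print(df.describe(include=[object]))    # string columns: count, unique, top, freq
print(df["registration_date"].min(),
      df["registration_date"].max())    # date range of a date type variable
```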



Desc. Information about String Variable
By inserting this code for the string variables,
we get the information below.

Insight:
* We have 8,277 users
* Most common user_gender = male
* Most common user_occupation = pelajar (student)
* Most common user_income_range = Kurang dari 10 juta (less than 10 million)
* Most users did not use a referral code
* Most common user_income_source = gaji (salary)



Desc. Information about Date Type Variable
By inserting this code for the date type
variables, we get the information below.

Insight:
The dates run from 2021-08-01 00:11:14 to 2021-09-28 13:20:00

Insight:
The dates run from 2021-08-04 00:00:00 to 2021-09-30 00:00:00



Number of Users

Insight:
After both datasets were cleaned and merged, the total number of users is 8,277.

Referral Used Status

By inserting this code for the referral_code_used variable, we get the information
shown. Served on a pie chart diagram, it looks like this:

Insight:
* Most users did not use a referral code, 64.30% of the total
* The other 35.70% used a referral code
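A sketch of the pie chart; the per-category counts below are reconstructed from the 8,277 total and the 64.30%/35.70% split stated above:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Reconstructed counts: 8,277 users, 64.30% without a referral code.
referral = pd.Series(["not used referral"] * 5322 + ["used referral"] * 2955)
counts = referral.value_counts()

counts.plot.pie(autopct="%.2f%%")   # wedge labels: 64.30% / 35.70%
plt.ylabel("")
plt.savefig("referral_pie.png")
```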
E. Segmentation
Data Preparation

Libraries

Preparing Each Data

Check Data Distribution

Cluster

Segmentation

Merge Cluster Results with the Dataset

Visualize Cluster



Data Preparation

Make a duplicate/copy of the merged data, so the original is preserved if any of
the later code changes it.



Libraries
For clustering, we have to import KMeans so the instructions ahead will work.

Preparing Each Data

In this case, we want to look at total_investment_amount based on age.
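The import and feature selection can be sketched as follows, on a small stand-in for the merged data:

```python
import pandas as pd
from sklearn.cluster import KMeans   # needed by every clustering step below

# Stand-in for the merged data; only the two segmentation features are kept.
merged = pd.DataFrame({"user_age": [22, 25, 40, 43, 60, 62],
                       "total_investment_amount": [100, 120, 500, 520, 900, 950]})

X = merged[["user_age", "total_investment_amount"]]
print(X.shape)
```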



Check Data Distribution

01 No Scale
02 Standard Scaler



No Scale, Standard Scaler, Robust Scaler, and MinMax Scaler

The distribution of total_invested_amount based on user_age is the same
in all 4 scaling models.
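The three scalers compared above can be tried side by side; scaling only shifts and stretches the axes, which is why the plotted shape stays the same. Synthetic values stand in for the real features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler

# Synthetic (age, amount) pairs standing in for the real features.
X = np.array([[22, 100], [40, 500], [60, 900]], dtype=float)

# Each scaler transforms the same data; the relative ordering of points,
# and hence the shape of the distribution, is unchanged.
for scaler in (StandardScaler(), RobustScaler(), MinMaxScaler()):
    Xs = scaler.fit_transform(X)
    print(type(scaler).__name__, Xs.min(axis=0), Xs.max(axis=0))
```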



Cluster

01
Elbow Method

From the figure, the best cluster candidates are cluster 4 and cluster 3
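The elbow method can be sketched like this: fit KMeans for a range of k and watch where the drop in inertia flattens. Random data stands in for the scaled features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))   # synthetic data standing in for the scaled features

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
print(inertias)   # the "elbow" is where the drop in inertia flattens out
```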



Cluster

02
Silhouette Analysis
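Silhouette analysis scores each candidate k; higher is better. A minimal sketch on two synthetic, well-separated blobs:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated blobs so the scores are easy to read.
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```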



Segmentation

Use the Standard Scaler (ss) for segmentation.

We chose 4 clusters for segmentation because, based on the figure,
the 4-cluster result is better than the others.



Segmentation

Use the Standard Scaler (ss) for segmentation.

Besides 4 clusters being a good choice, the figure shows that
3 clusters is not too bad either.



Merge Cluster Results with the Dataset

For now we call the clusters by their numbers; later we will create
interesting names for them.

Visualize Cluster

Descriptive statistics of both the x and y axes for each cluster.
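The merge-labels-then-profile-then-plot flow can be sketched as below; the data is a synthetic stand-in with three visible groups, and the feature names are taken from the slides:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # headless backend
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Stand-in data with three visible groups.
df = pd.DataFrame({"user_age": [22, 24, 45, 47, 61, 63],
                   "total_invested_amount": [100.0, 120.0, 500.0,
                                             520.0, 900.0, 950.0]})

# Attach each row's cluster label back onto the dataset...
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df)

# ...profile both axes per cluster...
print(df.groupby("cluster")[["user_age", "total_invested_amount"]].describe())

# ...and draw the scatter plot, colored by cluster.
df.plot.scatter(x="user_age", y="total_invested_amount",
                c="cluster", colormap="viridis")
plt.savefig("clusters.png")
```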



Visualize Cluster

Use this code to check what can be analyzed.

Cluster Interpretation

Total Invested Amount

User Age

End of Month Invested Amount

Total Buy Amount

User Occupation

User Income Range


Total Investment Amount
User Age



End of Month Invested Amount
Total Buy Amount



User Occupation
User Income Range



Recommendation for Segmentation
JDIC can use the six-dimension cluster interpretation for segmenting the market; it is
also useful for promotions aimed at each segment. The six dimensions are: total invested
amount, user age, end of month invested amount, total buy amount, user occupation, and
user income range.

Based on those six dimensions, JDIC can see the average income or total buy amount by
the mode of user age, user occupation, and user income range. Those six segments also
help target the promotion for each cluster.



Thank You
Let's Get Contact!

Dewi Rekno

dewirekno140@gmail.com

+62 857-0420-6698
