
Project Setup

Data Collection

Data Preparation

Training & Debugging

Deploying & Testing

Problem Statement
a. How can job applicants (users) be matched with the job skillsets available in the market?

Project Requirement & Goal


a. To build a recommender system that effectively matches users with skillsets.

b. The recommender system will be built on user click events and demographic data.

Remarks:
- Click events refer to job viewed, job wishlisted, job applied and job rating.
- The system needs a formula to calculate the score of the skills related to a user
according to the user's click behaviour.

Data Collection Method


a. Mock dummy data using Python. [temporary]

b. Official data collection will start after the system is established.
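
A minimal sketch of how the temporary mock data could be generated in plain Python. The employee IDs, job IDs, record count and function name are illustrative assumptions; the event types and weights follow the suggested assignment in the "master" sheet (types 1 to 5 are ratings carrying their star value, types 6 to 8 are view, wishlist and apply with weight 1).

```python
import random

# Suggested weights from the "master" sheet:
# types 1-5 are ratings, types 6-8 are job viewed / wishlisted / applied.
EVENT_WEIGHTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 1, 7: 1, 8: 1}


def mock_click_events(n_events, employee_ids, job_ids, seed=0):
    """Return n_events random click event rows as dictionaries."""
    rng = random.Random(seed)  # fixed seed so the mock data is reproducible
    rows = []
    for _ in range(n_events):
        event_type = rng.choice(sorted(EVENT_WEIGHTS))
        rows.append({
            "employee_id": rng.choice(employee_ids),
            "job_id": rng.choice(job_ids),
            "click_event_type": event_type,
            "weight": EVENT_WEIGHTS[event_type],
        })
    return rows


# Hypothetical IDs, for illustration only.
events = mock_click_events(20, employee_ids=[1001, 1002, 1003], job_ids=[101, 102])
```

Using a seeded generator keeps the dummy dataset stable between runs, which makes later preprocessing steps easier to check.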

Data Preprocessing
a. Text processing on demographic fields. [future]

b. Derive score for skillsets.

c. Prepare input data for model training.

Data Exploration
a. Perform simple data exploration on the processed dataset and generate plots or graphs
as a summary reference.

Data Modelling
a. Cold start method

b. Content based filtering

c. Collaborative filtering

Improve Model
a. Compare and contrast the learning results of several mechanisms/methods.

Deployment
a. Schedule for SIT and UAT.
b. Sync with IT team for the deployment method.
More Details

a. Initially, matching shall be based on the portal's overall trend, gradually evolving into
personalised or comprehensive matching once the system has gathered sufficient data.

a. Tools such as MySQL might be required to access the data in the database.

There are three master tables:

a. Employee table
- demographic information
b. Job table
- to map job ID to related skillsets (one-to-three mapping)
c. Click event table
- records of job viewed, wishlisted, applied and rated

Remark: Refer to the "master" sheet for details


Demographic fields
a. The majority of the data fields obtained from the portal will be fixed drop-down list content. If free text is received,
additional data preprocessing (i.e. data crunching) might be needed.

Score
a. A weight is assigned to each click event type to indicate its significance.
b. The score for a skillset is the sum of the weights of all its click events. [temporary practice]
c. Weights could be refined and new click events could be included. [future improvement]
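
Written as a formula, the temporary scoring rule reads as follows. The notation is an assumption introduced here for illustration: E_s is the set of click events on jobs that require skillset s, and w(e) is the weight of click event e.

```latex
\mathrm{score}(s) = \sum_{e \in E_s} w(e)
```

For example, with the sample records in the "master" sheet, skill D is required only by job 101, whose click events carry weights 5, 1, 1, 1, 1 and 1, so score(D) = 10.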

Input data for modelling


a. The value of each skill column (var Y) can be either 1 (significant) or 0 (not significant).
b. The significance of a skill is derived from its score and a system-defined threshold.
c. Every skill column will then be converted into one-hot encoded data to fit into the model.
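
A minimal sketch of the thresholding and one-hot encoding steps, in plain Python. The scores, the threshold value and the function names are hypothetical; the "greater than or equal to" comparison follows the rule stated in the "collaborative" sheet.

```python
def binarise_scores(skill_scores, threshold):
    """Map each skill score to 1 (significant) if score >= threshold, else 0."""
    return {skill: int(score >= threshold) for skill, score in skill_scores.items()}


def one_hot(label):
    """Encode a binary label as a two-element vector: 0 -> [1, 0], 1 -> [0, 1]."""
    vector = [0, 0]
    vector[label] = 1
    return vector


# Hypothetical scores (in percent) for one employee and a hypothetical threshold.
scores = {"A": 40.0, "D": 30.0, "H": 30.0, "C": 0.0}
labels = binarise_scores(scores, threshold=30.0)
encoded = {skill: one_hot(label) for skill, label in labels.items()}
```

Here skills A, D and H become significant (1) and skill C does not (0); the threshold itself would come from a business decision or a formula.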

Remark: Refer to the "cold start", "content-based" and "collaborative" sheets for details

a. Bar chart and boxplot to visualise distribution of demographic data.

b. Summary table for click event data.

Cold start method


a. Applies when there are neither historical click event records nor demographic information (new user or guest
login).
b. Involves SQL and data engineering work.

Content based filtering


a. Applies to provide a customised recommendation to the user based on his/her preferences.
b. Involves SQL and data engineering work.

Collaborative filtering
a. Applies to provide a comprehensive recommendation to the user based on the trend of his/her cluster.
b. Involves SQL, data engineering and a TensorFlow neural network.

Remark: Refer to the "cold start", "content-based" and "collaborative" sheets for details

a. Plot graphs to visualise the loss and accuracy of model training and model validation.

b. Compare the values of evaluation metrics (e.g. accuracy, RMSE).
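
A minimal sketch of the two evaluation metrics named above, in plain Python. The labels, predicted probabilities and the 0.5 cut-off are hypothetical values chosen only to make the example runnable.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true values and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true labels and predicted probabilities for one skill column.
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.7]
y_pred = [int(p >= 0.5) for p in y_prob]  # hypothetical 0.5 cut-off

model_accuracy = accuracy(y_true, y_pred)
model_rmse = rmse(y_true, y_prob)
```

Computing the same metrics for each candidate method on the same validation data gives a like-for-like comparison.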


Expected Input

a. Raw data in SQL database


a. Excel/CSV files from Data Collection stage

a. Datasets from Data Preprocessing stage

a. Datasets from Data Preprocessing stage

a. Dataset with learning result


Expected Output

b. Excel/CSV files


a. Python pandas dataframe or other suitable format

a. Some graphs, charts or tables for internal checking before proceeding to data modelling

a. Dataset with top k skillsets with their score

b. Dataset with learning result

a. Some plots or graphs to visualise the model performance


Timeline
Remarks
Employee table
employee_id skill_1 skill_2 skill_3 job_pref_position_level expected_monthly_salary state city qualification field_of_study specialization_1 specialization_2 specialization_3 exp_year lang_1 lang_2 lang_3
1001
1002
1003

Usage
1) Employee table stores the demographic details of every employee.

Job table
job_id job_skills
101 D
101 H
101 A
102 A
102 F
102 C

Usage
1) Job table stores the skills related to every job. Each job ID is assumed to have three records.

Click event table

employee_id job_id click_event_type weight
1001 101 5 5
1001 101 6 1
1001 101 7 1
1001 101 8 1
1002 101 6 1
1002 101 7 1
1003 102 6 1

Usage
1) Click event table stores the click event history (view, wishlist, apply and rate) performed by employees on jobs.
2) The click event table should be scoped down to a specific time frame to ensure the recommendation results are always up to trend. For example,
a) latest one month of records, based on the created datetime or updated datetime of the raw records in the database
b) latest 1000 click events, per employee or overall, depending on the scenario
3) Weight is defined according to the significance of the click event. The suggested assignment is as below:

click_event_type click_event_desc weight
1 rating 1 1
2 rating 2 2
3 rating 3 3
4 rating 4 4
5 rating 5 5
6 job viewed 1
7 job wishlisted 1
8 job applied 1

Interpretation
1) Employee 1001 has viewed, wishlisted, applied for and given a 5-star rating to job 101.
2) Employee 1002 has only viewed and wishlisted job 101.
3) Employee 1003 has only viewed job 102.
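
The weight lookup and time-frame scoping described in the Usage notes of the click event table can be sketched as follows, in plain Python. The `created_at` field, the dates, the function names and the approximation of one month as 30 days are assumptions made for illustration; the type-to-weight mapping is the suggested assignment from the table above.

```python
from datetime import datetime, timedelta

# click_event_type -> (description, weight), per the suggested assignment.
CLICK_EVENT_TYPES = {
    1: ("rating 1", 1), 2: ("rating 2", 2), 3: ("rating 3", 3),
    4: ("rating 4", 4), 5: ("rating 5", 5),
    6: ("job viewed", 1), 7: ("job wishlisted", 1), 8: ("job applied", 1),
}


def attach_weight(event):
    """Return a copy of the event with its weight looked up from its type."""
    _description, weight = CLICK_EVENT_TYPES[event["click_event_type"]]
    return {**event, "weight": weight}


def latest_month(events, now):
    """Keep events created in the 30 days before `now` (one month approximated as 30 days)."""
    cutoff = now - timedelta(days=30)
    return [e for e in events if e["created_at"] >= cutoff]


def latest_n(events, n=1000):
    """Keep the n most recent events, newest first."""
    return sorted(events, key=lambda e: e["created_at"], reverse=True)[:n]


# Hypothetical raw records; only the IDs and event types come from the sample table.
now = datetime(2024, 1, 31)
raw_events = [
    {"employee_id": 1001, "job_id": 101, "click_event_type": 5, "created_at": datetime(2024, 1, 20)},
    {"employee_id": 1002, "job_id": 101, "click_event_type": 6, "created_at": datetime(2024, 1, 25)},
    {"employee_id": 1003, "job_id": 102, "click_event_type": 6, "created_at": datetime(2023, 11, 1)},
]
events = [attach_weight(e) for e in raw_events]
recent = latest_month(events, now)
```

In a database-backed setup the same filtering would more likely be done in SQL before the data reaches Python.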
Problem statement
How to perform a general recommendation to a new portal subscriber?

Scenario
Adam is a new subscriber of the werkseven job portal as an employee. Since he is new, he does not have any history records in the database.
We want to find possible jobs (in terms of skillsets) that can be recommended to Adam based on the overall trend among existing subscribers.

Data Preparation & Methodology

Identify the top k skillsets required by the jobs that have been viewed, wishlisted, applied for or rated by existing users in a certain time frame.
A suitable time frame can be the latest one month or the latest 1000 click events captured by the system.

Illustration

Interpretation
1) Left join the click event table with the job table. The output is shown in the first table.
- every record in the click event table is now replicated three times, with each copy tied to one job skill
2) Group by job_skills to get the sum of weight as a score in percent. The output is shown in the second table.
3) Sort and apply a filter to obtain the top k skills based on the overall trend.
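
The join, group and top-k steps of the cold start method can be sketched in plain Python using the sample records from the "master" sheet. Two points are assumptions: "score in percent" is read here as each skill's share of the total weight, and ties are broken alphabetically. The function name is hypothetical.

```python
from collections import defaultdict

# Sample records from the "master" sheet.
job_table = [(101, "D"), (101, "H"), (101, "A"), (102, "A"), (102, "F"), (102, "C")]
click_events = [  # (employee_id, job_id, click_event_type, weight)
    (1001, 101, 5, 5), (1001, 101, 6, 1), (1001, 101, 7, 1), (1001, 101, 8, 1),
    (1002, 101, 6, 1), (1002, 101, 7, 1), (1003, 102, 6, 1),
]


def top_k_skills(click_events, job_table, k):
    """Join click events to job skills, sum weights per skill, return top k as (skill, percent)."""
    skills_by_job = defaultdict(list)
    for job_id, skill in job_table:
        skills_by_job[job_id].append(skill)

    # Join: each click event is repeated once per skill of its job.
    # Events whose job has no skills on record contribute nothing.
    totals = defaultdict(int)
    for _employee_id, job_id, _event_type, weight in click_events:
        for skill in skills_by_job.get(job_id, []):
            totals[skill] += weight

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    # Highest score first; alphabetical order breaks ties (assumption).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(skill, 100 * total / grand_total) for skill, total in ranked[:k]]


top3 = top_k_skills(click_events, job_table, k=3)
```

With these records, skill A leads with a total weight of 11 (10 from job 101 and 1 from job 102), followed by D and H with 10 each.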


Problem statement
How to perform a personalised recommendation to an existing subscriber?

Scenario
June has been a subscriber of the werkseven job portal as an employee for a year. Over the year, the system has captured and stored all her click activities, including jobs viewed, wishlisted, applied for and rated.
We want to find similar jobs (in terms of skillsets) that might attract June based on her past behaviour.

Data Preparation & Methodology

Identify the top k skillsets required by the jobs that have been viewed, wishlisted, applied for or rated by the desired user in a certain time frame.
A suitable time frame can be the latest one month or the latest 1000 click events performed by the desired user.

Illustration

Interpretation
1) Apply a filter on the click event table to get the desired employee's records (in this case employee_id 1005).
2) Left join the filtered click event table with the job table. The output is shown in the first table.
- every record in the click event table is now replicated three times, with each copy tied to one job skill
3) Group by employee_id and job_skills to get the sum of weight as a score in percent. The output is shown in the second table.
4) Sort and apply a filter to obtain the top k skills based on the desired employee's past behaviour.
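
A minimal sketch of the content-based steps in plain Python. It uses employee 1001 rather than 1005, because 1001's records appear in the "master" sheet; the last click event row is a hypothetical addition so that the employee's skills do not all tie. As an assumption, "score in percent" is read as each skill's share of the employee's total weight. The function name is hypothetical.

```python
from collections import defaultdict

job_table = [(101, "D"), (101, "H"), (101, "A"), (102, "A"), (102, "F"), (102, "C")]
click_events = [  # (employee_id, job_id, click_event_type, weight)
    (1001, 101, 5, 5), (1001, 101, 6, 1), (1001, 101, 7, 1), (1001, 101, 8, 1),
    (1002, 101, 6, 1), (1002, 101, 7, 1), (1003, 102, 6, 1),
    (1001, 102, 6, 1),  # hypothetical extra row: employee 1001 also viewed job 102
]


def employee_top_k(click_events, job_table, employee_id, k):
    """Return the top k (skill, percent) pairs for one employee's own click history."""
    skills_by_job = defaultdict(list)
    for job_id, skill in job_table:
        skills_by_job[job_id].append(skill)

    # Step 1: keep only the desired employee's click events.
    own_events = [event for event in click_events if event[0] == employee_id]

    # Steps 2-3: join to job skills and sum the weights per skill.
    totals = defaultdict(int)
    for _employee_id, job_id, _event_type, weight in own_events:
        for skill in skills_by_job.get(job_id, []):
            totals[skill] += weight

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no history for this employee

    # Step 4: sort (ties broken alphabetically, an assumption) and keep the top k.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(skill, 100 * total / grand_total) for skill, total in ranked[:k]]


top2 = employee_top_k(click_events, job_table, employee_id=1001, k=2)
```

An empty result signals that the employee has no click history, in which case the cold start method would apply instead.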


Problem statement
How to perform a collaborative filtering recommendation to an existing subscriber?

Scenario
Sam has been a subscriber of the werkseven job portal as an employee for a year. Over the year, the system has captured and stored all his click activities, including jobs viewed, wishlisted, applied for and rated.
Apart from Sam, the system has also captured and stored all related click activities performed by other subscribers. We want to find other possible jobs (in terms of skillsets) that Sam might like
based on the trend among other subscribers who have similar backgrounds to him.

Data Preparation & Methodology

Identify the top k skillsets required by the jobs that have been viewed, wishlisted, applied for or rated by a specific group of users in a certain time frame.
A suitable time frame can be the latest one month or the latest 1000 click events performed by the desired user.

Illustration (input)

Interpretation
1) Generate the employee score table for every employee_id. Refer to the "content-based" tab for the method to derive the employee score table.
2) Transform the employee score table into the input dataset for modelling.
3) The input dataset consists of the following columns:
- unique key: employee ID
- variable X: employee's demographic information, i.e. skills possessed by the employee, qualification, specialization
- variable Y: significant skills derived from the employee's click events
4) The value of each skill column (var Y) is either 0 or 1, derived from the following rules:
- 1 if the score for the particular skill (in the employee score table) is greater than or equal to the threshold value
- 0 otherwise
5) The threshold value can be defined based on a business decision or a mathematical formula.
6) The concept of the model is multi-target classification, whereby employees are classified into different groups that possess different combinations of skillsets based on their demographic information.
7) Each skill (variable Y) will be converted into one-hot encoded data before it is used for model training.
8) The model structure and output are illustrated below.

Illustration (simple neural network structure)
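
Since the original network diagram is not reproduced here, the following is only a rough stand-in for the idea of multi-target classification, written in plain Python rather than TensorFlow so it runs without extra dependencies. It is a single dense layer with one sigmoid output per skill, which for a binary label is equivalent to a two-way softmax over the one-hot encoding. The features, weights, biases and skill order are hypothetical and untrained.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, weights, biases):
    """One dense layer with a sigmoid per output: one independent probability per skill."""
    outputs = []
    for skill_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(skill_weights, features)) + bias
        outputs.append(sigmoid(z))
    return outputs


def binary_cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy across the skill outputs."""
    eps = 1e-12  # avoids log(0)
    losses = [
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for t, p in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)


# Hypothetical encoded demographic features (variable X) for one employee.
x = [1.0, 0.0, 1.0]
# Hypothetical, untrained parameters: one row of weights per skill output (A, D, H).
W = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4], [-0.6, 0.1, 0.2]]
b = [0.0, 0.1, -0.1]

probabilities = forward(x, W, b)
predicted = [int(p >= 0.5) for p in probabilities]  # hypothetical 0.5 cut-off
loss = binary_cross_entropy([1, 0, 0], probabilities)
```

In a real model the parameters would be learned by minimising this loss over all employees, and hidden layers could be added between the inputs and the skill outputs.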

Illustration (output)

Output A (recommended)
Output B
