[Figure: workflow stages: Data Collection, Data Preparation, Training & Debugging, Deploying & Testing]
Problem Statement
a. How can job applicants (users) be matched with the job skillsets available in the market?
b. The recommender system will be built on user click events and demographic data.
Remarks:
- A click event refers to a job being viewed, wishlisted, applied for or rated.
- The system needs a formula to calculate the score of the skills related to the user according to his or her click behaviour.
Data Preprocessing
a. Text processing on demographic fields. [future]
Data Exploration
a. Perform simple data exploration on the processed dataset; plots or graphs can be generated for summary reference.
Data Modelling
a. Cold start method
b. Content-based filtering
c. Collaborative filtering
Improve Model
a. Compare and contrast the learning results of several mechanisms/methods.
Deployment
a. Schedule for SIT and UAT.
b. Sync with the IT team on the deployment method.
More In Details
a. Initially, matching shall be based on the portal's overall trending, slowly evolving into personalised or comprehensive matching once the system has gathered sufficient data.
a. Employee table
- demographic information
b. Job table
- to map job ID to related skillsets (one-to-three mapping)
c. Click event table
- records of job viewed, wishlisted, applied and rated
Score
a. A weight is assigned to each type of click event to indicate its level of significance.
b. The score for a skillset is the summation of its weight for all click events. [temporary practice]
c. Weight could be enhanced and new click events could be included. [future improvement]
Remark: Refer to the "cold start", "content-based" and "collaborative" sheets for details.
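The scoring rule in (b) can be sketched as below. This is a minimal illustration only: the weight values and the names `EVENT_WEIGHTS` and `skill_scores` are assumptions, since the source does not list the actual weights.

```python
from collections import defaultdict

# Hypothetical weights: the source only says that each click event
# type carries a weight reflecting its significance.
EVENT_WEIGHTS = {"view": 1, "wishlist": 2, "apply": 3, "rate": 4}

def skill_scores(events):
    """Sum the weights of all click events per skill.

    `events` is an iterable of (skill, event_type) pairs.
    """
    scores = defaultdict(int)
    for skill, event_type in events:
        scores[skill] += EVENT_WEIGHTS[event_type]
    return dict(scores)

# Skill A was viewed, applied for and wishlisted; skill D was only viewed.
events = [("A", "view"), ("A", "apply"), ("D", "view"), ("A", "wishlist")]
scores = skill_scores(events)
```

Adding a new click event type, as (c) anticipates, would only require a new entry in the weight mapping.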
Collaborative filtering
a. Applied to provide a comprehensive recommendation to the user based on the trend of his or her cluster.
b. Involves SQL, data engineering and a TensorFlow neural network.
Remark: Refer to the "cold start", "content-based" and "collaborative" sheets for details.
a. Plot graphs to visualise loss and accuracy of model training and model validation.
a. Some graphs, charts or tables for internal checking before proceeding to data modelling.
Usage
1) The employee table stores the demographic details of every employee.
Job table
job_id job_skills
101 D
101 H
101 A
102 A
102 F
102 C
Usage
1) The job table stores the skills related to every job. Each job ID is assumed to have three records.
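As a rough sketch, the long-format job table above can be grouped by job ID to check the assumed three-skills-per-job mapping. The names `JOB_TABLE` and `skills_by_job` are hypothetical; the rows are copied from the table above.

```python
from collections import defaultdict

# Rows copied from the job table above: (job_id, job_skills).
JOB_TABLE = [
    (101, "D"), (101, "H"), (101, "A"),
    (102, "A"), (102, "F"), (102, "C"),
]

def skills_by_job(rows):
    """Group the long-format job table into job_id -> list of skills."""
    grouped = defaultdict(list)
    for job_id, skill in rows:
        grouped[job_id].append(skill)
    return dict(grouped)

job_skills = skills_by_job(JOB_TABLE)

# Jobs that break the assumed one-to-three mapping (empty if all is well).
jobs_not_three = [job for job, skills in job_skills.items() if len(skills) != 3]
```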
Usage
1) Click event table stores click event history (view, wishlist, apply and rate) performed by employees on jobs.
2) The click event table should be scoped down to a specific time frame to ensure the recommendation results are always up to date:
a) the latest one month of records, based on the created datetime or updated datetime of the raw records in the database
b) the latest 1000 click events, per employee or over the overall data, depending on the scenario
3) A weight is defined according to the level of significance of the click event. The suggested assignment is as per below:
Interpretation
1) Employee 1001 has viewed, wishlisted, applied and rated 5 stars for job 101.
2) Employee 1002 has only viewed and wishlisted job 101.
3) Employee 1003 has only viewed job 102.
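The two scoping options in point 2 could look roughly like this. It is a sketch under assumptions: "one month" is approximated as 30 days, the `created_at` field name and the sample dates are invented for illustration, and `latest_window` and `latest_n` are hypothetical helper names.

```python
from datetime import datetime, timedelta

def latest_window(events, now, days=30):
    """Keep events created within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [e for e in events if e["created_at"] >= cutoff]

def latest_n(events, n=1000):
    """Keep the `n` most recent events, newest first."""
    return sorted(events, key=lambda e: e["created_at"], reverse=True)[:n]

# Hypothetical sample records; the dates are not from the source.
now = datetime(2024, 3, 31)
events = [
    {"employee_id": 1001, "job_id": 101, "event": "view",
     "created_at": datetime(2024, 3, 20)},
    {"employee_id": 1002, "job_id": 101, "event": "wishlist",
     "created_at": datetime(2024, 3, 5)},
    {"employee_id": 1003, "job_id": 102, "event": "view",
     "created_at": datetime(2024, 1, 15)},
]

recent = latest_window(events, now)   # option (a): time window
newest_two = latest_n(events, n=2)    # option (b): latest N events
```

To apply option (b) per employee rather than overall, the events would first be grouped by `employee_id` and `latest_n` applied to each group.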
expected_monthly_salary state city qualification field_of_study specialization_1 specialization_2
Scenario
Adam is a new subscriber of the werkseven job portal as an employee. Since he is new, he does not have any history records in the database.
We want to find out possible jobs (in terms of skillsets) that can be recommended to Adam based on the overall trending of existing subscribers.
Illustration
Interpretation
1) Left join the click event table with the job table. The output is as per the first table.
- every record in the click event table is now replicated three times, with each record tied to one job skill
2) Group by job_skills to get the sum of weight as the score in percent. The output is as per the second table.
3) Sort and apply a filter to obtain the top k skills based on overall trending.
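The join, group-by and top-k steps for the cold start case can be sketched with Python's built-in `sqlite3` module. The table layouts, the sample click events and the weights are assumptions for illustration, and "score in percent" is interpreted here as each skill's share of the total joined weight, which the source does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (job_id INTEGER, job_skills TEXT);
CREATE TABLE click_event (employee_id INTEGER, job_id INTEGER,
                          event TEXT, weight REAL);
INSERT INTO job VALUES
  (101, 'D'), (101, 'H'), (101, 'A'),
  (102, 'A'), (102, 'F'), (102, 'C');
-- Hypothetical weights for illustration: view = 1, wishlist = 2.
INSERT INTO click_event VALUES
  (1001, 101, 'view', 1), (1002, 101, 'wishlist', 2), (1003, 102, 'view', 1);
""")

TOP_K = 3  # hypothetical cut-off

rows = conn.execute("""
SELECT j.job_skills,
       100.0 * SUM(c.weight) /
       (SELECT SUM(c2.weight)
        FROM click_event c2
        LEFT JOIN job j2 ON j2.job_id = c2.job_id) AS score_pct
FROM click_event c
LEFT JOIN job j ON j.job_id = c.job_id   -- step 1: one row per job skill
GROUP BY j.job_skills                    -- step 2: sum of weight per skill
ORDER BY score_pct DESC, j.job_skills    -- step 3: sort ...
LIMIT ?                                  -- ... and keep the top k
""", (TOP_K,)).fetchall()
```

With this sample data, skill A ranks first because it belongs to both jobs and so collects weight from all three click events.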
Scenario
June has been a subscriber of the werkseven job portal as an employee for a year. Over the year, the system has captured and stored all her click activities, including jobs viewed, wishlisted, applied for and rated.
We want to find out similar jobs (in terms of skillsets) that might attract June based on her past behaviour.
Illustration
Interpretation
1) Apply a filter on the click event table to get the desired employee's records (in this case employee_id 1005).
2) Left join the filtered click event table with the job table. The output is as per the first table.
- every record in the click event table is now replicated three times, with each record tied to one job skill
3) Group by employee_id and job_skills to get the sum of weight as the score in percent. The output is as per the second table.
4) Sort and apply a filter to obtain the top k skills based on the desired employee's past behaviour.
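The four steps above can be sketched in plain Python as follows. The click events and weights are invented for illustration, `top_k_skills` is a hypothetical helper name, and the percentage is assumed to be each skill's share of the employee's total weight.

```python
from collections import defaultdict

# Job table as job_id -> skills (values from the job table sheet).
JOB_SKILLS = {101: ["D", "H", "A"], 102: ["A", "F", "C"]}

# Hypothetical click events: (employee_id, job_id, weight).
CLICK_EVENTS = [
    (1005, 101, 1), (1005, 101, 3), (1005, 102, 1), (1001, 102, 2),
]

def top_k_skills(employee_id, k=3):
    """Return the employee's top-k (skill, score in percent) pairs."""
    scores = defaultdict(float)
    for emp, job_id, weight in CLICK_EVENTS:
        if emp != employee_id:                    # step 1: filter by employee
            continue
        # Step 2: one row per job skill. Unlike a true left join, events
        # whose job has no skills are simply skipped here.
        for skill in JOB_SKILLS.get(job_id, []):
            scores[skill] += weight               # step 3: sum weight per skill
    total = sum(scores.values())
    if total == 0:
        return []
    # Step 4: sort by score (descending), break ties by skill name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(skill, 100.0 * score / total) for skill, score in ranked[:k]]

june_top = top_k_skills(1005, k=2)
```

An employee with no click events gets an empty list, which is the point at which the cold start method would take over.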
Scenario
Sam is an existing subscriber of werkseven job portal as an employee for a year. Over the year, the system has captured and s
Apart from Sam, the system has also captured and stored all related click activities performed by other subscribers. We want t
based on the trending of other subscribers who have backgrounds similar to his.
Illustration (input)
Interpretation
1) Generate the employee score table for every employee_id. Refer to the "content-based" tab for the method used to derive the employee score.
2) Transform the employee score table into the input dataset for modelling.
3) The input dataset consists of the following columns:
- unique key: employee ID
- variable X: employee's demographic information, i.e. skills possessed by the employee, qualification, specialization
- variable Y: significant skills derived from the employee's click events
4) The value of each skill column (variable Y) is either 0 or 1, derived from the following rules:
- 1 if the score for the particular skill (in the employee score table) is greater than or equal to the threshold value
- 0 otherwise
5) The threshold value can be defined by a business decision or a mathematical formula.
6) The model will be a multi-target classification, whereby employees are classified into different groups that possess different combinations of skillsets based on their demographic information.
7) Each skill (variable Y) will be converted into one-hot encoded data before it is used for model training.
8) The model structure and output are illustrated below.
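Steps 4 and 7 (thresholding the skill scores into 0/1 targets, then one-hot encoding each target) can be sketched as below. The threshold value, the sample scores and the names `to_targets` and `one_hot` are assumptions for illustration; the neural network itself is not shown.

```python
SKILLS = ["A", "C", "D", "F", "H"]
THRESHOLD = 20.0  # hypothetical; the source leaves the value open

# Hypothetical employee score table: employee_id -> {skill: score in percent}.
EMPLOYEE_SCORES = {
    1001: {"A": 40.0, "D": 30.0, "H": 30.0},
    1005: {"A": 50.0, "F": 15.0, "C": 35.0},
}

def to_targets(scores, skills=SKILLS, threshold=THRESHOLD):
    """Step 4: 1 if the skill's score is >= threshold, 0 otherwise."""
    return [1 if scores.get(skill, 0.0) >= threshold else 0 for skill in skills]

def one_hot(label, n_classes=2):
    """Step 7: encode a 0/1 label as a one-hot vector."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec

# One 0/1 target per skill column, keyed by employee ID.
targets = {emp: to_targets(s) for emp, s in EMPLOYEE_SCORES.items()}

# One one-hot pair per skill column, ready to serve as model targets.
encoded = {emp: [one_hot(t) for t in vec] for emp, vec in targets.items()}
```

Skills that an employee never interacted with default to a score of 0 and therefore to a target of 0.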
Illustration (output)
Output A (recommended)
Output B