Guided by:
Dr. Ashish Sharma
And
Dr. Sandeep Tayal
INTRODUCTION
PROBLEM STATEMENT
The project covers the full resume screening process within the hiring
domain. It automates candidate evaluation by combining Natural Language
Processing (NLP) techniques with Python libraries such as Pandas, NumPy,
Matplotlib, and Seaborn. The scope includes handling diverse resume formats,
extracting relevant information, and presenting structured outputs. Data
visualization makes the results easier to interpret, supporting efficient and
consistent candidate screening.
The primary objective is to streamline the hiring process by addressing the
shortcomings of manual resume screening. The project aims to extract key
information from resumes using NLP, with Pandas and NumPy handling data
manipulation and Matplotlib and Seaborn visualizing the results, so that
hiring teams can more easily assess and select qualified candidates.
METHODOLOGY
Dataset Preparation
The project uses a Kaggle dataset of 1,000 resumes as the foundation for the
methodology. Pandas and NumPy are used to preprocess and manipulate the data
and to check its integrity. Natural Language Processing (NLP) techniques are
then applied to extract relevant information from the resumes. The resulting
structured data is visualized with Matplotlib and Seaborn, providing insights
that streamline candidate evaluation. Working from a real dataset keeps the
resume screening methodology grounded in practical use.
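The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `Category` and `Resume`, the sample rows, and the helper `clean_resume_text` are assumptions made for the example, since the layout of the Kaggle file is not given here.

```python
import re

import pandas as pd


def clean_resume_text(text: str) -> str:
    """Strip URLs and non-letter characters, collapse whitespace, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # keep letters and whitespace only
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip().lower()


# Made-up rows standing in for the real dataset (column names are assumed).
df = pd.DataFrame({
    "Category": ["Data Science", "HR", "Data Science"],
    "Resume": [
        "Skills: Python, Pandas & NumPy. See https://example.com/cv",
        "Recruitment   and onboarding (5 yrs)",
        None,  # a missing resume, removed below
    ],
})

# Basic integrity step: discard rows with no resume text.
df = df.dropna(subset=["Resume"]).reset_index(drop=True)

# Normalize the free text before any feature extraction.
df["cleaned"] = df["Resume"].apply(clean_resume_text)

# Per-category counts, e.g. as input to a Seaborn or Matplotlib bar chart.
category_counts = df["Category"].value_counts()
```

Keeping only letters is a deliberately blunt rule: it would also reduce a skill such as "C++" to "c", so a real pipeline would likely need a more careful pattern.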
Modelling
1. KNeighborsClassifier:
The KNeighborsClassifier is a core model in our methodology and implements the
k-nearest neighbors algorithm. In the context of resume screening, it evaluates
a resume by its similarity to other resumes in the dataset: the resume is
classified based on the class labels of its k nearest neighbors, typically by
majority vote. This lets the project capture commonalities and differences
between resumes, supporting a data-driven approach to candidate assessment.
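As a rough sketch of the idea above, the example below classifies a new resume from its nearest neighbors using scikit-learn. Several parts are assumptions for illustration only: the toy texts and labels, the use of `TfidfVectorizer` to turn text into numeric features, and the choices `n_neighbors=3` and `metric="cosine"`. The feature extraction and parameters actually used in the project are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (made up for illustration).
train_texts = [
    "python pandas numpy machine learning statistics",
    "deep learning python data analysis",
    "recruitment onboarding payroll employee relations",
    "hiring interviews employee engagement payroll",
]
train_labels = ["Data Science", "Data Science", "HR", "HR"]

# Convert text to TF-IDF vectors so distances between resumes can be computed.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Each prediction is a vote among the 3 most similar training resumes.
knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
knn.fit(X_train, train_labels)

new_resume = ["python machine learning and data analysis"]
predicted = knn.predict(vectorizer.transform(new_resume))[0]
print(predicted)
```

The new resume shares vocabulary only with the two "Data Science" examples, so those two neighbors outvote the single remaining neighbor.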
2. OneVsRestClassifier:
The OneVsRestClassifier extends the project to multilabel classification. It
fits one binary classifier per label, so each label is handled independently.
This matters because resume content is diverse and a candidate's skills may
span multiple categories. The approach helps capture the various skills,
experiences, and qualifications presented in a resume, and allows the project
to process and categorize resumes with multiple skill sets.
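A multilabel setup of this kind might look like the sketch below, which wraps a KNeighborsClassifier in scikit-learn's OneVsRestClassifier. The toy texts, the label names (`data_science`, `web`, `hr`), the TF-IDF features, and the choice of `n_neighbors=1` are all assumptions made to keep the example small. They are not taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy resumes, each tagged with one or more (hypothetical) skill categories.
texts = [
    "python pandas machine learning",
    "python django rest api backend",
    "machine learning python flask api",
    "recruitment payroll onboarding",
]
skill_sets = [
    {"data_science"},
    {"web"},
    {"data_science", "web"},
    {"hr"},
]

# Encode the label sets as a binary indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(skill_sets)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# One independent binary KNN classifier is fitted per label column.
ovr = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=1, metric="cosine"))
ovr.fit(X, Y)

# "with" is not in the training vocabulary, so it is ignored at transform time.
query = ["flask api with python machine learning"]
pred = ovr.predict(vectorizer.transform(query))
labels = mlb.inverse_transform(pred)[0]
print(labels)
```

Because each label has its own classifier, a single resume can receive several labels at once, which a plain single-label KNeighborsClassifier cannot do.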
TECHNOLOGIES USED
Operating System:
• Windows 10/11 x64
Language Used:
• Python
Editors:
• Jupyter Notebook
• Google Colab
Libraries:
• Pandas
• NumPy
• Matplotlib
• Seaborn
• TensorFlow