Report for Resume Parser

A resume can come in various formats, so it is difficult to extract the data in a structured and
organized form. Once we have the text, however, we can obtain the information by extracting
keywords and phrases through tokenization.
So, whether the resume is a JPG, PDF, or DOCX file, we can extract the text line by line using
various Python libraries: Tesseract OCR for images, one of the several available PDF libraries,
and a DOCX library such as python-docx for document files.
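
The format-dependent extraction described above could be sketched roughly as follows. This is
only an illustration: the function name extract_text is hypothetical, and the choice of
pytesseract, pdfminer.six, and python-docx is an assumption, since the report does not name
specific packages beyond Tesseract. The libraries are imported lazily, so only the one needed
for a given file has to be installed.

```python
from pathlib import Path


def extract_text(path):
    """Return the raw text of a resume, choosing a reader by file extension."""
    suffix = Path(path).suffix.lower()

    if suffix in {".jpg", ".jpeg", ".png"}:
        # OCR route: needs pytesseract, Pillow, and the Tesseract binary (assumed choice).
        import pytesseract
        from PIL import Image
        return pytesseract.image_to_string(Image.open(path))

    if suffix == ".pdf":
        # One of several PDF options; pdfminer.six is an assumption, not the report's choice.
        from pdfminer.high_level import extract_text as pdf_extract_text
        return pdf_extract_text(path)

    if suffix == ".docx":
        # python-docx exposes the document body as a list of paragraphs.
        import docx
        return "\n".join(p.text for p in docx.Document(path).paragraphs)

    raise ValueError(f"Unsupported resume format: {suffix or 'no extension'}")
```

Whatever the input format, the result is a single string, so every later step can work on
plain text.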

The text must be preprocessed before it is passed to the NLP model that extracts the relevant
details.
Text normalization and part-of-speech tagging account for the different ways a resume can be
worded. Normalization includes removing stop words, which are not relevant to the context. Doing
this would ordinarily require training a model ourselves, but spaCy, an open-source NLP toolkit
for Python, ships with the pretrained models we need.
Lemmatization reduces words to their dictionary base form using a language dictionary, while
stemming strips suffixes such as “s” and “ing”. Both reduce the number of different word forms
in the text, and both are part of text normalization.
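
To make the normalization steps concrete, here is a deliberately small pure-Python sketch of
tokenization, stop-word removal, and suffix stripping. The stop-word list, the suffix list, and
all function names are made up for illustration. A real pipeline would more likely rely on
spaCy's stop words and lemmatizer than on this toy stemmer.

```python
import re

# Tiny illustrative stop-word list; a real one is much larger.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "for", "to", "with", "is"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]


print(normalize("Developing the REST services in Python"))
# prints ['develop', 'rest', 'servic', 'python']
```

The output shows why stemming is the cruder of the two techniques: "services" becomes the
non-word "servic", whereas a dictionary-based lemmatizer would return "service".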

1. Phone number, email address, GitHub and LinkedIn profiles, experience: we can use regular
expressions, which can easily extract phone numbers, email addresses, and links.
2. Skills/education/experience: we have to use tokenization. First, we should have all the
possible skills in a file, so that we can find every skill mentioned in the resume and list it.
3. Similarly, to extract degrees for education we have to use tokenization; for this we should
have all the possible degrees, such as B.Tech and B.A, in a file.
4. Steps 2 and 3 can be done in a single pass to reduce computation time.
5. Candidate name and company name: both are proper nouns, and spaCy has trained models that
can pick out the proper nouns in a text.
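
Steps 1 to 4 above could look roughly like the sketch below. The regular expressions are
simplified illustrations and not validated patterns, the SKILLS and DEGREES sets stand in for
the keyword files the report mentions, and every name here is hypothetical. Skills and degrees
are matched in the same pass over the tokens, as step 4 suggests.

```python
import re

# Simplified patterns for illustration; real-world inputs need more robust ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{8,}\d")
LINK_RE = re.compile(
    r"(?:https?://)?(?:www\.)?(?:github|linkedin)\.com/[\w\-./]+", re.IGNORECASE
)

# Stand-ins for the keyword files; in practice these would be loaded from disk.
SKILLS = {"python", "sql", "machine learning"}
DEGREES = {"b.tech", "b.a", "m.tech"}

# Tokens keep internal dots so that "B.Tech" survives as one token.
TOKEN_RE = re.compile(r"[a-z0-9+#]+(?:\.[a-z0-9+#]+)*")


def extract_contacts(text):
    """Pull emails, phone numbers, and GitHub/LinkedIn links with regular expressions."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
        "links": LINK_RE.findall(text),
    }


def extract_keywords(text):
    """Match skills and degrees in one pass over single tokens and adjacent pairs."""
    tokens = TOKEN_RE.findall(text.lower())
    candidates = set(tokens)
    candidates.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return {
        "skills": sorted(candidates & SKILLS),
        "degrees": sorted(candidates & DEGREES),
    }


sample = (
    "Jane Doe\n"
    "jane.doe@example.com | +91 98765 43210\n"
    "github.com/janedoe\n"
    "B.Tech in Computer Science. Skills: Python, SQL, Machine Learning"
)
print(extract_contacts(sample))
print(extract_keywords(sample))
```

Building the set of one-word and two-word candidates once and intersecting it with both keyword
sets avoids scanning the resume separately for skills and for degrees. Keywords longer than two
words would need the candidate window widened accordingly.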

Relevant Links For Reference:


https://omkarpathak.in/2018/12/18/writing-your-own-resume-parser/

https://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9_parsing

https://github.com/OmkarPathak/ResumeParser
