
SoMeDi: Successful Internship Programs Matching Job Offers with Candidates Skills

George Suciu1, Adrian Pasat1, Ioana Rogojanu1

(1) R&D Department, BEIA Consult International, Str. Peroni 16, RO-041386,
ROMANIA
E-mail: george[at]beia.ro

Abstract
Companies are searching for ways to maximize their employees' working time.
Hiring internship candidates can serve numerous purposes for the employer,
depending on the organizational structure, and also helps interns obtain work
experience and improve their professional skills. A successful internship program
should create opportunities for the internal development of existing employees and
identify their strengths and weaknesses for expansion or backfill. The SoMeDi project
proposes an online recruitment platform which offers several Digital Intelligence
Tools (DIT) designed to assess the professional level of users from their cover
letters and/or letters of recommendation, determine the user sentiment regarding
the company's different fields of activity, and suggest an e-Learning program.
Several approaches, such as natural language processing (NLP), machine learning
algorithms and unsupervised learning methods from Cloud platforms, can be
applied to automate these sentiment analysis processes. This paper presents a
review of several Cloud platforms that make use of sentiment analysis tools
and shows how to make the most of the metadata surrounding the DIT application. We
also present the recruitment and e-Learning process for both types of users,
companies and internship candidates.

Keywords: e-Learning, Natural language processing, Unsupervised learning methods,
Machine learning, Social media

1 Introduction
Social media, as a means of mass communication, is now part of daily life for citizens
worldwide. Communities and personal relations of all sorts are now inextricable from
the Internet tools which have appeared over the last decade. The impact is felt
everywhere: from the spreading of news to personal relationships and artistic
movements, social media has grown to cover most of the spectrum of human
activities. Thus, the use of social media has moved from an experimentation phase to a
more mission-critical role, placing significant pressure on social programs to efficiently
monitor, analyze and engage with this variety of information.
Nowadays, online social media has become a necessary tool for recruiting, since it has
the potential to be cost-effective and efficient for text analysis. New technologies are
emerging every day, and natural language processing (NLP) has contributed to the field
of human-computer interaction by providing practical applications. NLP is defined as a
computer's capacity to understand and process large amounts of natural human
language (Suciu et al., 2018a).
Over the years, NLP has improved through the use of Machine Learning (ML) technologies
and general-purpose computing on graphics processing units (GPGPU) for the manipulation
of text and speech.
The use of NLP for social media is a broad field of study that covers text
pre-processing, word normalization and the evaluation of applications using data
collected from social media. In the early stages, the trend was to collect and process
public data from social media to train a model for every domain, but recently
pre-trained models from Microsoft, Google or IBM are used instead, as these models
allow the integration of ML algorithms (Suciu et al., 2018b).
The main advantage is that companies can use NLP techniques and semantic analysis
to develop and improve their recruitment strategies. Given these changes and the need
to obtain relevant data, the recruitment process must be adapted to social media, since
traditional forms of media are not as popular as before. To analyze this case study,
researchers must take into consideration several important aspects: the contextual
research, the need to apply User Experience (UX) principles and the interaction with
passive candidates.
SoMeDi's main goal is to unlock the hidden value in digital content and in the traces of
human interactions, using applications which rely on artificial intelligence and machine
learning techniques. To reach this goal, the SoMeDi project develops methods and DITs to
analyze digital interaction data, including social media. The provided methods produce
improved sentiment analysis and opinion mining, giving a better picture of the user's
attitude towards topics and concepts at aspect level.
This paper aims to describe several sentiment analysis tools and compare them in
order to improve the SoMeDi platform.
The rest of the paper is organized as follows: Section 2 provides an overview of the
available sentiment analysis solutions, Section 3 describes the conceptual architecture of
the SoMeDi platform, Section 4 presents the results, while Section 5 concludes the
paper.

2 Related Work
Sentiment analysis has been the subject of intensive research and has been applied in
different domains, from identifying polarity (positive, neutral or negative) to the
computational treatment of opinion, sentiment or subjectivity.
In this section, we describe six sentiment analysis tools: Alchemy, SentiStrength,
Natural Language Toolkit (NLTK), Stanford CoreNLP, Google Cloud Natural Language
API and Text Analytics API from Microsoft Azure. In the next sections, we analyze the
SoMeDi platform and compare the sentiment analysis tools provided by Google and by
Azure.

2.1 Lexicon-based tools


Lexicon-based solutions use dictionaries of words. Every word is annotated with its
semantic orientation, and the approach also accounts for negation and intensification.
Alchemy is a lexicon-based tool which uses ML techniques (specifically, deep
learning) to perform semantic text analysis using NLP, including sentiment analysis. It
also enables the use of custom models for increased accuracy. Alchemy returns the
status, the detected language, the score and the type of the sentiment resulting from
the text analysis. The score is in the range (-1, 1). For a negative score the type is
negative, and for a positive score the type is positive. If the score is 0, the type is
neutral. The main disadvantage is that results obtained for non-English text are
ignored. It is described at
https://console.bluemix.net/catalog/services/natural-language-understanding.
SentiStrength is a lexicon-based classifier which employs several novel methods to
extract sentiment strength from short informal electronic text, using a dictionary of
sentiment words with associated strengths. The dictionary is a collection of 465
negative terms and 298 positive terms, each classified for either negative or positive
sentiment, with a value p ranging from 1 to 5 for positivity and, similarly, a value n
ranging from -5 to -1 for negativity. In order to calculate the scores and the sentiment
level for a document, we follow M. Thelwall's approach. A text is considered negative
when p + n < 0 and positive when p + n > 0. A text is considered neutral if p = -n and
p < 4. If p = -n and p ≥ 4, the text is considered to have undetermined sentiment and
should be removed from the datasets. The main advantage is that SentiStrength uses
non-lexical linguistic information in order to detect the sentiment of informal text
(Jongeling et al., 2015).
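The decision rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the thresholds (p + n compared with 0, and p = -n with a cut-off at 4) are assumed from the description of Thelwall's approach given by Jongeling et al. (2015), not taken from the SentiStrength software itself.

```python
from typing import Optional


def classify_sentistrength(p: int, n: int) -> Optional[str]:
    """Combine a positive score p (1..5) and a negative score n (-5..-1).

    Returns "positive", "negative" or "neutral"; returns None when the
    sentiment is undetermined and the text should be dropped.
    """
    if not (1 <= p <= 5 and -5 <= n <= -1):
        raise ValueError("expected p in 1..5 and n in -5..-1")
    total = p + n
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    # Here p == -n: weak opposing scores count as neutral, while strong
    # opposing scores (p >= 4) are treated as undetermined (assumed rule).
    return "neutral" if p < 4 else None


print(classify_sentistrength(3, -1))  # positive
print(classify_sentistrength(4, -4))  # None
```

Returning None for the undetermined case makes it easy to filter such texts out of a dataset before any further evaluation.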
NLTK, which can be installed from https://www.nltk.org/install.html, is a platform which
provides a practical solution for working with human language data. It provides an
easy-to-use interface to over 50 lexical resources, a suite of text processing libraries
and wrappers for NLP libraries. NLTK returns the text's probability for each kind of
sentiment (positive, negative or neutral). If the probability score for neutral
sentiment is greater than 0.5, the text is considered to be neutral. Otherwise, the text
is assigned the sentiment with the highest probability.
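As a minimal sketch, this rule can be written as follows. The dictionary layout with the keys "positive", "negative" and "neutral" is a hypothetical input format chosen for illustration; it is not the return type of any particular NLTK call.

```python
from typing import Dict


def classify_from_probabilities(probs: Dict[str, float]) -> str:
    """Apply the neutral-first rule to a mapping of sentiment -> probability."""
    # A text is neutral as soon as the neutral probability exceeds 0.5.
    if probs["neutral"] > 0.5:
        return "neutral"
    # Otherwise pick whichever sentiment has the highest probability.
    return max(probs, key=probs.get)


example = {"positive": 0.7, "negative": 0.1, "neutral": 0.2}
print(classify_from_probabilities(example))  # positive
```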
Stanford CoreNLP, which can be installed from https://stanfordnlp.github.io/CoreNLP/, is
designed to be extensible and flexible. Its goal is to make it easy to apply linguistic
analysis tools to a piece of text. Stanford CoreNLP breaks the text down into sentences
and assigns every sentence a score between 0 and 4, where 0 is very negative,
2 is neutral and 4 is very positive. The main disadvantage is that the tool does not
provide a score for the full text. In order to determine the level of the sentiment, the
user has to compute [1], where n denotes the number of negative
sentences, z denotes the number of neutral sentences and p is the number of positive
sentences. The text is considered to be negative, neutral or positive according to the
resulting score.

2.2 Cloud-based tools


Cloud Computing aims to provide reliable, high-quality, customized and dynamic
computing services for users. The technologies utilized for Cloud Computing are still in
the process of maturing.
Google Cloud Natural Language API, available at
https://cloud.google.com/natural-language/, is a cloud-based service which provides
natural language understanding technologies including syntax analysis, sentiment
analysis, and entity sentiment analysis. The main advantage is that Google Cloud
Natural Language API supports several languages. The sentiment analysis tools inspect
the text and identify the prevailing emotional opinion in order to determine the
attitude. The sentiment score ranges between -1 and 1 and corresponds to the overall
emotional leaning of the text.
Besides the score, Google Cloud Natural Language API also gives a magnitude which
indicates the strength of emotion and ranges between 0 and +inf. For example, a
text is considered to be clearly negative if the score is -0.6 and the magnitude is 4.0,
neutral if the score is 0.1 and the magnitude is 0.0, clearly positive if the score is
0.8 and the magnitude is 3.0, and mixed if the score is 0.0 and the magnitude is 4.0.
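One way to turn a (score, magnitude) pair into such labels is sketched below. The function and both threshold values are hypothetical choices made for illustration; the API only returns the two numbers, so in practice suitable thresholds have to be chosen for each use case.

```python
def interpret_sentiment(score: float, magnitude: float,
                        score_threshold: float = 0.25,
                        magnitude_threshold: float = 1.0) -> str:
    """Label a document from its overall score and magnitude.

    Both thresholds are illustrative assumptions, not values defined by the API.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Near-zero score: a high magnitude suggests that strong positive and
    # negative passages cancel out (mixed); a low one suggests little emotion.
    return "mixed" if magnitude >= magnitude_threshold else "neutral"


print(interpret_sentiment(0.0, 4.0))  # mixed
print(interpret_sentiment(0.1, 0.0))  # neutral
```

Separating "mixed" from "neutral" is the main reason to look at the magnitude at all, since both cases yield a score close to zero.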
Text Analytics API from Microsoft Azure, available at
https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/, is a
cloud-based service which provides advanced natural language processing and includes
three main functions: language detection, key phrase extraction, and sentiment analysis.
The sentiment analysis tools generate a score between 0 and 1 using classification
techniques.

3 SoMeDi platform
The SoMeDi platform is intended to analyze text and calculate its sentiment, since there
is a strong correlation between social profiles and users. The sentiment analysis tools
developed within the SoMeDi platform will be used for mining data from professional
networks and social media platforms in order to provide personalized recommendations
and evaluations of internship and/or apprenticeship programs offered by companies. The
goal is to identify the candidates' opinion regarding several aspects: company activity,
required aptitudes, and knowledge. Compared to direct competitors, the SoMeDi platform
takes a novel approach by using the latest technologies, such as Machine Learning,
Natural Language Processing, and Sentiment Analysis backend services, deployed within a
reliable cloud infrastructure that scales easily with the number of subscriptions.

Figure 1. SoMeDi dashboard

SoMeDi has identified value mining of social and other digital user interactions as a
viable business model, mainly related to business trends such as Software-as-a-Service
(SaaS). The objective is to apply technological innovations in the areas of artificial
intelligence, opinion mining and big data to exploit digital user interactions and
transform them into Digital Interaction Intelligence (DII).

Figure 2. Sentiment Analysis process


The main task is to find and improve on current technologies along the Input –
Analysis – Output process. Here SoMeDi will enhance deep, syntactically structured
resources that go beyond the n-gram and bag-of-words paradigm and better capture the
complexity of natural language sentences, integrating the target of sentiments or
considering the holder of the opinion via a deeper syntactic and semantic representation
or inference system.

4 Comparison of sentiment analysis tools


This section provides a comparison of the Cloud-based sentiment analysis tools
presented in Section 2 in order to identify the most reliable tool.
We start by presenting a corpus drawn from several journals to see how often each
sentiment analysis tool is used in research articles, and we continue by evaluating the
tools' performance on the same text.

4.1 Distribution of sentiment analysis tools


We created a relevant database, based on all available articles, in a specific JSON
format using a Python script and the Scopus API. We present the obtained results and
analyze the distribution of topics. We used the names of the presented tools as search
keywords.
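The script itself is not reproduced here. As a rough sketch of how such a count per tool could be obtained, the snippet below builds a Scopus Search API request and reads the total number of hits from a JSON response. The helper names, the TITLE-ABS-KEY query field and the placeholder API key are assumptions for illustration, and the response shown is a hand-written sample rather than real data.

```python
from typing import Dict
from urllib.parse import urlencode

SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_search_url(tool_name: str, api_key: str) -> str:
    """Build a search URL that looks for a tool name in title, abstract or keywords."""
    params = {
        "query": f'TITLE-ABS-KEY("{tool_name}")',
        "apiKey": api_key,  # placeholder; a real key is required
        "count": 1,         # only the total is needed, not the entries
    }
    return f"{SCOPUS_SEARCH_URL}?{urlencode(params)}"


def total_results(response: dict) -> int:
    """Read the number of matching articles from a parsed JSON response."""
    return int(response["search-results"]["opensearch:totalResults"])


def distribution(responses: Dict[str, dict]) -> Dict[str, int]:
    """Map each tool name to its article count."""
    return {tool: total_results(resp) for tool, resp in responses.items()}


# Hand-written sample response, used only to show the expected shape.
sample = {"NLTK": {"search-results": {"opensearch:totalResults": "12"}}}
print(build_search_url("NLTK", "YOUR_API_KEY"))
print(distribution(sample))
```

Keeping the URL construction separate from the parsing makes it possible to check the counting logic on saved JSON files without calling the service.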

Figure 3. Distribution of articles

4.2 Comparison between Google Cloud Natural Language API and Text Analytics
API
In this section, we present a comparison between the two Cloud services in order to see
which is more efficient.
The main purpose of the SoMeDi project is to build an efficient recruitment platform,
focused on the professional level, which collects and analyzes the users’ work
experience using their cover letters and/or letters of recommendation and calculates the
user sentiment.
Table 1 presents the sentiment analysis scores after bringing the results into the
same range.

Table 1. The score obtained using the two cloud-based services

Sentiment analysis tool              Score for full text
Google Cloud Natural Language API    0.9279513955116272
Text Analytics API                   0.8
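
The text does not state how the scores were brought into the same range. One plausible approach, assumed here purely for illustration, is a linear rescaling that maps Google's [-1, 1] interval onto the [0, 1] interval used by Text Analytics API. The function name and the example value are hypothetical.

```python
def rescale(value: float, src_min: float, src_max: float,
            dst_min: float = 0.0, dst_max: float = 1.0) -> float:
    """Linearly map value from [src_min, src_max] to [dst_min, dst_max]."""
    if src_max <= src_min:
        raise ValueError("source range must have positive width")
    if not src_min <= value <= src_max:
        raise ValueError("value lies outside the source range")
    fraction = (value - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# A score of 0.5 on a [-1, 1] scale corresponds to 0.75 on a [0, 1] scale.
print(rescale(0.5, -1.0, 1.0))  # 0.75
```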

We created a form to collect people’s sentiment regarding the presented cover
letter. In order to determine the efficiency of both tools, we calculated the correlation
coefficient, following a reference work (Taylor, 1990).
First, after gathering the data, we used parametric tests to obtain the relevant
dataset. After that, the text was broken into fragments, each fragment was analyzed, and
the correlation coefficient was calculated.
The value of the correlation coefficient shows how strong the relationship between
the two variables is.
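
As a minimal sketch, assuming the coefficient in question is the Pearson product-moment correlation, it can be computed from paired values as follows. The two lists are invented example data (per-fragment human ratings and tool scores), not the data collected in this study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-fragment values, for illustration only.
human_ratings = [0.9, 0.4, 0.7, 0.2, 0.8]
tool_scores = [0.8, 0.5, 0.6, 0.3, 0.9]
print(round(pearson_r(human_ratings, tool_scores), 3))
```

The result lies between -1 and 1; values close to 1 indicate that the tool's scores rise and fall together with the human ratings.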

Table 2. The values of the correlation coefficient


Sentiment analysis tool              Correlation coefficient
Google Cloud Natural Language API    0.681168117
Text Analytics API                   0.402209226

In our case, the values show how efficient the proposed techniques are. The value
corresponding to Azure's technology shows a moderate relationship between the tool's
scores and the respondents' ratings. The value corresponding to Google's technology
shows a strong linear relationship.

5 Conclusions
In this article, we have presented an overview of sentiment analysis tools and compared
two Cloud solutions, which we consider better suited to our purpose than the
lexicon-based solutions.
The main purpose was to identify the most reliable tool in order to implement it
further in the SoMeDi platform. The comparison showed that Google Cloud Natural
Language API is the most used technique in research articles and that its results fit
the users’ opinions best.
For future work, we will compare the Cloud-based techniques using an increased number
of cover letters, and we will use the translation API with both techniques
in order to assess their performance on non-English text.

6 Acknowledgement
This paper was partially supported by UEFISCDI Romania and MCI through projects
SoMeDi and PAPUD, and funded in part by European Union's Horizon 2020 research
and innovation program under grant agreements No. 777996 (SealedGRID project) and
No. 787002 (SAFECARE project).

7 References

Journal Articles:
Taylor R. (1990): Interpretation of the Correlation Coefficient: A Basic Review. Journal
of Diagnostic Medical Sonography.

Conference Proceedings:
Jongeling, R. et al. (2015): Choosing Your Weapons: On Sentiment Analysis Tools for
Software Engineering Research in IEEE International Conference on Software
Maintenance and Evolution (ICSME).
Suciu G. et al. (2018a): Design of an internship recruitment platform employing NLP
based technologies in ECAI 2018 - International Conference, June 2018, BEIA Consult
International.
Suciu G. et al. (2018b): Geolocation and Social Media for Enhanced Recruitment
Campaigns in Scientific Conference eLearning and Software for Education Bucharest,
April 2018, BEIA Consult International.

Internet Sources:
https://console.bluemix.net/catalog/services/natural-language-understanding, accessed 2018
https://www.nltk.org/install.html, accessed 2018
https://stanfordnlp.github.io/CoreNLP/, accessed 2018
https://cloud.google.com/natural-language/, accessed 2018
https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/, accessed 2018
