AutoTag Rec To Classify AK in AGSE

1
Automatic Tags Recommendation to Classify

Architectural Knowledge in an Agile Global Software
Engineering Environment
Gilberto Borrego
(Instituto Tecnológico de Sonora, Mexico
gilberto.borrego@itson.edu.mx)
Samuel Gonzalez-Lopez
(Universidad Tecnológica de Nogales, Mexico
sgonzalez@utnogales.edu.mx)
Ramon R. Palacio
(Instituto Tecnológico de Sonora, Mexico
ramon.palacio@itson.edu.mx)
Abstract: Agile global software engineering (AGSE) presents challenges on architec-

tural knowledge (AK) management, since agile paradigm prefers face-to-face interac-
tions over comprehensive documentation, which is reflected on the knowledge-sharing
practices, causing documentation debt, and AK losing over time. AK condensation
concept has been proposed to reduce AK losing, using the AK shared through unstruc-
tured textual electronic media (UTEM). A key part of this concept is a classification
mechanism, which eases AK recovering in the future. We implemented this mechanism
through a Slack complement based on tagging, which recommends tags according to
a chat/message topic, using natural language processing (NLP) techniques. We evalu-
ated the Slack complement comparing NLP assisted tagging against auto-completion
tagging, in terms of correctness and time to select a tag. We had 52 participants which
used the Slack complement during a controlled evaluation in which they emulated a
documentation debt scenario. We obtained that the messages tagged trough NLP rec-
ommendations showed statistically less semantic errors (using a proper tag) and they
spent statistically less time to select a tag. Also, participants perceived the component
as very useful and usable. These results indicated us that a tag recommendation system
is necessary to implement the AK condensation, since it helps to classify the shared
AK, to recover it in the future. As a future work we will improve the NLP techniques
to evaluate AK condensation in a long-term test.
Key Words: Agile global software engineering, architectural knowledge condensation,
natural language processing, tags recommendation system
1 Introduction
Architectural knowledge (AK) is a core part of any software project [Vliet, 2008],
and it is commonly documented based upon standards [Kruchten, 2009]. This
documentation eases the knowledge management (KM) cycle [Dalkir, 2011] (cre-
ate/capture, share/disseminate, acquire/apply), even when companies practice
global software engineering (GSE), since, transferring/sharing documents among
2
the distributed facilities is a common action on these environments in the aim

to decrease the effect of the four distances in this paradigm [Holmstrom et al.,
2006] (physical, temporal, linguistic and cultural).
Nowadays, GSE companies are adopting agile software development (ASD),
which caused the arising of the agile GSE approach (AGSE); in fact, VersionOne
1
reports in 2019 that 68% of surveyed agile enterprises practice AGSE. This
trend causes a paradigms contradiction referring to knowledge managing. ASD
states that functional software and face-to-face communication are preferred
over a processes and comprehensive documentation2 . In the practice, most of
agile projects’ documentation could be replaced by enhancing informal commu-
nication, because ASD propose a stronger emphasis on tacit knowledge rather
than explicit knowledge [Cockburn and Highsmith, 2001]. This emphasis on tacit
knowledge has to be done without disregarding formal documentation [Yanzer
Cabral et al., 2014], however, in the AGSE practice this implies a reduction of
AK documentation or even its disappearance [Borrego et al., 2016, Yanzer Cabral
et al., 2014, Clerc et al., 2011], known as documentation debt [Tom et al., 2013].
This situation directly affects the AK management, since most of the projects’
AK remains tacit [Bider and Söderberg, 2016], hindering the KM cycle in AGSE
environments, because traditional KM on GSE is based on the explicit knowledge
sharing.
The prevalence of AK documentation debt and the affectation to the KM
cycle, eventually causes AK vaporization [Bosch, 2004], and it could trigger
some of the following problems in AGSE projects [Holz et al., 2003, Uikey et al.,
2011]: poorly understood technical solutions; reduced knowledge transfer be-
tween teams; defects in software evolution and maintenance; and time wasted
by experts answering questions and attempting to find solutions to problems
previously solved. It is worth to highlight that constant questions could cause
interpersonal relationship erosion, which could affect the knowledge flow [Ryan
and O’Connor, 2013], and the trust relationship building, which is important for
agile teams.
In the aim to address the above problems, in a previous study [Borrego et al.,
2019] we presented the knowledge condensation (KC) concept, which takes ad-
vantage of the AGSE developers’ preference to share AK by unstructured textual
and electronic media (UTEM), e.g. instant messengers, emails, etc. [Clerc et al.,
2011, Estler et al., 2012]. KC concept includes a knowledge classification mech-
anism, which was initially based on tagging, so that developers tag relevant AK
during their interactions. These tags have to be registered by the development
teams, so that the tags were related to the teams’ jargon. Also, these tags must
1
https://www.stateofagile.com/#ufh-i-521251909-13th-annual-state-of-agile-
report/473508
2
https://agilemanifesto.org/
3
be related to a set of meta-tags, which are semantic anchors to ease the AK

later retrieval. This semi-fixed tagging mechanism could avoid the problems re-
lated to the free and unassisted tagging, such as: tag explosion, interpretation
differences of a tag’s meaning, incomplete context to understand a tag; tags
that only make sense when used together (known as composite tags), and tags
with the same meaning but written differently (known as obscure similarity)
[Bagheri and Ensan, 2016]. These problems could lead to tagging difficulties
which could ruin the AK classification mechanism, and consequently they would
cause information retrieval problems [Nagwani et al., 2017]. In order to preserve
the classification mechanism, in the same study [Borrego et al., 2019], we pre-
sented a tagging helper implemented for Skype, which auto-completes tags in
alphabetic way during developers UTEM interactions, helping them to select an
existing tag and to type it in the correct way. Both, AK classification mecha-
nism and tagging assistant were evaluated and we obtained broadly acceptance
by the participants. However, they suggested including a more intelligent way
to get help on the tagging action, which we considered important in order to
increase the KC concept adoption in a real AGSE environment.
In this paper, we present an AK classification mechanism based on semi-fixed
tagging, which was implemented as a Slack3 service, including a tags recom-
mendation module, based on natural language processing (NLP) and statistical
language models techniques. This mechanism was evaluated to determine the
semantic correctness of the recommended tags with respect to the context of the
tagged message, as well as comparing the time spent to tag message on both
ways: assisted by auto-completion and assisted by NLP. The main contribution
of this paper is that a tag recommendation system is necessary to implement the
AK condensation, since it helps to classify the shared AK in a correct and quick
manner, to recover it in the future. The remainder of this paper is organized as
follows: Section 2 presents the work related to AK extraction/classification and
tagging recommendation in software engineering; Section 3 presents a descrip-
tion of the NLP models used to obtain the tags recommendation; in Section 4, we
present the Slack complement used to implement AK classification mechanism;
Section 5 presents the method used to evaluate the complement which includes
the tags recommendation component; in Section 6 and Section 7, we respectively
present the results of the evaluation and its threats of validity. Finally, in Section
8 we discuss the obtained results and we present our conclusions in Section 9.
2 Related work
Any software project AK involves a set of decisions to justify the design of a

system; diverse authors have identified that the majority of the projects’ AK is
3
https://slack.com/
4
in tacit state and consequently it is difficult to manage AK. Further, the existing
explicit AK is spread on diverse locations (e.g. storages, repositories, etc.) and
diverse types of artifacts. In that same sense, Souza et al. [Souza et al., 2019]
presented the limitations of the existing methods to derive architectural mod-
els from the specification of requirements: undetailed decision-making process,
inconsistency between the artifacts, and semantic loss and lack of traceability
between models. These limitations are derived from a lack of understandable and
standardized terms which causes poor communication and eventually causes mis-
takes mainly on the activities of novice developers or software architects with
less experience.
One proposal to address the above presented problems is the recommen-
dation systems. Mirakhorli et al. [Mirakhorli et al., 2015] presented ARS, an
architectural recommendation system which is proposed to work collecting large
amounts of data from open source software repositories to detect and extract a
large set of architectural design concepts using a set of architecture profiler tech-
niques, such as design patterns, design tactics, architecture styles, etc. However,
this works only as a proposal that was useful to identify challenges associated
with the volume, velocity and variety of our data set, and other challenges about
big data systems. In the same sense, RecovAr [Shahbazian et al., 2018] is a tech-
nique to automatically recover the design decisions of the history of projects’
artifacts, directly from issue trackers and version control repositories, however,
this work does not guarantee adequate scalability to track user stories, thus in
AGSE environments this technique could be inefficient on small projects with a
non-rigid discipline regarding the commits and bug fixing comments. Likewise,
Sobernig and Zdun [Sobernig and Zdun, 2016] reported a process of distillation
of architectural decisions based on UML models. AK artifacts (e.g., architecture
documentation, interviews) were systematically coded to ease the identification
of architectural design decisions and their details. This codification was found
useful for structuring and analyzing a design space during the identification of
decisions, and for helping to prioritize decisions and their details for the respon-
sible developers.
Another proposal to address the problems described above is the use of tag-
ging recommendations to classify work items. Tagging has been used in software
engineering for information finding, team coordination and cooperation, and
helping to bridge socio-technical issues [Treude and Storey, 2012, Storey et al.,
2009]. Automatic tag recommendation in software engineering started with the
TAGREC method [Al-kofahi et al., 2010] to recommend tags for work items in
IBM Jazz, based on the fuzzy set theory. More recently, neural networks had
been used to recommend tags for question & answers (Q&A) sites [Liu et al.,
2018], founding that this technique could infer the posts context and conse-
quently it could recommend accurate tags for new posts. On the same line, in
5
[Mushtaq et al., 2018] it was proposed a framework to obtain explicit and implicit
knowledge from community-based Q&A sites (e.g. StackOverflow) through min-
ing techniques to support AK management; however, the authors did not report
any implementation. In [Parra et al., 2018], a process to extract relevant key-
words from video-tutorials was proposed, which uses the video title, description,
audio transcription, as well as external resources related to the video content,
such as StackOverflow publications and its tags. This method showed that it can
automatically extract tags that encompass up to 66% of the information that de-
velopers consider to be relevant to outline the content of a software development
video.
The above cited works use the traditional approaches in tags recommenda-
tion (EnTagRec, TagMulRec, and FastTagRec), and they were compared against
deep learning based approaches (TagCNN, TagRNN, TagHAN and TagRCNN)
[Zhou et al., 2019], obtaining that the performance of TagCNN and TagRCNN
was more adequate in terms of performance, than the traditional approaches.
However, all these approaches do not consider the ”cold start” problem [Martins
et al., 2016], i.e., the absence of previous information about the target object.
Another way of tagging is presented by Jovanovic et al. [Jovanovic et al.,
2014], called semantic tagging, which uses Wikipedia information to semantically
establish the tagging process, and to provide a methodical approach to apply
tagging in software engineering content. In the same sense, TRG [Wang et al.,
2014] proposes to discover and enrich tags focused on open source software,
through using a semantic graph to model the semantic correlations between the
tags and words of the software descriptions.
With the presented works, it can be noticed that StackOverflow or any other
similar platform can be considered as a source of AK, as was stated by Soliman
et al. [Soliman et al., 2016], who analyzed and classified StackOverflow posts
and these were contrasted with software engineers. They obtained that there is
valuable AK particularly related to technological decisions. However, these dis-
cussions do not include specific project AK such as business rules, architectural
decisions based on particular conditions of the project, etc. Therefore, organiza-
tions need to unify their AK sources, in the aim to have a similar resource as
Q&A systems. CAKI [Musil et al., 2017] is a proposed solution to the necessity
of internal AK, which works as a continuous integration of organization-internal
and external AK sources, with enhanced semantic reasoning and personalising
capabilities.
It is evident that there are different approaches to enhance the AK managing,
focused on the extraction, classification and searching of knowledge. Many efforts
are focused on extract/classify AK from public Q&A services sites which con-
centrate enormous volumes of information, experiences and knowledge related
to technical aspects of software engineering, using different artificial intelligence
6
techniques and data mining. However, there is AK which is not stored in those
Q&A services, such as discussions about project business rules, about architec-
tural decisions which depends on the project conditions, and even AK related
to the project deployment. This AK is usually stored in the UTEM log of the
software company [Borrego et al., 2016], and these information sources are our
focus with the KC concept. In the aim to help on classifying the unstructured
AK in UTEM log, we chose assisted tagging to recommend adequate tags to
project stakeholders. In the following section, we describe the Slack service that
was developed to recommend tags using NLP models.
3 Slack tagging service
Slack is an application that facilitates communication and collaboration within

teams, which is booming in ASD and AGSE, and researchers have even studied
the way in which developers use Slack [Lin et al., 2016, White et al., 2017, Chat-
terjee et al., 2019]. Slack can be considered as an UTEM, since the messages
shared between the interested parties have no fixed structure to ease the re-
trieval of AK. Given the popularity of Slack, we decided to develop an Slack
tagging service so that the concept of AK condensation could eventually be
evaluated in a real AGSE environment. Hereunder, the operation of the service
is described in terms of the elements that must be implemented to instantiate
the concept of AK condensation [Borrego et al., 2019], excepting the element
named ”Architectural knowledge searching mechanism”, since it is out of the
scope of this study (an example of the implementation of this mechanism can
be seen in [Borrego et al., 2019].
3.1 Accessible messages of the UTEM log files

Everyone involved in an AGSE project must be able to access the information
contained in the UTEM log files to consult the AK that is shared among the
development team. In order to achieve this part, we developed a component in
Node.js, which extracts the messages periodically (or at request) using the Slack
API, and then stores the messages in a common repository (Algolia4 ). The main
objective is to keep the Slack messages (previously classified by tags – see next
section) and another UTEM’s messages located in one single point so that the
AK can easily be found.
3.2 UTEM log files classification mechanism

In [Borrego et al., 2019] we learned that tagging is feasible to classify the AK
stored in the UTEM log files. With this focus, we proposed that developers tag
4
https://www.algolia.com/
7
the UTEM messages which they consider relevant in terms of AK, during their
daily interactions, to recover them in the future. In the aim of avoiding a tags
explosion [Bagheri and Ensan, 2016] that occurs when free tagging is allowed, we
chose a semi-fixed tagging in [Borrego et al., 2019], this type of tagging has as
a meta-tag base model (obtained from a previous study [Borrego et al., 2016]),
which represents the AK shared in UTEM in an AGSE environment. Each meta-
tag is related to user-tags which developers register in a web application at the
beginning of each agile development cycle, after they agree about which tags
they will use during that cycle. The meta-tag model could be changed by other
model that fits in others environments; the main objective of this model is to
have a semantic anchor to recover AK in the future.
Figure 1: Graphic user interfaces of Slack tagging service. Part A depicts the
auto-completion assistance to select a tag; part B depicts the tags recommenda-
tion list of the NLP assistance which come from the language models.
In the aim to ease the tags recalling during the interactions through Slack,
we implemented a tagging system on which developers have to select the option
“Tag! Tagger” from the options that appear for each message. Then, a dialogue
box appears to allow the selection from a tags’ list previously recorded, including
meta-tags [see Fig. 2 part A]. This selection list includes an auto-completion
feature, that filters the available tags as the developer is writing. It is also possible
to enter customized (free) tags in a text field. In the last selection list, developers
can view the tags recommendation list which comes from the language models
[see Fig. 1 part B]. On this list, the first element is the semantically closest tag
to the message to be tagged, and so on. It is worth to remark that the tagging
system has three different ways to tag a message only for evaluation purposes.
Once the tag is selected, the user clicks the button “Tag!” to republish the
message in the same channel, with the selected tag(s) [see Fig. 2].
8
Figure 2: Slack graphic user interface where the tagging service re-published a
tagged message to the conversation channel (red square).
4 Language models development
As we mentioned before, we incorporated NLP techniques and statistical lan-

guage models (LMs) into Slack tagging service. The developed method seeks to
capture features of the seven categories of UTEM considered for implementa-
tion. A statistical model of language estimates the prior probabilities of terms
(can be a word, symbol or character) strings. The probability of a sentence S (as
a sequence of terms ti) is: P (S) = P (t1, t2, t3, . . . , tn). In [Fig. 3], sentence S1
represents the original comment from a developer in the #UserStory category.
S2 is the sentence processed with Freeling, a multilingual tool that provides the
grammatical class of terms.
S1*: I am almost done, now from story 3, the resource you consult is the same as
story 2?
S2*: I PRP am VBP almost RB done VBN , Fc now RB from IN story NN 3 Z , Fc the
DT resource NN you PRP consult VBP is VBZ the DT same JJ as IN story NN 2 Z ?
Figure 3: Example sentence and its transformation through Freeling. Both sentences
were translated from original comments in Spanish.
For instance, the probability of the term in position 8 “story” could be ex-
9
pressed as shown in [Eq. 1]. This probability is affected by the appearance of the
term “story” in the entire collection of interactions (corpus). The first step was
to create a model for each category, which correspond to tags used in a previous
work [Borrego et al., 2019] (Encryption, Documentation, UserStory, RESTTest,
RESTResource, TechnologicalSupport and TestData). Documentation and Tech-
nologicalSupport are meta-tags.
P (S) = P (t1, t2, t3, t4, t5, t6, t7, t8), P (t8|t1, t2, t3, t4, t5, t6, t7) (1)
Seven language models were created with the SRILM version 1.7.3 tool .
SRILM is a toolkit for building and applying LMs, primarily for use in speech
recognition, statistical tagging and segmentation, and machine translation. 291
interactions previously validated by an expert conform the corpus and were used
to generate the language models. The corpus is grouped as shows in [Tab. 1].
In previous work [González-López et al., 2018] it was possible to identify that
the language models had a better performance than the bag of words (BoW)
method, a technique widely used to represent text data.
The configuration used for the creation of the models was as follows:
– N-grams-2 [bigrams]: the sequence of terms was determined after performing

an analysis between N-grams of size 2, 4 and 6.
– We used Kneser-Ney discounting (kndiscount) as a smoothing method.
– Gtmin: specify the minimum counts for N-grams to be included in the LM,
we used gt2min.
4.1 Text Preprocessing
In order to train the seven models, all terms were used. Repeated text segments
are also removed to avoid overfitting. Then, the text was lemmatized with Freel-
Tags and Metatags Elements by category

Encription 26
Documentation 34
UserStory 88
RESTTest 43
RESTRe-source 61
TechnologicalSupport 16
TestData 23
Table 1: Seven categories and 291 interactions for training

10
ing 5 with the default config file for Spanish, that also allowed to obtain the
part-of-speech (POS) tags for the text.
The operation of the models embedded in the Slack tagging service is as
follows: the trained language models receive the requests with the interaction
that the student or professional wrote. Subsequently, the perplexity value was
calculated in each of the LMs and a ranking was made, where the first position
indicated that this interaction is very similar to the language model.
The output of each language model tool is numerical, which represents per-
plexity that expresses the confidence of the sentence tested in the model. A low
value of perplexity is higher confidence. The measure used to compare the simi-
larity was ”average perplexity per word” (ppl1). Thus, the Slack service suggests
a list of categories ordered by similarity. It is worth mentioning that the per-
plexity value is expressed between 0 to 1, where values close to zero indicate a
very strong closeness.
5 Evaluation method
In this section we present our method (based on the Wohlin et al. [Wohlin et al.,
2012] guidelines) to evaluate the Slack tagging service.
5.1 Objective
To analyze the use of the Slack tagging service, with the purpose of determining
the semantic correctness of the suggested tags with respect to the context of
the tagged message, as well as comparing the time spent to tag message in a
both-way: assisted by auto-completion and assisted by NLP, from the point of
view of professional developers and students in the context of agile and global
software engineering.
5.2 Planning
5.2.1 Context and subjects selection
This study had two contexts: academic and industrial. The academic context was
at a computer lab at two public universities. Regarding the industrial context,
it was in different software development companies, taking care to select a room
where they could not have any interruptions during the evaluation activities.
The subjects of both contexts were chosen for convenience. The participants
in the academic context were students enrolled in either a software engineering
program or an information technology program, taking care that they already
had coursed subjects related with agile methodologies and web development,
5
http://nlp.lsi.upc.edu/freeling/
11
since the context scenario of the evaluation was a Web project driven by an
agile method. Regarding the industrial context participants, they were software
professional developers with experience in agile/distributed development and
web application projects.
5.2.2 Study design, variables selection and hypotheses formulation
This study can be considered as a quasi-experiment with a within-subject design,

since all the treatments were applied to all the participants. All the participants
were mentally situated in a context scenario to interact through Slack with
a counterpart and using the tagging service (with predefined user tags) and
following a chatting script which contains eight marks suggesting what to tag,
without specifying the tag to use. There were two types of marks: one indicates
to use a tag recommended by the Slack tagging service, another type of mark
indicates to select a tag from the tags catalog or using the auto-complete feature.
The independent variable is represented by the different ways of choosing
tags, using either the tags recommendation feature or the tags auto-completion
feature. The dependent variables were the semantic correctness of the chosen
tag [see Section 5.3.3 to know how we determined the tag correctness] and the
time to select a tag. In addition, we observed the number of new tags that
the participants created during the evaluation, as well as the perceived ease of
use and perceived usefulness. Based on the previous variables, we stated two
hypotheses for this study.
– H0Correctness . There is no significant difference in the number of messages

correctly tagged by using the tags recommendation feature or by using the
auto-completion feature.
– H0T ime . There is no significant difference in time spent to tag a message

by using the tags recommendation feature or by using the auto-completion
feature.
5.2.3 Instrumentation
Below, we present the instruments developed in order to conduct this study as

it was designed.
– Context scenario. This scenario concerned two agile developers from different
companies and locations working on the same project (medical appointments
system), one of whom required information about a RESTful service that
the other was developing. They had documentation debt, and consequently
they had to acquire the project AK by asking questions to each other.
12
– Chatting scripts. Each pair of participants had to follow two scripts (one per
scenario role) to simulate a technical conversation taking place using Slack
regarding the context scenario.
– Slack tagging service. It was presented on [Section 3]. In order to obtain the
time spent to tag a message, the service also registers the time in which it
was activated and the time in which it was closed.
– Messages gathering program. It is a program developed in Node.js which uses

the Slack API to extract the messages of public channels, and then sends
them to the Algolia repository.
– Extended TAM questionnaire. We prepared a questionnaire in Google Forms

which was based on the Technology Acceptance Model [Davis, 1989] (TAM)
using a Likert-7 scale. We extended this questionnaire adding items such:
how the Slack component integrates to the daily work, and another in which
we ask about enhancements that the participants would consider to add to
the component. Also, this questionnaire collected the following demographic
data: age, years of experience in agile development, years of experience in
distributed/global software development.
5.3 Operation
5.3.1 Preparation
We verified that the computers to be used in this study had an Internet con-
nection and a web browser which was able to execute the Slack messenger. We
created a Slack workspace and we installed the tagging service. Then, we added
the user tags which correspond to the context scenario. The user tags were:
IPService, TestsREST (related to TechnologicalSupport meta-tag); RestApikey,
RestSecurity, Encryption, TestData, RESTResource, RESTResponse (related
to Code meta-tag); AngularEncryption (related to Component meta-tag); and
UserStory (related to Documentation). In addition, we checked that the tags
corresponding to the trained NLP models are appearing in the respective list.
5.3.2 Execution
We had 52 participants which 30% were professional developers (26.4 years old
on average, s.d. = 3.6) from three different Mexican companies, who declared
2.7 years of experience in agile development (s.d. = 2.09) and 2.06 years of
experience in global/distributed software development (s.d.= 2.5). The rest of
the participants were students (21.6 years old on average, s.d. = 2.7), from two
different Mexican universities. The evaluation sessions consisted of three parts
which we explain below.
13
– Introduction (duration 10 minutes). The participants were given an overall

explanation of the study sessions along with their objectives. We organized
the participants in pairs (to chat between them) and then we asked their
email addresses to register them on the Slack workspace. We created a public
channel for each pair of participants (to isolate the pairs’ conversations) and
we helped them to complete the Slack registration. We gave the participants
a short training session regarding how to use the tagging service (3 minutes,
approx.), and they quickly explored the available tags (2 minutes, approx.).
Then, we described the scenario in which they would be located to carry out
the tasks, and we assigned a role to each pair member: either the developer
working on a RESTful service or the developer who wished to use it. We
gave them the corresponding scripts and we explained to them that the
scripts had marks indicating when to use the recommendation feature. Each
member of each pair sat in a different part of the session room, ensuring
they had no visual contact, as if they were geographically distributed. We
asked them to avoid talking to each other to better emulate an environment
of geographic distribution.
– Interacting through Slack (duration 25 minutes). The participants used Slack

to chat following the corresponding script, and tagging aided by the tagging
service. We explained them that the script was only a guide, thus they could
paraphrase the messages. We also told the participants that they could write
a new tag (unregistered/invalid tag) if they could not find one that fitted a
certain message on the options shown by the tagging service.
– Finalization (duration 3 minutes). When the participants had finished chat-

ting through Slack, they answered the TAM-based questionnaire. Finally, we
executed the messages gathering program to send all the channel conversa-
tion to a common repository.
5.3.3 Data collection and data validation

The tagged messages from each conversation were collected from the repository
and we executed an automatic process to determine whether the used tags are
semantically correct or not. This process consisted in comparing the tagged
messages against the expected tags, accounting that the meta-tags hierarchy of
each tag was also considered as correct. We noticed that participants also tagged
messages which were not marked to do so. In those cases, two expert persons
manually evaluated whether the tag fitted to the message semantics. In addition,
we extracted from the tagging service log file the closing and opening time of
the tagging dialog, in order to obtain the time spent to select a tag for each
participant. Also, we obtained the answers of the TAM questionnaire directly
from Google Forms. Finally, we discarded those tagged messages which did not
14
correspond the chatting script such the initial testing messages and those which
were written after the final line of the chatting script.
6 Results
We obtained 356 tagged messages, where 50.6% were tagged using the auto-
completion feature, and 49.4% were tagged using the recommendation feature.
We present the results organized as follows: one section per each stated hypothe-
sis, another section in which the TAM results are presented, and a section where
we present additional observations of this study.
6.1 Tagging correctness

In this part, it is worth to remember that the participants did not register any tag
for this controlled study, as it is supposed to happen in a knowledge condensation
implementation. This was the reason why we allowed the participants to create
tags when they considered that the predefined tags/meta-tags did not fit to a
message. We obtained that 14% of the messages were tagged with a tag created
by the participants, from which 20% were incorrect and 80% were correct. In
addition, 25% of messages with new tags were considered as synonyms of the
predefined tags. Thus, in the aim to calculate the number of messages tagged
correctly using the auto-completion feature, we summed the messages which
were correctly tagged with the existing tags, and those which were correctly
tagged using a new synonym tag. Moreover, we added the number of correctly
tagged messages which was done using the recommendation feature. It is worth
to remember that we excluded the messages which were tagged with a new tag,
and that we included those new tags which could be considered as synonyms.
[Fig. 4 part A] shows that participants had more messages correctly tagged
when they used the recommendation feature. In order to determine whether
this difference is significant, we applied a goodness of fit test, based on χ2 with
α = 0.05, assuming a uniform distribution. We obtained that the difference
was statistically significant (χ2 = 16, p − value = 0.00006), even the difference
was significant if we assume a distribution on which correctly tagged messages
using the recommendation feature, corresponds to 53% (χ2 = 4.455, p − value =
0.03479), thus we can reject the null hypothesis H0Correctness .
On the other hand, if we group the data by participants, the above stated
difference is also evident on the average of correctly tagged messages per partic-
ipant on both participant profiles [see Fig. 4 part B]. We verified whether both
data sets (average of correctly tagged messages using recommendation and auto-
completion) were normally distributed by applying Kolmogorov-Smirnov test of
normality. We obtained that the set of the tagged messages using the auto-
completion feature was not fit to the normal curve (D = 0.18621, p − value =
15
Figure 4: Tagging correctness using the recommendation feature and using the
auto-completion feature, classified by participants profile. Part A shows the per-
centage of correct tagging on both ways. Part B shows the average of correct
tagging by participant (whiskers represent standard deviation).
0.04731). Thus, we applied the Wilcoxon signed-rank test (α = 0.05) to deter-

mine whether the difference between the average of correctly tagged messages per
participant using recommendation and auto-completion feature was significant.
We obtained that the difference is statistically significant (W = 237.5, p−value =
0.00452), hence, this result provides more evidence to reject the null hypothesis
H0Correctness .
6.2 Time spent to tag messages
We obtained that the participants spent less time tagging when the recommen-
dation feature was used (avg = 11.03 sec., std = 5.23), either in general, or
per each participant profile [see Fig. 5, part A]. It could be said that this re-
sult is obvious since the recommendation list only shows seven options, but
we were interested to determine whether the difference is statistically signifi-
cant. In this case, we applied a T-test since both sets of data were normally
distributed according to the Kolmogorov-Smirnov test (Time using recommen-
dation: D = 0.11737, p − value = 0.48686 - Time using auto-completion: D =
0.12835, p − value = 0.34096). By applying T-test (α = 0.05), we obtained that
the difference between the time spent to tag by auto-completion and to tag using
the recommendation feature was significant (T = −5.10628, p−value < 0.00001).
In the aim to deeply explore this result, we discarded the outliers values of
both data sets, which were obtained by applying the Grubbs’ test with α = 0.05
(Time using recommendation: Z = 4.2314, outlier − value = 33.18 - Time using
auto-completion: Z = 5.0222, outlier − value = 85.90), and the difference was
still visually present, as is shown in [Fig. 5 part B]. Removing the outlier values
16
Figure 5: Time (in seconds) spent to tag a message using the recommendation
feature and using the auto-completion feature, classified by participants profile.
Part A shows the results including outlier values, and the part B shows the results
without the outlier values (whiskers represent standard deviation).
give us that the average time was 10.56 sec. (std = 4.13) using the recommen-
dation feature, and 19.94 sec. (std = 9.06) using auto-completion feature. We
applied again T-test (α = 0.05) and the difference was also statistically signifi-
cant (T = −6.48889, p−value < 0.00001), thus, we can reject the null hypothesis
H0T ime .
6.3 Additional observations
We noticed that 7.3% of the messages were tagged using the recommendation
feature when it was not indicated in the script, where 57% were correctly tagged.
This fact indicates that the language models used to detect the messages con-
text are robust enough to support different ways of writing diverse messages
of the same topic. The average of correct and incorrect tagged messages per
participant was 0.89 (std = 0.90) and 0.67 (std = 0.68) respectively. We ap-
plied the Kolmogorov-Smirnov test of normality and we obtained that both
sets, correct and incorrect tagged messages, were normal (correct messages:
D = 0.23407, p − value = 0.23742 - incorrect messages: D = 0.28579, p − value =
0.0855). Hence, we applied a T-test (α = 0.05) obtaining that the difference be-
tween both sets is not significant (T = 0.83299, p − value = 0.205331), which is
an expected result since the language models were not trained to support the
kind of messages where participants tagged.
In addition, when participants used the recommendation feature and they
tagged correctly, 85% of the times the selected option was in the first position
of list [see Fig. 6 part A]. This difference is significant according to the goodness
of fit test based on χ2 (α = 0.05), assuming a distribution on which the first
position has 75% and the rest 25% (χ2 = 7.714, p − value = 0.00548). On the
17
Figure 6: Position of the correct option on the recommendation feature when

participants tagged correctly (A) and incorrectly (B), when they used the recom-
mendation component.
other hand, when the participants used the recommendation feature and they
tagged incorrectly, 68% of the times the correct option was located in the first
position of the list [see Fig. 6 - part B]. This difference is significant according to
the goodness of fit test based on χ2 (α = 0.05), assuming a distribution on which
the first position has 58% and the rest 42% (χ2 = 4.599, p − value = 0.032). This
result means that the language models were preciser than the participants, when
tagging was unexpected. The tags that were correctly recommended were the
following: DataTest, Documentation, TechnologicalSupport, AngularEncription,
RestResource and UserStory.
Finally, we obtained 176 messages that were tagged using the recommenda-
tion feature, without the unexpected tagged messages, where in 112 messages
the correct tag was in the first position of the recommendation list. With these
data we can calculate a P recision = 112/176 = 0.6363 of NLP recommendation
method. On the other hand, if we include the unexpected tagged messages, we
have 204 tagged messages, where in 125 messages the correct tag was in the first
position of the recommendation list, i.e., now P recision = 125/204 = 0.6127.
6.4 Qualitative results

Participants perceived the implemented Slack tagging service as very useful and
easy to use in general (median of 6 in Likert-7 scale) [see Tab. 2], highlighting
that for both measures (usefulness and ease of use) the mode was 7 in Likert-7
scale. In [Tab. 2] we can observe that all the Likert values obtained for usefulness
and ease of use are considered as positive (greater of 4), which means that the
tagging service was well perceived by most of the participants. However, there
were some outlier values (Developers ease of use perception, values of 2,3 and
18
Students Developers
Daily work Daily work
Usefulness Ease of use Usefulness Ease of use
integration integration
Mode 7 7 7 7 7 7
Median 6 6 6 6 7 6
Q.75 7 7 7 7 7 7
Q.50 6 6 6 6 7 6
Q.25 5 5 5 5 6 4.25
Table 2: Extended TAM results by participant profile. All the values are based on
a Likert-7 scale.
4, Students perception on each aspect, values of 1), which could be related to

the received suggestions about the pre-loaded tags, on which it is mentioned
that participants would have liked to participate on the tags registration. Also,
participants commented that the way of tagging must be improved, however, it
fully depends on the Slack conditions to create complements or services; partic-
ularly, participants asked to be able to tag before sending a message, not when
the message has already been sent, which is how the tagging services works
currently. These results are accompanied by some comments of the participants
such: ”...tagging messages is a bit tedious... I don’t think that people would use
it in an agile development...”, which indicate us that the service operation has
to be improved. Finally, we extended the TAM questionnaire with one question
about how the tagging service integrates to the daily work. We obtained that
participants think that this service integrates easily to the working conditions of
AGSE projects. However, we obtained a value lower than 5 (Likert-7 scale) on
quartile 25 and some outliers values [see Tab. 2], which could be related to the
above reported problems too.
7 Threats to validity
In order to understand how valid are the results and how they can be used,
we present the validity threats according to Wohlin et al.’s [Wohlin et al., 2012]
specification.
Regarding the conclusion validity, this study had participants from different
backgrounds, and with different experience using Slack. In order to balance the
participants’ knowledge about Slack, they received a brief training regarding
how to use this application, and about how to use the tagging service. The
participants were anonymous in this study, and we did not have previous contact
with them before or after the evaluation, therefore, there was no reason for trying
to please us. Furthermore, we are aware that the evaluation period was quite
19
short, however, the results indicate a clear trend about the benefits of using the
implemented Slack service.
Referring to the internal validity, the study sessions were short to prevent
the participants from getting bored or tired. We attempted to avoid learning
effects by using a counterbalancing technique, i.e., we placed the participants in
groups and we presented the conditions (auto-completion and recommendation)
to each group in a different order. In addition, all the participants were volunteers
and showed interest in collaborating in this study. Regarding persistence effects,
the study was run with subjects who had never taken part in a similar study.
Moreover, the participants did not have any previous knowledge of the context
scenario, since it was fictitious.
Taking the point of view of the construct validity, we measured the required
time to tag a message using the tagging service, which registered the time when
the tagging screen is shown and the time when it is closed by the participants.
This time could have been affected by the participants’ reading speed and by the
time to decide a proper tag, however, we considered that participants had similar
abilities and this threat could have been reduced by the arrangement of the sets.
In order to discover the participants perceptions about how the tagging service
integrates to their daily work, we extended the standard TAM questionnaire by
adding one question with the same structure than the others TAM questions.
This extension provided us with a structured means to obtain the participants’
perceptions of topics that the conventional TAM questionnaire does not include.
Finally, we identified two main threats to external validity: subjects and
tasks/materials. Regarding subjects, we included students in order to have more
controlled conditions to conduct the study in an academic context. Unfortu-
nately, these students had no experience of real AGSE projects, but they had
taken courses concerning ASD and where they had been in touch with GSE top-
ics during their university education. We included professional developers with
experience in AGSE to enforce external validity. Concerning tasks/materials, the
chatting scripts were based on a fictitious scenario, but taking care to include
real world situations and characteristics.
8 Discussion
In this section, we discuss our results in two ways: implications about the AK
condensation in AGSE, and implications about natural language models.
8.1 Implications about the AK condensation in AGSE
We observed that participants tagged in correct manner more frequently when

they used the recommendation feature. In our previous study [Borrego et al.,
20
2019], we obtained 62% of correctly tagged messages, and in this study we ob-
tained 66%, where 32% corresponds to tagging by the auto-completion feature
and 68% corresponds to tagging by the recommendation feature. Thus, tags rec-
ommendations were important to rise the correctness percentage, however, this
difference is not significant according to the goodness of fit based on χ2 , sup-
posing a uniform distribution (χ2 = 0.03, p − value = 0.86283). Regarding the
spent time to tag a message, the use of the recommendation feature represents
a significantly faster way to select tags. In general the time difference was 9 sec.,
which could represent the difference between a good or bad user experience,
and consequently an important factor to adopt the tagging service and the KC
concept.
Regarding the perceived ease of use and the perceived usefulness, the Slack
tagging service obtains high values of the Likert-7 scale, even on quartile 25,
which means that participants consider the service very useful and usable. Also,
participants considered that tagging service could be integrated on the daily
work activities. Further, there are points to enhance regarding the operation to
tag a message, however, it heavily depends on the conditions that Slack states to
develop a complement. This situation could happen to any other UTEM, since
each of them will have those conditions for developing plugins to complement
its operation.
The obtained positive results about tagging correctness, tagging time, and
participants perception, led us to think that the semi-fixed tagging approach
supported by a tags recommendation feature, is an efficient (precise and fast)
and well perceived way to classify AK during UTEM interactions in an AGSE
environment. With these results, we can project that in a scenario where devel-
opers only use the recommendation feature to tag the UTEM relevant messages,
thus, the tagging correctness percentage could increase in general. In addition,
the presented results are promising for the AK condensation concept since the
AK retrieval could be improved, easing and speeding up the correct tagging,
reducing the cognitive load to AGSE developers during their daily work, and
easing the AK finding since the AK is going to be alienated to the classification
scheme, on which the tags are based. In this way, a developer could recover AK
from a past project, using the tags and meta-tags to do an initial exploration of
the project AK.
In relation to the precision of the recommendation feature, it could be consid-
ered a good score (P recision = 0.6363 without including the unexpected tagged
messages) taking into account the low amount of data (242 messages obtained
from our previous study [Borrego et al., 2019]) used to train the language models.
For instance, works based on YouTube [Parra et al., 2018] or Stackoverflow [Liu
et al., 2018, Zhou et al., 2019, Soliman et al., 2016] had an available a big corpus
(years of tagged posts and video tutorials) to train the recommendation models.
21
In particular, Parra et al. [Parra et al., 2018] reported up to 66% of the tagged
videos were considered relevant by developers. Thus, it could be interpreted that
they obtained 0.66 of precision, which is pretty close to our precision value. More-
over, considering our acceptable results and that we had a small corpus to train
the language models, we could say that our approach presents certain tolerance
to the ”cold start” problem, although we have to conduct more experiments to
determine its robustness.
It is worth to remark that 68% of the times that participants tagged in-
correctly, the correct option was the first element of the recommendation list,
but participants decided to choose another list element. This situation leads us
to improve the language models of the recommendation feature to explore a
mechanism of semi/auto-tagging. In this way, when a developer wants to tag
message, s/he could use this mechanism to assign a tag which will be decided
by a language model. Hence, we are projecting a mechanism in which a tag
is automatically assigned to a message, however, we do not think that a fully
auto-tagging mechanism is convenient since the relevance of the message to tag
is decided by developers, and not by an automatic mechanism.
All the presented results led us to think that a recommendation feature could
be considered as an important part to have a better AK classification, when
tagging is the approach to implement the classification mechanism which is part
of the AK condensation concept.
8.2 Implications about natural language models in AGSE
The TAM evaluation provides encouraging results, developers and students iden-
tified the tag recommendation component as useful (based on LMs and incor-
porated into the Slack tagging service) in software projects. The correctness
percentage obtained by the recommender component gives evidence that the
LMs captured the sentence structure written by the participant. For instance,
the word ”story” is identified by the language model with a probability of 0.75
and the phrase “user story” is 0.76, that is, the terms used by the participants
are very close to those used in the creation stage of the models.
However, for the terms that have a low probability, in some cases, they com-
plement the probability of the grammatical class. For example, the symbol ”?”
in the model has a probability of 0.05, but with its grammatical class the prob-
ability increases to 0.97. The probabilities of the term and the grammatical
class complement each other and give the tagging service robust performance,
reflecting in the correctness percentage. Those terms that are not part of the vo-
cabulary of the language model are softened by the function of the Kneser-Ney
as a smoothing method.
We addressed the above presented problem by training a model for each of
the categories due to the size of the corpus that was used to build them. However,
22
each model provides the system with a fine and specific analysis. One aspect that
we can identify to improve is to incorporate more elements in the training corpus
to be able to reach a higher percentage of correct ones. Increasing the number
of language models could leverage the operation of the Slack tagging service and
the tagging process in an AGSE environment.
9 Conclusion
In this paper we presented an implementation of an AK classification mecha-

nism (based on tagging) which is a key part of the AK condensation concept.
This implementation was the Slack tagging service, which allows to tag messages
based on a classification scheme obtained in a previous study. The objective of
this service was to help developers to remember the available tags during their
daily interactions, by auto-completing tags or by offering tags recommendations
through NLP techniques. Slack tagging service was evaluated by students and
professional developers in a controlled experiment, in the aim to compare both
tagging mechanisms in terms of correctness and time to select a tag. The re-
sults statistically showed that more than 64% of the Slack messages were tagged
correctly (i.e. tags were semantically related with the messages) using the tags
recommendation, taking into account that the participants were not aware of
the existing tags and the interaction context. Also, when participants used the
recommendation feature, they spent less time to select a tag (statistically signif-
icant too). Further, the Slack tagging service was perceived as highly useful and
usable, according to an instrument based on TAM.
The obtained results are remarkable, since the corpus volume to train the
language models was significantly smaller than other approaches reported in the
literature, which used sources as StackOverflow entries to train the recommen-
dation models. In addition, our approach is focused on AK fully dependent on
projects circumstances and conditions, such business rules clarifications, archi-
tectural decisions, and even deployment topics. Thus, we cannot use public Q&A
services, since these sources contains more general AK than the knowledge which
is shared through UTEM during a software project.
Furthermore, our results reinforce the AK condensation concept feasibility, as
well as that semi-fixed tagging could be an AK classification mechanism adequate
for the AGSE environment. Also, these results lead us to conduct a long term
study in an AGSE company to evaluate the concept adoption, and to evaluate
how AK condensation could affect metrics such as: time to finish tasks, interrup-
tions quantity, team efficiency and product quality. This long term study will help
us to increase the size of the training corpus for language models. Further, we
will explore new ways to implement AK classification mechanism, for instance:
using ontologies or deep learning techniques, and even new approaches to con-
23
dense AK on any agile software development environment, either co-located or

distributed.
10 Acknowledgements
The authors are grateful to the participating companies: EMCOR, Sahuaro Labs
and Tufesa, as well as the students of Instituto Tecnológico de Sonora and Uni-
versidad Tecnológica del Sur de Sonora, for the support provided to carry out
the present study, and for their willingness to continue working with us in future
projects.
References
[Al-kofahi et al., 2010] Al-kofahi, J. M., Tamrawi, A., Nguyen, T. T., Nguyen, H. A.,
and Nguyen, T. N. (2010). Fuzzy Set Approach for Automatic Tagging in Evolving
Software. In Software Maintenance (ICSM), 2010 IEEE International Conference on.
IEEE.
[Bagheri and Ensan, 2016] Bagheri, E. and Ensan, F. (2016). Semantic tagging and
linking of software engineering social content. Automated Software Engineering,
23(2):147–190.
[Bider and Söderberg, 2016] Bider, I. and Söderberg, O. (2016). Becoming Agile in
a Non-disruptive Way - Is It Possible? In Proceedings of the 18th International
Conference on Enterprise Information Systems, number April, pages 294–305.
[Borrego et al., 2016] Borrego, G., Moran, A. A. L., Palacio, R. R., Rodriguez, O.
M. O., Morán, A. L., Palacio, R. R., Rodrı́guez, O. M., Moran, A. A. L., Palacio,
R. R., and Rodriguez, O. M. O. (2016). Understanding architectural knowledge shar-
ing in AGSD teams: An empirical study. In 11th IEEE International Conference on
Global Software Engineering, volume 00, pages 109–118, Irvine, CA, USA. IEEE
Computer Society.
[Borrego et al., 2019] Borrego, G., Morán, A. L., Palacio, R. R., Vizcaı́no, A., and
Garcı́a, F. O. (2019). Towards a reduction in architectural knowledge vaporization
during agile global software development. Information and Software Technology.
[Bosch, 2004] Bosch, J. (2004). Software Architecture: The Next Step. In Oquendo,
F., Warboys, B., and Morrison, R., editors, EWSA, volume 3047 of Lecture Notes in
Computer Science, pages 194–199. Springer.
[Chatterjee et al., 2019] Chatterjee, P., Damevski, K., and Pollock, L. (2019). Ex-
ploratory Study of Slack Q & A Chats as a Mining Source for Software Engineering
Tools. In ICSE 2019, MSR 2019 MSR Technical Papers, Montreal, Canada.
[Clerc et al., 2011] Clerc, V., Lago, P., and Vliet, H. V. (2011). Architectural Knowl-
edge Management Practices in Agile Global Software Development. In Sixth
International Conf. on Global Software Engineering Workshop, pages 1–8. Ieee.
[Cockburn and Highsmith, 2001] Cockburn, A. and Highsmith, J. (2001). Agile Soft-
ware Development:The People Factor. Computer, 34(11):131–133.
[Dalkir, 2011] Dalkir, K. (2011). Knowledge Management in Theory and Practice.
The MIT Press, second edition.
[Davis, 1989] Davis, F. D. (1989). Perceived Usefulness, Perceived Ease of Use, and
User Acceptance of Information Technology. MIS Quarterly, 13(3):319–340.
[Estler et al., 2012] Estler, H. ., Nordio, M., Furia, C. A., Meyer, B., and Schneider,
J. (2012). Agile vs. structured distributed software development: A case study. In
2012 IEEE Seventh International Conference on Global Software Engineering, pages
11–20.
24
[González-López et al., 2018] González-López, S., Borrego, G., López-López, A., and
Morán, A. L. (2018). Clasificando conocimiento arquitectónico a través de técnicas
de minerı́a de texto. Komputer Sapiens, 1(Enero-Abril):29–33.
[Holmstrom et al., 2006] Holmstrom, H., Conchuir, E. O., Agerfalk, P. J., and Fitzger-
ald, B. (2006). Global Software Development Challenges: A Case Study on Tempo-
ral, Geographical and Socio-Cultural Distance. Global Software Engineering, 2006.
International Conference on, pages 3–11.
[Holz et al., 2003] Holz, H., Melnik, G., and Schaaf, M. (2003). Knowledge man-
agement for distributed agile processes: Models, techniques, and infrastructure. In
Enabling Technologies: Infrastructure for Collaborative Enterprises, pages 291 – 294.
IEEE.
[Jovanovic et al., 2014] Jovanovic, J., Bagheri, E., Cuzzola, J., Gasevic, D., Jeremic,
Z., and Bashash, R. (2014). Automated semantic tagging of textual content. IT
Professional, 16(6):38–46.
[Kruchten, 2009] Kruchten, P. (2009). Documentation of Software Architecture from a
Knowledge Management Perspective – Design Representation, pages 39–57. Springer
Berlin Heidelberg, Berlin, Heidelberg.
[Lin et al., 2016] Lin, B., Zagalsky, A. E., Storey, M.-A., and Serebrenik, A. (2016).
Why Developers Are Slacking Off: Understanding How Software Teams Use Slack.
In Conference on Computer Supported Cooperative Work, pages 333–336, San Fran-
cisco, CA, USA. ACM Press.
[Liu et al., 2018] Liu, J., Zhou, P., Yang, Z., Liu, X., and Grundy, J. (2018). FastTa-
gRec: fast tag recommendation for software information sites. Automated Software
Engineering, 25(4):675–701.
[Martins et al., 2016] Martins, E. F., Belém, F. M., Almeida, J. M., and Gonçalves,
M. A. (2016). On cold start for associative tag recommendation. Journal of the
Association for Information Science and Technology, 67(1):83–105.
[Mirakhorli et al., 2015] Mirakhorli, M., Chen, H. M., and Kazman, R. (2015). Min-
ing Big Data for Detecting, Extracting and Recommending Architectural Design
Concepts. In 1st International Workshop on Big Data Software Engineering, pages
15–18.
[Mushtaq et al., 2018] Mushtaq, H., Malik, B. H., Shah, S. A., Siddique, U. B.,
Shahzad, M., and Siddique, I. (2018). Implicit and explicit knowledge mining of
Crowdsourced communities: Architectural and technology verdicts. International J.
of Advanced Computer Science and Applications, 9(1):105–111.
[Musil et al., 2017] Musil, J., Ekaputra, F. J., Sabou, M., Ionescu, T., Schall, D., Musil,
A., and Biffl, S. (2017). Continuous Architectural Knowledge Integration: Making
Heterogeneous Architectural Knowledge Available in Large-Scale Organizations. In
IEEE International Conference on Software Architecture, pages 189–192.
[Nagwani et al., 2017] Nagwani, N. K., Singh, A. K., and Pandey, S. (2017). TAGme:
A topical folksonomy based collaborative filtering for tag recommendation in com-
munity sites. ACM International Conference Proceeding Series, Part F1296.
[Parra et al., 2018] Parra, E., Escobar-Avila, J., and Haiduc, S. (2018). Automatic
tag recommendation for software development video tutorials. Proceedings -
International Conference on Software Engineering, pages 222–232.
[Ryan and O’Connor, 2013] Ryan, S. and O’Connor, R. V. (2013). Acquiring and
sharing tacit knowledge in software development teams: An empirical study.
Information and Software Technology, 55(9):1614–1624.
[Shahbazian et al., 2018] Shahbazian, A., Kyu Lee, Y., Le, D., Brun, Y., and Med-
vidovic, N. (2018). Recovering Architectural Design Decisions. 15th International
Conference on Software Architecture, pages 95–104.
[Sobernig and Zdun, 2016] Sobernig, S. and Zdun, U. (2016). Distilling architectural
design decisions and their relationships using frequent item-sets. 13th Working
IEEE/IFIP Conference on Software Architecture, pages 61–70.
25
[Soliman et al., 2016] Soliman, M., Galster, M., Salama, A. R., and Riebisch, M.
(2016). Architectural knowledge for technology decisions in developer communities:
An exploratory study with StackOverflow. In 13th Working IEEE/IFIP Conference
on Software Architecture, pages 128–133.
[Souza et al., 2019] Souza, E., Moreira, A., and Goulão, M. (2019). Deriving ar-
chitectural models from requirements specifications: A systematic mapping study.
Information and Software Technology, 109:26–39.
[Storey et al., 2009] Storey, M. A., Ryall, J., Singer, J., Myers, D., Cheng, L. T., and
Muller, M. (2009). How software developers use tagging to support reminding and
refinding. IEEE Transactions on Software Engineering, 35(4):470–483.
[Tom et al., 2013] Tom, E., Aurum, A., and Vidgen, R. (2013). An exploration of
technical debt. Journal of Systems and Software, 86(6):1498–1516.
[Treude and Storey, 2012] Treude, C. and Storey, M. A. (2012). Work item tagging:
Communicating concerns in collaborative software development. IEEE Transactions
on Software Engineering, 38(1):19–34.
[Uikey et al., 2011] Uikey, N., Suman, U., and Ramani, A. (2011). A Documented
Approach in Agile Software Development. International Journal of Software . . . ,
2(2):13–22.
[Vliet, 2008] Vliet, H. V. (2008). Software architecture knowledge management. In
Software Engineering, 2008. ASWEC 2008. 19th, pages 24–31. IEEE.
[Wang et al., 2014] Wang, T., Wang, H., Yin, G., Ling, C. X., Li, X., and Zou, P.
(2014). Tag recommendation for open source software. Frontiers of Computer
Science, 8(1):69–82.
[White et al., 2017] White, K., Grierson, H., and Wodehouse, A. (2017). Using Slack
for Asynchronous Communication in a Global Design Project. In International Conf.
on Engineering and Product Design Education, pages 346–351, Oslo, Norway.
[Wohlin et al., 2012] Wohlin, C., Runeson, P., Hst, M., Ohlsson, M. C., Regnell, B.,
and Wessln, A. (2012). Experimentation in Software Engineering. Springer Publish-
ing Company, Incorporated.
[Yanzer Cabral et al., 2014] Yanzer Cabral, A. R., Ribeiro, M. B., and Noll, R. P.
(2014). Knowledge Management in Agile Software Projects: A Systematic Review.
Journal of Information & Knowledge Management (JIKM), 13(01):1450010.
[Zhou et al., 2019] Zhou, P., Liu, J., Liu, X., Yang, Z., and Grundy, J. (2019). Is
deep learning better than traditional approaches in tag recommendation for software
information sites? Information and Software Technology, 109(January 2018):1–13.

AutoTag Rec To Classify AK in AGSE

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

AutoTag Rec To Classify AK in AGSE

Uploaded by

Copyright:

Available Formats

1

Automatic Tags Recommendation to Classify

Abstract: Agile global software engineering (AGSE) presents challenges on architec-

the distributed facilities is a common action on these environments in the aim

be related to a set of meta-tags, which are semantic anchors to ease the AK

Any software project AK involves a set of decisions to justify the design of a

3 Slack tagging service

Slack is an application that facilitates communication and collaboration within

3.1 Accessible messages of the UTEM log files

3.2 UTEM log files classification mechanism

4 Language models development

As we mentioned before, we incorporated NLP techniques and statistical lan-

– N-grams-2 [bigrams]: the sequence of terms was determined after performing

– We used Kneser-Ney discounting (kndiscount) as a smoothing method.

4.1 Text Preprocessing

Tags and Metatags Elements by category

Table 1: Seven categories and 291 interactions for training

5.2.1 Context and subjects selection

5.2.2 Study design, variables selection and hypotheses formulation

This study can be considered as a quasi-experiment with a within-subject design,

– H0Correctness . There is no significant difference in the number of messages

– H0T ime . There is no significant difference in time spent to tag a message

Below, we present the instruments developed in order to conduct this study as

– Messages gathering program. It is a program developed in Node.js which uses

– Extended TAM questionnaire. We prepared a questionnaire in Google Forms

– Introduction (duration 10 minutes). The participants were given an overall

– Interacting through Slack (duration 25 minutes). The participants used Slack

– Finalization (duration 3 minutes). When the participants had finished chat-

5.3.3 Data collection and data validation

6.1 Tagging correctness

0.04731). Thus, we applied the Wilcoxon signed-rank test (α = 0.05) to deter-

6.2 Time spent to tag messages

6.3 Additional observations

Figure 6: Position of the correct option on the recommendation feature when

6.4 Qualitative results

4, Students perception on each aspect, values of 1), which could be related to

8.1 Implications about the AK condensation in AGSE

We observed that participants tagged in correct manner more frequently when

8.2 Implications about natural language models in AGSE

In this paper we presented an implementation of an AK classification mecha-

dense AK on any agile software development environment, either co-located or

You might also like