Is Your Development
…important in an open source environment because of the higher likelihood of team change: contribution to open source projects is purely voluntary.
In this article, we offer an automatic approach to evaluate team robustness based on social network
Our Approach
Our approach to evaluating development team robustness is composed of the following three parts.
January/February 2018 | IEEE Software 65
FOCUS: ACTIONABLE ANALYTICS
FIGURE 1. Development team hierarchy. (a) Hadoop. (b) Cassandra. (c) CXF. (d) PDFBox. (e) Camel. (f) HBase.
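The Limitations and Future Work section gives the rule used to separate the two TLCH layers: a node belongs to the inner layer when its weight is one standard deviation above the mean. How node weights are derived from email links is not spelled out in this section, so the Python sketch below assumes, hypothetically, that a developer's weight is the total number of emails exchanged with others (weighted degree), uses the population standard deviation, and reads "above" as strictly greater. The names `node_weights` and `split_layers` and the toy data are illustrative only, not the authors' implementation, and the full TLCH criteria (which Hadoop does not meet) may include conditions beyond this threshold.

```python
from collections import defaultdict
from statistics import mean, pstdev


def node_weights(email_links):
    """Weighted degree of each developer in a hypothetical email network.

    email_links: iterable of (sender, receiver, count) tuples, where count is
    the number of emails exchanged along that link (assumed input format).
    """
    weights = defaultdict(int)
    for sender, receiver, count in email_links:
        weights[sender] += count
        weights[receiver] += count
    return dict(weights)


def split_layers(weights, k=1.0):
    """Split developers into (inner, outer) layers by a mean + k * std threshold.

    k=1.0 mirrors the "one standard deviation above the mean" rule; the article
    suggests tuning this value per project. Population std is an assumption.
    An empty inner layer is treated here as a flat team structure.
    """
    values = list(weights.values())
    threshold = mean(values) + k * pstdev(values)
    inner = sorted(dev for dev, w in weights.items() if w > threshold)
    outer = sorted(dev for dev, w in weights.items() if w <= threshold)
    return inner, outer


if __name__ == "__main__":
    links = [
        ("alice", "bob", 50),  # a tightly collaborating pair
        ("alice", "carol", 5),
        ("bob", "dave", 5),
        ("carol", "erin", 2),
        ("dave", "frank", 2),
        ("erin", "gina", 1),
        ("frank", "hank", 1),
    ]
    inner, outer = split_layers(node_weights(links))
    print("inner layer:", inner)
    print("outer layer:", outer)
```

With these toy links, only alice and bob clear the threshold and form the inner layer. If every developer has the same weight, the standard deviation is zero, no node clears the threshold, and the sketch reports a flat structure; changing `k` corresponds to the tuning the authors recommend.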
Collaboration Hierarchy Visualization

Figure 1 shows the TLCH calculated from each project. (For the sake of readability, each graph shows only up to the top 20 most contributing developers from each project.) In this article, the TLCH for each project is calculated as a static view covering the selected period shown in Table 1. When the TLCH approach is applied, all projects except Hadoop contain the two hierarchical layers. Each calculated TLCH contains two (CXF and PDFBox) to four (Cassandra) developers in the inner layer. These few developers have a significant amount of collaboration among themselves and usually (except for PDFBox) are highly connected with the other developers in the team. Thus, they play critical roles in maintaining the stability of team collaboration due to their significant share of knowledge of the system. If they leave, the team is subject to severe information loss. Therefore, these inner-layer developers are the point of risk for the robustness of each team.

Table 1. Subject projects.

Subject | Length of history, mm/yy (no. of months) | No. of developers | No. of emails
Cassandra | 09/09 to 12/16 (87) | 88 | 1,479
Hadoop | 08/09 to 12/16 (88) | 64 | 18,675
PDFBox | 08/09 to 02/17 (90) | 16 | 12,105
CXF | 12/07 to 02/17 (110) | 43 | 3,971
HBase | 12/09 to 02/17 (86) | 63 | 19,613
Camel | 07/08 to 12/16 (101) | 42 | 17,281

In comparison, Hadoop shows a flat team structure, because the Hadoop team does not meet the criteria of the TLCH algorithm. Intuitively, this implies that collaboration is evenly distributed among all the team members. We believe that a flat structure is more robust than a hierarchical structure because no matter who leaves the team, only a small proportion of the project knowledge will be lost. Hence, the whole team is more resilient to risk.

Information Loss

Figure 2 shows the trend of information loss as the top four developers in each project become absent. Since the developers in the inner layer of the TLCH are likely to be the point of risk for team robustness, we calculate the CoreIL associated with these developers. Inner-layer developers are highlighted as the thicker dots in Figures 2b to 2f. The CoreIL in the five projects with a hierarchical structure ranges from 0.346 (CXF, Figure 2c) to 0.473 (PDFBox, Figure 2d). This indicates that if any of the inner-layer developers becomes unavailable, around 34.6% to 47.3% of the project information might be lost. In particular, PDFBox has the least-robust team according to the data, with a CoreIL of 0.473. Moreover, if the two developers in its inner layer leave, the total information loss is more than 60%. This will cause significant disruption to the daily operations of the team.

In comparison, since Hadoop has a flat team structure, as shown in Figure 1a, we calculate the average IL_dev of the top four developers, which is only 0.187. This implies a much more affordable information loss from an individual developer’s absence, compared to the other projects.

Evaluation

We mined the revision history of each project to calculate the actual percentage of code contributions that each core developer (if any core developers exist) made to the project code base. If core members contribute the majority of changes to the code base, these core developers are indeed the point of risk for team robustness. In other words, when those core developers become unavailable, daily code revisions will be significantly disrupted. In contrast, if contributions by core members are less significant, the project’s daily operations would not be affected much.

Table 2 lists the evaluation results. The second column shows the total number of code revisions submitted by the entire development team of a project during the studied period. Columns 3 to 6 list the contribution percentage of each inner-layer member in a project; the core members are those in the inner layer of the calculated TLCH shown in Figure 1. The last column shows the total contribution percentage of all the inner-layer members. Since Hadoop has a flat structure, the last row in Table 2 instead lists the maximum, average, and standard deviation of the individual developers’ contributions.

We can make the following observations from Table 2. First, the few inner-layer members (up to four
developers) together make a significant contribution, from 49% (Cassandra) to 61% (PDFBox), to the code base of each project. This implies that when these few developers become unavailable, the daily code revision of the projects will likely be disrupted significantly: who could replace them and make such a large percentage of the revisions? In particular, the top core member alone contributes 26% (Cassandra) to 49% (PDFBox) of the revisions to the entire code base.

Second, in Hadoop, by comparison, an individual developer usually makes a relatively trivial percentage of the revisions. The maximum individual contribution is 5%, and the average individual contribution is only 0.69%, with a standard deviation of 1%. This implies that Hadoop indeed has the most robust team structure.

In summary, the data show that the TLCH of the development team and the information loss associated with the top four developers, as calculated by our approach, faithfully reflect the actual code base contributions of the core developers. Thus, our approach can provide useful insights for evaluating team robustness.

FIGURE 2. Team information loss (IL) with absent developers. Each panel plots information loss (%) on the y-axis against top developers (0 to 5) on the x-axis. (a) Hadoop (average IL, 0.187). (b) Cassandra (core member IL, 0.464). (c) CXF (core member IL, 0.346). (d) PDFBox (core member IL, 0.473). (e) Camel (core member IL, 0.435). (f) HBase (core member IL, 0.395).

Table 2. Core developer contributions.

Project | No. of revisions | Core 1 (%) | Core 2 (%) | Core 3 (%) | Core 4 (%) | Total (%)
Cassandra | 23,169 | 26 | 12 | 9 | 2 | 49
CXF | 13,393 | 30 | 28 | N/A | N/A | 58
PDFBox | 6,204 | 49 | 12 | N/A | N/A | 61
Camel | 29,166 | 44 | 14 | 2 | N/A | 60
HBase | 13,539 | 36 | 12 | 6 | N/A | 54
Hadoop | 16,196 | Individual contributions: max. = 5%, avg. = 0.69%, standard deviation = 1% | N/A
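The Total column in Table 2 is the sum of the per-core percentages, and the Hadoop row summarizes individual shares by maximum, average, and standard deviation. The Python sketch below recomputes the totals from the published values and shows one way a flat-team summary could be produced from per-developer revision counts. The function names and the sample counts are hypothetical, and the use of the population standard deviation is an assumption, since the article does not state which variant was used.

```python
from statistics import mean, pstdev

# Per-core contribution percentages copied from Table 2 (N/A cells omitted).
CORE_SHARES = {
    "Cassandra": [26, 12, 9, 2],
    "CXF": [30, 28],
    "PDFBox": [49, 12],
    "Camel": [44, 14, 2],
    "HBase": [36, 12, 6],
}


def core_total(shares):
    """Total contribution (%) of all inner-layer members of one project."""
    return sum(shares)


def contribution_summary(revisions_by_dev):
    """Return (max, mean, population std) of developers' revision shares in %.

    revisions_by_dev maps a developer to a revision count (hypothetical input).
    """
    total = sum(revisions_by_dev.values())
    shares = [100.0 * n / total for n in revisions_by_dev.values()]
    return max(shares), mean(shares), pstdev(shares)


if __name__ == "__main__":
    for project, shares in CORE_SHARES.items():
        print(f"{project}: core total {core_total(shares)}%")
    # Made-up counts for a small team, only to show the summary's shape.
    print(contribution_summary({"dev1": 50, "dev2": 30, "dev3": 20}))
```

Each recomputed total matches the Total column. Because the shares always sum to 100%, the average share of n contributing developers is 100/n, so the average alone reflects team size; the maximum and the standard deviation are what reveal whether a few developers dominate.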
Limitations and Future Work

In this article, we analyzed email data to construct developers’ collaboration links. Developers’ collaboration can take other forms, for example, bug-tracking systems, shared code ownership, and so on. We acknowledge the limitation of considering only email exchanges. However, our proposed approach can be generalized to collaboration links extracted from other data sources.

When calculating the TLCH, the inner layer is distinguished from the outer layer if the weight of a node is one standard deviation above the mean. “One standard deviation” is not the only feasible threshold; we suggest that users tune this value depending on the project’s circumstances.

We plan to address the following in our future work:

• Here, we applied our approach to six Apache open source projects. We plan to evaluate and apply this approach on a broader spectrum of projects with more diverse characteristics.
• Currently, the TLCH is constructed based on data for a selected period of time and thus is static. We plan to apply our approach to visualize and monitor the dynamics of the collaboration hierarchy over time, which can provide insights for analyzing a potential increase or decrease of team robustness.
• Although the TLCH algorithm reveals meaningful team structure for the six projects with 16 to 88 developers, we acknowledge that it may not work properly for an ultra-large-scale development team with hundreds of developers or more. Based on similar rationale, however, we plan to extend the TLCH to a Multilayer Collaboration Hierarchy to describe the team structure.

Related Work

In the past decades, numerous studies have examined the social structure of software development teams, using different methods for different goals.5–10 This section compares this article with the most relevant prior work.

Christian Bird and his colleagues mined the social network from the public email archive of Apache httpd.5 They found that the messages sent by an individual and the number of source changes that individual makes have a Spearman’s rank correlation of about 0.8. This is consistent with our evaluation results: the top few inner-layer developers in each project contribute a majority of the code revisions. Chris Jensen and Walt Scacchi proposed an “onion” diagram to represent the different roles of open source software developers: active users, developers, project managers, community managers, core developers, passive users, and observers.7 Andrew Meneely and Laurie Williams found that developer social network measures, such as the edges, social distance, and network centrality, are consistent with developers’ perception of their actual collaborations.8 Gustavo Oliva and his colleagues reported that only 25% of the developers in a project may be considered key developers, who are often active in the mailing list and fulfill the coordination requirements.9 Recently, Mitchell Joblin and his colleagues reported that network metrics are a better data source for capturing the core developers in a project than count-based metrics such as churn (changed lines of code) and commits.10

Compared to this prior work, the uniqueness of this article is that it leverages the analysis of the developer social network for a new angle: analyzing team robustness.

The study results suggest that our approach can effectively help people intuitively understand and quantitatively evaluate the robustness of a development team. Even though our approach has so far been applied only to open source projects, it is directly applicable to commercial projects. Project

7. “…gration and Advancement Processes in OSSD Projects: A Comparative Case Study,” Proc. 29th Int’l Conf. Software Eng. (ICSE 07), 2007, pp. 364–374; dx.doi.org/10.1109/ICSE.2007.74.
8. A. Meneely and L. Williams, “Socio-technical Developer Networks: Should We Trust Our Measurements?,” Proc. 33rd Int’l Conf. Software Eng. (ICSE 11), 2011, pp. 281–290; doi.acm.org/10.1145/1985793.1985832.
9. G.A. Oliva et al., “Characterizing Key Developers: A Case Study with Apache Ant,” Proc. 18th Int’l Conf. Collaboration and Technology (CRIWG 12), 2012, pp. 97–112; dx.doi.org/10.1007/978-3-642-33284-5_8.
10. M. Joblin et al., “Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics,” Proc. 2017 IEEE/ACM 39th Int’l Conf. Software Eng. (ICSE 17), 2017, pp. 164–174.