LibreSoft: General Overview Libresoft Tools Introduction to Orphaning Approach Results and Work in Progress Further Work Bibliography

Orphaned Lines of Code: Analysing Code Left behind and Its Impact on Software Evolution
Daniel Izquierdo Cortázar
dizquierdo@gsyc.es GSyC/Libresoft, Universidad Rey Juan Carlos

SIG, Amsterdam, January 22nd, 2009


(cc) 2009 Daniel Izquierdo Cortázar. Some rights reserved. This document is distributed under the Creative Commons Attribution-ShareAlike 2.5 licence, available at http://creativecommons.org/licenses/by-sa/2.5/


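Table of contents
1 LibreSoft: General Overview
2 Libresoft Tools
3 Introduction to Orphaning
4 Approach
5 Results and Work in Progress
6 Further Work
7 Bibliography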


Location

- Located at Universidad Rey Juan Carlos, Móstoles, Madrid
- Youngest university in Madrid (10 years old)
- Escuela Técnica Superior de Ingeniería de Telecomunicaciones
- GSyC (Grupo de Sistemas y Comunicaciones)
- http://libresoft.es


People

- Jesús M. Gonzalez-Barahona, head of the group
- Gregorio Robles, assistant professor
- 32 people working in Libresoft:
  - 2 professors
  - 18 working on projects and research activities
  - 9 with LibreSoft internships
  - 3 with research grants (to do a PhD)


LibreSoft

- Academic activities: mining software repositories
- Industrial activities (consulting, mobile networking, web 2.0)
- Reports related to libre software, like FLOSSImpact
- Master on Free Software (Galicia and Madrid, http://master.libresoft.es)


European Projects

- QualOSS: Quality of Open Source Software
- FLOSSMetrics: Free/Libre/Open Source Software Metrics
- FLOSSWorld: Free/Libre/Open Source Software: Worldwide impact study
- FLOSSInclude: Free/Libre/Open Source Software, International Cooperation
- OSOR: The Open Source Observatory and Repository
- QualipSO: Trust and Quality in Open Source Systems


Empirical Approach: Data Sources

- Publicly available data sources:
  - Source Code Management (SCM): CVS, SVN, Git, Mercurial, Bazaar, ...
  - Mailing lists, forum data
  - Bug tracking systems
  - Web site, wiki, documentation, IRC logs, ...
  - Other places: FLOSSMole, FLOSSMetrics, ...


- We can directly analyse how a project/community evolves
- We can directly analyse how the developers' social network grows
- We can measure the number of people working on it
- ...


- Empirical analysis provides a new point of view:
  - Traceability of results
  - Traceability of tools
- We cannot measure how Windows Vista evolved
- We cannot measure how Microsoft's developers interact with each other
- We cannot even measure the size of Windows in lines of code


Data Sources

- CVSAnalY: CVS, SVN, Git
- Mailing List Stats: mbox format
- Bicho: SourceForge bug tracking system
- SLOCCount (by David Wheeler, not from Libresoft)


CVSAnalY: An Example

- Extracts statistical information from CVS, SVN and Git repositories
- Stores that data in a MySQL database
- Web site: http://tools.libresoft.es/cvsanaly
- Hosted in the Morfeo forge: https://forge.morfeo-project.org/projects/libresoft-tools/
- svn checkout https://svn.forge.morfeo-project.org/svn/libresoft-tools/cvsanaly/
- Provides plug-ins to create some graphics


CVSAnalY: CVS Log

...
1.246  (pj    13-Nov-01): /* Optional arg STRING supplies menu name for the keymap
1.246  (pj    13-Nov-01): in case you use it as a menu with ‘x-popup-menu’. */)
1.246  (pj    13-Nov-01): (string)
1.8    (rms   11-Sep-92): Lisp_Object string;
1.8    (rms   11-Sep-92):
1.8    (rms   11-Sep-92): Lisp_Object tail;
1.8    (rms   11-Sep-92): if (!NILP (string))
1.8    (rms   11-Sep-92): tail = Fcons (string, Qnil);
1.8    (rms   11-Sep-92): else
1.8    (rms   11-Sep-92): tail = Qnil;
1.1    (jimb  06-May-91): return Fcons (Qkeymap,
1.137  (rms   13-May-97): Fcons (Fmake_char_table (Qkeymap, Qnil), tail));
1.1    (jimb  06-May-91):
...


CVSAnalY: Basic Metrics

- Number of commits
- Number of committers
- Committers with the highest number of commits
- Number of files touched by each committer
- etc.
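
As a rough illustration, the metrics listed above could be computed from a list of commit records as in the following Python sketch. This is not CVSAnalY's code or its database schema: the `Commit` record, the `basic_metrics` function and the sample data are all hypothetical, chosen only to make the definitions concrete.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Commit(NamedTuple):
    """Hypothetical, simplified commit record (not CVSAnalY's schema)."""
    committer: str
    files: tuple


def basic_metrics(commits):
    """Compute the basic per-project metrics from a list of commits."""
    commits_per_committer = Counter(c.committer for c in commits)
    # Collect the distinct files each committer has touched.
    files_per_committer = defaultdict(set)
    for c in commits:
        files_per_committer[c.committer].update(c.files)
    return {
        "commits": len(commits),
        "committers": len(commits_per_committer),
        "top_committers": commits_per_committer.most_common(3),
        "files_per_committer": {
            name: len(files) for name, files in files_per_committer.items()
        },
    }


# Made-up sample data, for illustration only.
SAMPLE = [
    Commit("rms", ("keymap.c",)),
    Commit("rms", ("keymap.c", "lisp.h")),
    Commit("jimb", ("keymap.c",)),
    Commit("pj", ("keymap.c",)),
]

metrics = basic_metrics(SAMPLE)
```

In a real setting the commit records would come from the database that CVSAnalY fills, rather than from an in-memory list.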


CVSAnalY: Graphics


Developer Turnover

- Developer turnover is a problem
- There is a knowledge gap when senior developers leave a project
- New developers need some time to become familiar with the existing source code


Knowledge

- Tacit knowledge: not measurable
- Explicit knowledge: remains in the company/project artifacts, such as code or documentation


Knowledge Sharing

- It is complicated to share tacit knowledge, but ...
- We can measure explicit knowledge:
  - Source code management (CVS, SVN, Git, ...)
  - Mailing lists
  - Bug tracking systems


Research Question

- How can we measure the knowledge loss due to developer turnover?
- An approach to measure the knowledge gap left by developers
- Is this knowledge gap a good indicator of the project's health?
- Useful for managers, to identify risky areas
- We can measure the impact as the number of lines of code whose author is no longer in the current team


Preliminary Definitions

- Committer: developer with write access to the SCM system
- Author: original developer of a line of code
- Non-active committer: committer with no activity since a given date
- Orphaned line: line of code whose author is a non-active committer


Studying each period
- Monthly analysis
- Snapshots from each month


Detection of non-active committers
- When does a committer leave the project?
- How many lines did she author? These become the new orphaned lines
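
A minimal Python sketch of this step, under stated assumptions: each line of a snapshot is already attributed to its original author (for example from annotate output), and the last commit date of every committer is known. The `Line` record, the function names, the cutoff date and the sample data are hypothetical and are not taken from the authors' tooling.

```python
from datetime import date
from typing import NamedTuple


class Line(NamedTuple):
    """One line of a snapshot with its original author (hypothetical record)."""
    path: str
    author: str


def non_active_committers(last_commit, cutoff):
    """Committers whose last commit is older than the cutoff date."""
    return {name for name, last in last_commit.items() if last < cutoff}


def count_orphaned(lines, last_commit, cutoff):
    """Return (orphaned, total) line counts for one snapshot.

    Authors missing from last_commit are treated as active, which is
    an assumption of this sketch.
    """
    gone = non_active_committers(last_commit, cutoff)
    orphaned = sum(1 for line in lines if line.author in gone)
    return orphaned, len(lines)


# Made-up data, for illustration only.
LAST_COMMIT = {"alice": date(2003, 5, 1), "bob": date(2008, 11, 20)}
SNAPSHOT = [
    Line("main.c", "alice"),
    Line("main.c", "alice"),
    Line("main.c", "bob"),
    Line("util.c", "bob"),
]

orphaned, total = count_orphaned(SNAPSHOT, LAST_COMMIT, cutoff=date(2008, 1, 1))
```

Running the same count on each monthly snapshot would give a time series of orphaned lines. How far back the cutoff should lie before a committer counts as non-active is a parameter the slides leave open.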


Case Studies: GIMP and Evolution

GIMP (GNU Image Manipulation Program) is a graphics editor


Evolution combines e-mail, calendar, address book and task list management functions


Both started at around the same time (around 1998, at least according to the Source Code Management system)
Both are included by default in the GNOME desktop (using the same process and release cycle)


Orphaned Lines vs Size
[Figure: "Size and Orphaned Lines Evolution". Number of lines (0 to 1,200,000) over time (1998-02-28 to 2006-03-28). Series: Evolution total lines, Evolution orphaned lines, GIMP total lines, GIMP orphaned lines.]


Orphaned Lines Evolution
[Figure: "Orphaned Lines Evolution. Evolution and GIMP." Number of lines (0 to 300,000) over time (2001-05-28 to 2007-09-28). Series: Evolution 03, GIMP 03, Evolution 05, GIMP 05, Evolution 07, GIMP 07.]


Orphaned Lines: Evolution

- Three main jumps
- Turnover in the core group of developers is abrupt
- The number of orphaned lines (not taking big deletions into account) is almost stable, with just a small decrease
- Currently, the share of orphaned lines reaches 80%!


Orphaned Lines: GIMP

- Just one big jump
- The core group of developers remains stable during the whole project
- The number of orphaned lines decreases continuously
- Currently, the share of orphaned lines reaches 13%!


What Can Orphaning Tell Us?

- We do not really know... but we have some intuitions:
  - Maintenance problems could appear: try quickly fixing a defect in an area of code you do not know
  - Code decay and aging: orphaned areas in Evolution have not been modified for a long time


Matching Orphaning and Productivity

- Work in progress
- Hypothesis: high orphaning is correlated with low productivity
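
One simple way to probe such a hypothesis is a Pearson correlation between two monthly series. The Python sketch below assumes that productivity is approximated by the number of modified files per month. The slides do not say which statistic the authors use, so both the choice of Pearson's r and the sample numbers are assumptions made only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly series, for illustration only.
orphaned_lines = [100, 200, 300, 400]
modified_files = [80, 60, 40, 20]

r = pearson(orphaned_lines, modified_files)
```

A value of r close to -1 would be consistent with the hypothesis, and a value near 0 or above would not. A correlation alone would not show that orphaning causes the drop in productivity.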


Matching Orphaning and Productivity: Evolution
[Figure: "Productivity Evolution Project". Modified files (0 to 2000) over time (1998 to 2008). A second panel with unlabelled axes spans 0 to 400,000 on the vertical axis and 0 to 108 on the horizontal axis.]


Matching Orphaning and Productivity: GIMP
[Figure: "Productivity GIMP Project". Modified files (0 to 3000) over time (1998 to 2008). A second panel with unlabelled axes spans 0 to 500,000 on the vertical axis and 0 to 102 on the horizontal axis.]


Conclusions

- Evolution: a new set of developers came with an increase in productivity (surprising for us!)
- Evolution: the old core group tended to disappear, with big jumps in orphaning and productivity
- GIMP: a risky situation if some of the core group developers disappear


- Evolution: company-driven project. People have to deal with that code; they are paid for that job.
- GIMP: community-driven project. People choose what to work on; perhaps they prefer to start from scratch.


Productivity

- Detect files with high levels of orphaning
- Measure how they evolve
- New hypothesis: high levels of orphaning in a file are correlated with low levels of productivity in that file


Enrich the approach

More projects, more data. Working on Apache 1.3, GTK+, Wireshark and some others.


Slides based on the paper: "Using Software Archaeology to Measure Knowledge Loss in Software Projects due to Developer Turnover". Daniel Izquierdo-Cortazar, Gregorio Robles, Felipe Ortega and Jesus M. Gonzalez-Barahona. GSyC/Libresoft, Universidad Rey Juan Carlos (Madrid, Spain). {dizquierdo, grex, jfelipe, jgb}@gsyc.es


Questions?

Thanks for your attendance!
