
Software Quality Observatory for Open Source Software

Project Number: IST-2005-33331

D2 - Overview of the state of the art

Deliverable Report


Work Package Number: 1
Work Package Title: Requirements Definition and Analysis
Deliverable Number: 2
Coordinator: AUTH
Contributors: AUTH, AUEB, KDE
Due Date: 22nd January 2007
Delivery Date: 22nd January 2007
Availability: Restricted
Document Id: SQO-OSS_D_2

D2 / IST-2005-33331

SQO-OSS

22nd January 2007

Executive Summary

Chapter one presents the most important and widely used metrics in software engineering for quality evaluation. Software engineering metrics remain an active area of study, and researchers continue to validate them. The metrics presented were selected after studying the software engineering literature, retaining only those that are widely accepted. We must stress that we do not present any models for evaluating quality, only metrics that can be used for quality evaluation. Quality evaluation models will be presented in the appropriate deliverable. The metrics are categorised, according to a taxonomy accepted among researchers, into three sections: process metrics, product metrics and resources metrics. We have also included a section on metrics specific to Open Source software development. The presentation of each metric is brief, to allow straightforward application and tool development. We have included both classic metrics (e.g. program length and McCabe’s cyclomatic complexity) and modern metrics (e.g. the Chidamber and Kemerer metrics suite and object oriented design heuristics). While we present some metrics for Open Source software development, this topic will be presented at length elsewhere.

Chapter two presents tools for acquiring the metrics presented in chapter one. The tools presented are both Open Source and proprietary. Many metrics tools are available, and we have tried to present a representative sample of them. Specifically, we present those tools that are likely to be useful for our own system and that could potentially be included in it (especially the Open Source ones). We tried to install and test each tool ourselves. For each tool we describe its functionality and include some screenshots. Although we tried to include all tools that might be helpful to our project, future work will accommodate further tools as they become available.
Chapter three introduces empirical Open Source Software studies from several viewpoints. The first part details historical perspectives on the evolution of five popular Open Source Software systems (Linux, Apache, Mozilla, GNOME and FreeBSD). This is followed by horizontal studies, in which researchers examine several projects collectively. A model for the simulation of the evolution of Open Source Software projects, and results from early studies, are also presented. The evolution of Open Source Software projects is directly linked with the evolution of the code and of the communities around the project. Thus, the fourth viewpoint in this chapter considers code quality studies of Open Source Software, which apply evolution laws of Open Source software development to study how code evolves and how this evolution affects the quality of the software. The chapter concludes with community studies in mailing lists, in which a research methodology for the extraction and analysis of community activities in mailing lists is proposed.

Chapter four introduces the concept of data mining and its significance in the context of software engineering. Software development produces a large amount of data, which software organizations collect in the hope of better understanding their processes and products. Specifically, this data can refer to versions of programs, execution traces, error or bug reports and Open Source packages. Mailing lists, discussion forums and newsletters can also provide useful information about software. It is widely believed that this data hides significant knowledge about the performance and quality of software projects. Data mining provides techniques (clustering, classification and association rules) to analyze software engineering databases and extract novel, interesting patterns from them. In this chapter we review the data mining approaches that have been proposed to date to assist with some of the main software engineering tasks. Since software engineering repositories consist of text documents (e.g. mailing lists, bug reports, execution logs), the mining of textual artifacts is required for many important activities in software engineering: tracing of requirements, retrieval of components from a repository, identification and prediction of software failures, etc. We present the state of the art in text mining techniques applied in software engineering, together with a comparative study of them. We conclude by briefly discussing directions for further work on Data/Text Mining in software engineering.


Document Information

Deliverable Number: 2
Due Date: 22nd January 2007
Deliverable Date: 22nd January 2007

Approvals

                        Name                 Organisation   Date
Coordinator             Georgios Gousios     AUEB/SENSE     10/09/2006
Technical Coordinator   Ioannis Samoladas    AUTH/PLaSE
WP leader               Ioannis Antoniades   AUTH/PLaSE
Quality Reviewer 1
Quality Reviewer 2
Quality Reviewer 3

Revisions

Revision   Date         Modification      Authors
0.1        05/10/2006   Initial version   AUTH


Contents

1 Software Metrics and Measurement
  1.1 Software Metrics Taxonomy
  1.2 Process Metrics
    1.2.1 Structure Metrics
    1.2.2 Design Metrics
    1.2.3 Product Quality Metrics
  1.3 Productivity Metrics
  1.4 Open Source Development Metrics
  1.5 Software Metrics Validation
    1.5.1 Validation of prediction measurement
    1.5.2 Validation of measures

2 Tools
  2.1 Process Analysis Tools
    2.1.1 CVSAnalY
    2.1.2 GlueTheos
    2.1.3 MailingListStats
  2.2 Metrics Collection Tools
    2.2.1 ckjm
    2.2.2 The Byte Code Metric Library
    2.2.3 C and C++ Code Counter
    2.2.4 Software Metrics Plug-In for the Eclipse IDE
  2.3 Static Analysis Tools
    2.3.1 FindBugs
    2.3.2 PMD
    2.3.3 QJ-Pro
    2.3.4 Bugle
  2.4 Hybrid Tools
    2.4.1 The Empirical Project Monitor
    2.4.2 HackyStat
    2.4.3 QSOS
  2.5 Commercial Metrics Tools
  2.6 Process metrics tools
    2.6.1 MetriFlame
    2.6.2 Estimate Professional
    2.6.3 CostXpert
    2.6.4 ProjectConsole
    2.6.5 CA-Estimacs
    2.6.6 Discussion
  2.7 Product metrics tools
    2.7.1 CTC++-CMT++-CTB
    2.7.2 Cantata++
    2.7.3 TAU/Logiscope
    2.7.4 McCabe IQ
    2.7.5 Rational Functional Tester (RFT)
    2.7.6 Safire
    2.7.7 Metrics 4C
    2.7.8 Resource Standard Metrics
    2.7.9 Discussion

3 Empirical OSS Studies
  3.1 Evolutionary Studies
    3.1.1 Historical Perspectives
    3.1.2 Linux
    3.1.3 Apache
    3.1.4 Mozilla
    3.1.5 GNOME
    3.1.6 FreeBSD
    3.1.7 Other Studies
    3.1.8 Simulation of the temporal evolution of OSS projects
  3.2 Code Quality Studies
  3.3 F/OSS Community Studies in Mailing Lists
    3.3.1 Introduction
    3.3.2 Mailing Lists
    3.3.3 Studying Community Participation in Mailing Lists: Research methodology

4 Data Mining in Software Engineering
  4.1 Introduction to Data Mining and Knowledge Discovery
    4.1.1 Data Mining Process
  4.2 Data mining application in software engineering: Overview
    4.2.1 Using Data mining in software maintenance
    4.2.2 A Data Mining approach to automated software testing
  4.3 Text Mining and Software Engineering
    4.3.1 Text Mining - The State of the Art
    4.3.2 Text Mining Approaches in Software Engineering
  4.4 Future Directions of Data/Text Mining Applications in Software Engineering

5 Related IST Projects
  5.1 CALIBRE
  5.2 EDOS
  5.3 FLOSSMETRICS
  5.4 FLOSSWORLD
  5.5 PYPY
  5.6 QUALIPSO
  5.7 QUALOSS