Software Quality Observatory for Open Source Software

Project Number: IST-2005-33331

D2 - Overview of the state of the art
Deliverable Report

Work Package Number: 1
Work Package Title: Requirements Definition and Analysis
Deliverable Number: 2
Coordinator: AUTH
Contributors: AUTH, AUEB, KDE
Due Date: 22nd January 2007
Delivery Date: 22nd January 2007
Availability: Restricted
Document Id: SQO-OSS_D_2

D2 / IST-2005-33331

SQO-OSS

22nd January 2007

Executive Summary
Chapter one presents the most important and widely used metrics for quality evaluation in software engineering. The area of software engineering metrics is under continuous study, and researchers continue to validate the metrics. The metrics presented were selected after a study of the software engineering literature, keeping only those that are widely accepted. We must stress that we present no models for evaluating quality, only metrics that can be used for quality evaluation; quality evaluation models will be presented in the appropriate deliverable. The metrics are categorised, according to a taxonomy accepted among researchers, into three sections: process metrics, product metrics and resources metrics. We have also included a section for metrics specific to Open Source software development. The presentation of the metrics is brief, allowing for straightforward application and tool development. We include both classic metrics (e.g. program length and McCabe's cyclomatic complexity) and modern ones (e.g. the Chidamber and Kemerer metrics suite and object oriented design heuristics). While we present some metrics for Open Source software development, this topic will be treated at length elsewhere.

Chapter two presents tools for acquiring the metrics presented in chapter one. The tools presented are both Open Source and proprietary. Many metrics tools are available, so we present a representative sample, focusing on those that are likely to be useful for our own system and that could potentially be integrated into it (especially the Open Source ones). We tried to install and test each tool ourselves. For each tool we describe its functionality and include some screenshots. Although we tried to include all tools that might be helpful to our project, future work will accommodate further tools as they become available.
Chapter three introduces empirical Open Source Software studies from several viewpoints. The first part details historical perspectives on the evolution of five popular Open Source Software systems (Linux, Apache, Mozilla, GNOME, and FreeBSD). This is followed by horizontal studies, in which researchers examine several projects collectively. A model for the simulation of the evolution of Open Source Software projects, along with results from early studies, is also presented. The evolution of Open Source Software projects is directly linked with the evolution of the code and of the communities around the project. Thus, the fourth viewpoint in this chapter considers code quality studies of Open Source Software, applying evolution laws of Open Source software development to study how code evolves and how this evolution affects the quality of the software. The chapter concludes with community studies in mailing lists, in which a research methodology for the extraction and analysis of community activities in mailing lists is proposed.

Chapter four introduces the concept of data mining and its significance in the context of software engineering. A large amount of data is produced in software development, which software organizations collect in the hope of better understanding their processes and products. Specifically, data in software development can refer to versions of programs, execution traces, error or bug reports and Open Source packages. In addition, mailing lists, discussion forums and newsletters can provide useful information about software. This data is widely believed to hide significant knowledge about the performance and quality of software projects. Data mining provides the techniques (clustering, classification and association rules) to analyze such data and extract novel, interesting patterns from software engineering databases. In this chapter we review the data mining approaches that have been proposed so far, aiming to assist with some of the main software engineering tasks. Since software engineering repositories consist largely of text documents (e.g. mailing lists, bug reports, execution logs), the mining of textual artifacts is a prerequisite for many important activities in software engineering: tracing of requirements, retrieval of components from a repository, identification and prediction of software failures, etc. We present the state of the art of text mining techniques applied in software engineering, and provide a comparative study of them. We conclude by briefly discussing directions for further work on Data/Text Mining in software engineering.


Document Information

Deliverable Number: 2
Due Date: 22nd January 2007
Deliverable Date: 22nd January 2007
Revision: final

Approvals

Roles: Coordinator, Technical Coordinator, WP leader, Quality Reviewer 1, Quality Reviewer 2, Quality Reviewer 3
Names: Georgios Gousios (AUEB/SENSE), Ioannis Samoladas (AUTH/PLaSE), Ioannis Antoniades (AUTH/PLaSE)
Date: 10/09/2006

Revisions

Revision 0.1, 05/10/2006 - Initial version (Authors: AUTH)


Contents

1 Software Metrics and Measurement  7
  1.1 Software Metrics Taxonomy  7
  1.2 Process Metrics  8
    1.2.1 Structure Metrics  13
    1.2.2 Design Metrics  17
    1.2.3 Product Quality Metrics  19
  1.3 Productivity Metrics  22
  1.4 Open Source Development Metrics  22
  1.5 Software Metrics Validation  24
    1.5.1 Validation of prediction measurement  25
    1.5.2 Validation of measures  26

2 Tools  27
  2.1 Process Analysis Tools  27
    2.1.1 CVSAnalY  28
    2.1.2 GlueTheos  30
    2.1.3 MailingListStats  32
  2.2 Metrics Collection Tools  33
    2.2.1 ckjm  33
    2.2.2 The Byte Code Metric Library  33
    2.2.3 C and C++ Code Counter  33
    2.2.4 Software Metrics Plug-In for the Eclipse IDE  34
  2.3 Static Analysis Tools  37
    2.3.1 FindBugs  37
    2.3.2 PMD  38
    2.3.3 QJ-Pro  38
    2.3.4 Bugle  38
  2.4 Hybrid Tools  38
    2.4.1 The Empirical Project Monitor  39
    2.4.2 HackyStat  39
    2.4.3 QSOS  39
  2.5 Commercial Metrics Tools  39
  2.6 Process metrics tools  39
    2.6.1 MetriFlame  40
    2.6.2 Estimate Professional  41
    2.6.3 CostXpert  42
    2.6.4 ProjectConsole  43
    2.6.5 CA-Estimacs  44
    2.6.6 Discussion  45
  2.7 Product metrics tools  46
    2.7.1 CT C++ - CMT++ - CTB  47
    2.7.2 Cantata++  47
    2.7.3 TAU/Logiscope  49
    2.7.4 McCabe IQ  50
    2.7.5 Rational Functional Tester (RFT)  52
    2.7.6 Safire  53
    2.7.7 Metrics 4C  54
    2.7.8 Resource Standard Metrics  55
    2.7.9 Discussion  56

3 Empirical OSS Studies  57
  3.1 Evolutionary Studies  57
    3.1.1 Historical Perspectives  57
    3.1.2 Linux  58
    3.1.3 Apache  64
    3.1.4 Mozilla  66
    3.1.5 GNOME  67
    3.1.6 FreeBSD  68
    3.1.7 Other Studies  69
    3.1.8 Simulation of the temporal evolution of OSS projects  72
  3.2 Code Quality Studies  78
  3.3 F/OSS Community Studies in Mailing Lists  84
    3.3.1 Introduction  84
    3.3.2 Mailing Lists  84
    3.3.3 Studying Community Participation in Mailing Lists: Research methodology  85

4 Data Mining in Software Engineering  88
  4.1 Introduction to Data Mining and Knowledge Discovery  88
    4.1.1 Data Mining Process  88
  4.2 Data mining application in software engineering: Overview  89
    4.2.1 Using Data mining in software maintenance  90
    4.2.2 A Data Mining approach to automated software testing  102
  4.3 Text Mining and Software Engineering  105
    4.3.1 Text Mining - The State of the Art  106
    4.3.2 Text Mining Approaches in Software Engineering  108
  4.4 Future Directions of Data/Text Mining Applications in Software Engineering  111

5 Related IST Projects  113
  5.1 CALIBRE  113
  5.2 EDOS  117
  5.3 FLOSSMETRICS  120
  5.4 FLOSSWORLD  122
  5.5 PYPY  125
  5.6 QUALIPSO  125
  5.7 QUALOSS  128
  5.8 SELF  129
  5.9 TOSSAD  131


1 Software Metrics and Measurement

As stated in the Description of Work, SQO-OSS aims to provide a holistic approach to software assessment, initially targeted at open source software development. Specifically, the main goals of the project are:

1. Evaluate the quality of Open Source software.
2. Evaluate the health of an Open Source software project.

These two main goals will be delivered through a plug-in based quality assessment platform. In order to achieve these goals, the project's consortium has to answer specific questions derived from them. Thus, for the goals presented, the following have to be answered:

1. How can the quality of Open Source software be evaluated and improved?
   • How is quality evaluated?
2. How can the health of an Open Source software project be evaluated?
   • How is the health of a project evaluated?

These questions can be answered if we examine and measure both the process of creating Open Source software and the product itself, i.e. the code. Both entities can be measured with the help of software metrics. This chapter presents software metrics and an overview of how useful they are for software evaluation.

1.1 Software Metrics Taxonomy

In this section we describe the various software metrics that exist in the area of software engineering and that are going to be useful for our research. Furthermore, we refer to metrics specific to open source software development. These metrics are divided into categories; the chosen classification is widely used in the software metrics literature [FP97].

• Process metrics refer to the software development activities and processes. Measurements such as defects per testing hour, time and number of people fall under this category.
• Product metrics refer to the products of the software development process (e.g. code, but also documents etc.).
• Resources metrics refer to any input to the development process (e.g. people and methods).


Each of these categories contains metrics that are further distinguished as either internal or external.

• Internal metrics of a product, process or resource are those that can be measured purely by examining the product, process or resource on its own.
• External metrics of a product, process or resource are those that can be measured only with respect to how the product, process or resource relates to its environment (i.e. its behaviour).

Apart from the formal categories presented, we shall also include some metrics derived directly from the Open Source development process. In the following sections the most important (from our own perspective) metrics are presented for each of the categories above. The metrics presented have been studied and used extensively in traditional closed source software development. At the end we present metrics for Open Source software that have appeared in recent years, as researchers started studying Open Source software. Although these metrics can be classified according to the above taxonomy, we prefer to present them separately.

1.2 Process Metrics

Defect Density: One of the most widely accepted metrics for software quality is defect density. This metric is expressed as the number of defects found per certain amount of software, usually counted as the number of lines of code of the delivered product (specific metrics regarding size are presented in the following sections). Defect density can be expressed simply as:

Defect Density = Number of Known Defects / LOC

Many researchers split defects into two categories: known defects, which are discovered during testing (before the release of the product), and latent defects, which are discovered after the release of the product [FP97]. Each of these categories has its own defect density metric. Defect density is usually considered a product metric, and thus would belong in the next section; however, it is directly derived from the development process (defect discovery through testing) [FP97], so it is presented here. At the same time it reflects the quality of the product, particularly through the defects found after release.
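As a concrete illustration, the ratio above can be computed directly. A minimal sketch follows; note that normalising per thousand lines (defects per KLOC) is a common reporting convention rather than part of the definition, and the figures used are invented:

```python
def defect_density(known_defects: int, loc: int, per: int = 1000) -> float:
    """Defects per `per` lines of code (default: defects per KLOC)."""
    if loc <= 0:
        raise ValueError("LOC must be positive")
    return known_defects / loc * per

# Hypothetical release: 42 known defects in 12,000 delivered lines of code
print(defect_density(42, 12_000))  # 3.5 defects per KLOC
```

The same function applied to defects found before and after release yields the known and latent defect density variants mentioned above.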


Defect Removal Effectiveness: Defect removal effectiveness is a process metric that reflects the ability of the development team to remove defects [Kan03]. The metric is defined as:

Defect Removal Effectiveness = (Defects Removed in Development / (Defects Removed in Development + Defects Found Later)) × 100%

This is a very useful metric and can be applied at any phase of the software development process. Another metric related to defect removal is system spoilage [FP97], which is rather useful for assessing the effectiveness of the development team. It is defined as:

System Spoilage = Time To Fix Post-Release Defects / Total System Development Time
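Both process metrics are simple ratios over data that would, in practice, come from a project's defect tracker and effort records. A sketch with invented figures:

```python
def defect_removal_effectiveness(removed_in_dev: int, found_later: int) -> float:
    """Percentage of all known defects removed before release."""
    return removed_in_dev / (removed_in_dev + found_later) * 100

def system_spoilage(post_release_fix_time: float, total_dev_time: float) -> float:
    """Fraction of total development time spent fixing post-release defects."""
    return post_release_fix_time / total_dev_time

# 180 defects removed during development, 20 more found after release
print(defect_removal_effectiveness(180, 20))  # 90.0 (%)

# 120 person-hours spent on post-release fixes out of 2400 total
print(system_spoilage(120, 2400))             # 0.05
```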

As mentioned, system spoilage reflects the ability of the development team to respond to defects found.

LOC: Code can be measured in several ways. The first and most common metric in the area of software engineering is the number of lines of code (LOC). Although it may seem easy to measure the lines of code of a computer program, there is controversy about what we mean by LOC. Most researchers refer to LOC as Source Lines Of Code (SLOC), which can be either physical SLOC or logical SLOC. Specific definitions of these two measures vary, in the sense that what is actually measured is not always explicit. One needs to consider whether the measure covers any of the following:

• Blank lines.
• Comment lines.
• Data declarations.
• Lines that contain several separate declarations.

Logical SLOC measures attempt to count the number of "statements". The definition varies depending on the programming language: since programming languages have language-specific syntax, the logical SLOC definition for each language will differ. One simple logical SLOC measure for C-like languages is the number of statement-terminating semicolons. It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less so. Unfortunately, SLOC measures are often stated without providing their definition, and logical SLOC can often differ significantly from physical SLOC. For the purpose of our research, a physical source line of code (SLOC) will be defined as:

... a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character. Comment delimiters (characters other than newlines starting and ending a comment) are considered comment characters. Data lines containing only whitespace (e.g., lines with only tabs and spaces in multiline strings) are not included.

Given the definition above, we must stress that this size metric does not represent the full length of the source code of the program, since it excludes comment lines. Thus the total length of the program is represented as:

Total Length (LOC) = SLOC + Number of Comment Lines
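The definition above can be turned into a counter almost mechanically. The sketch below is deliberately simplified: it handles only whole-line comments with a single prefix, whereas block comments and comment delimiters inside strings require a proper parser.

```python
def count_sloc(source: str, comment_prefix: str = "#") -> tuple[int, int]:
    """Return (physical SLOC, comment-only lines) for `source`.

    Simplified sketch of the physical SLOC definition: a line counts as
    SLOC if it has at least one non-whitespace, non-comment character.
    """
    sloc = comments = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                       # blank line: counted in neither
        if stripped.startswith(comment_prefix):
            comments += 1                  # comment-only line
        else:
            sloc += 1                      # contributes to physical SLOC
    return sloc, comments

src = "# header\n\nx = 1\ny = x + 1  # inline\n"
print(count_sloc(src))  # (2, 1) -> total length = 2 + 1 = 3 LOC
```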
The number of comment lines is itself a useful metric when we consider other aspects of software, e.g. documentation.

Halstead Software Science: Apart from counting lines of code, there are other kinds of metrics that try to measure the length of a computer program. One of the earliest was introduced by Halstead [Hal77] in the late '70s. Halstead's measures are based on four counts derived directly from the source code:

• µ1, the number of distinct operators,
• µ2, the number of distinct operands,
• N1, the total number of operators,
• N2, the total number of operands.

Halstead further introduced some metrics based upon the previous measures:

• The length N of a program: N = N1 + N2,
• The vocabulary µ of a program: µ = µ1 + µ2,
• The volume V of a program: V = N × log2 µ,
• The difficulty D of a program: D = (µ1 / 2) × (N2 / µ2).
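Given the four basic counts, the derived measures follow directly. A sketch (the counts themselves must be extracted by a language-aware tokenizer; the figures below are invented):

```python
from math import log2

def halstead(mu1: int, mu2: int, n1: int, n2: int) -> dict:
    """Halstead measures from distinct/total operator and operand counts."""
    length = n1 + n2                        # N  = N1 + N2
    vocabulary = mu1 + mu2                  # mu = mu1 + mu2
    volume = length * log2(vocabulary)      # V  = N * log2(mu)
    difficulty = (mu1 / 2) * (n2 / mu2)     # D  = (mu1/2) * (N2/mu2)
    return {"N": length, "mu": vocabulary, "V": volume, "D": difficulty}

# Hypothetical counts for a small function
print(halstead(mu1=10, mu2=8, n1=40, n2=24))
```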


In order for these metrics to be measured, one has to decide how to identify the operators and operands. Halstead also used his metrics to estimate the length of, and the effort for, a given program; for more on Halstead's estimations see [Hal77]. Halstead's Software Science metrics have been criticised a great deal over the years, and opinions about them are controversial, especially regarding the volume, difficulty and the other estimation metrics. These opinions range from "no corresponding consensus" [FP97] to "strongest measures of maintainability" [OH94]. However, the value of N as a program length, as well as the volume of a program, as proposed by Halstead, does not contradict any relations we expect between a program and its length. Thus, we choose to include Halstead's metrics in our research [FP97].

Function Points: The previous size measures count physical size: lines, operators and operands. Many researchers argue that this kind of measurement can be misleading, since it does not capture the notion of functionality, i.e. the amount of function inside the source code of a given program. Thus, they propose the use of functionality metrics. One of the first such metrics was proposed by Albrecht in 1977 and was called Function Point Analysis (FPA) [Alb79], a means of measuring size and productivity (and, later on, also complexity). It uses functional, logical entities such as inputs, outputs, and inquiries that tend to relate more closely to the functions performed by the software than measures such as lines of code do. Function point definition and measurement have evolved substantially; the International Function Point Users Group (IFPUG) [1], formed in 1986, actively exchanges information on function point analysis. In order to compute Function Points (FP), one first needs to compute the Unadjusted Function Point Count (UFC). To calculate this, one further needs to count the following:

• External inputs: every input provided by the user (data and UI interactions), but not inquiries.
• External outputs: every output to the user (i.e. reports and messages).
• External inquiries: interactive inputs requiring a response.
• External files: interfaces to other systems.
• Internal files: files that the system uses for its own purposes.

Next, each item is assigned a subjective "complexity" rating on a 3-point ordinal scale:

• Simple.
[1] http://www.ifpug.com/


• Average.
• Complex.

Then a weight is assigned to each item according to standard tables (e.g. for a simple external input the weight is 3, and for a complex external inquiry it is 6); the total number of weights equals 15 (five item types times three complexity levels). The UFC is calculated as:

UFC = Σ(i=1..15) (Number of Items of Variety i) × weight_i

Then we compute a technical complexity factor (TCF). To do this, we rate 14 factors F_i (such as reusability and performance) from 0 to 5 (0 means irrelevant, 3 average, and 5 essential to the system being built), and combine them in the following formula:

TCF = 0.65 + 0.01 × Σ(i=1..14) F_i

The final calculation of the total FP of the system is:

FP = UFC × TCF
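A worked sketch of the calculation follows. The weights used are the commonly cited Albrecht/IFPUG values, but the authoritative tables are those in the IFPUG Counting Practices Manual; the item counts and factor ratings are invented:

```python
# Weight tables per item type: (simple, average, complex).
# Commonly cited Albrecht/IFPUG values; consult the IFPUG manual for
# the authoritative figures.
WEIGHTS = {
    "external_inputs":    (3, 4, 6),
    "external_outputs":   (4, 5, 7),
    "external_inquiries": (3, 4, 6),
    "external_files":     (5, 7, 10),
    "internal_files":     (7, 10, 15),
}

def function_points(counts: dict, tcf_factors: list) -> float:
    """counts maps (item_type, complexity_index 0..2) -> number of items;
    tcf_factors is the list of 14 ratings F_i, each 0..5."""
    ufc = sum(n * WEIGHTS[item][level] for (item, level), n in counts.items())
    tcf = 0.65 + 0.01 * sum(tcf_factors)
    return ufc * tcf

counts = {("external_inputs", 0): 5,    # 5 simple external inputs
          ("external_outputs", 1): 3,   # 3 average external outputs
          ("internal_files", 2): 1}     # 1 complex internal file
# UFC = 5*3 + 3*5 + 1*15 = 45; all 14 factors rated "average" -> TCF = 1.07
print(round(function_points(counts, [3] * 14), 2))  # 48.15
```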
There is a very large user community for function points; IFPUG has more than 1200 member companies, and it offers assistance in establishing an FPA program. The standard practices for counting and using function points can be found in the IFPUG Counting Practices Manual. Without some standardisation of how function points are enumerated and interpreted, consistent results can be difficult to obtain. Successful application seems to depend on establishing a consistent method of counting function points and keeping records to establish baseline productivity figures for specific systems. Function measures tend to be independent of language, coding style, and software architecture, but environmental factors such as the ratio of function points to source lines of code will vary, although there have been some attempts to map LOC to FP [Jon95]. Limitations of function points include the subjectivity of the TCF, the weights and the other subjective measures used. Also, their application is rather time consuming and demands well trained staff. Taking these limitations into account, the method can still be rather useful as an estimator of size and of other metrics that take size into account.

Object Oriented Size Metrics: In object oriented development, classes and methods are the basic constructs. Thus, apart from the metrics presented above, in object oriented technology we can use the numbers of classes and methods as an aspect of size. These metrics are straightforward:

• Number of classes.
• Number of methods per class.
• LOC per class.
• LOC per method.

Obviously, metrics from other sections also apply to object oriented development, but in relation to classes and objects (for example, for the complexity metrics presented later in this document, we have average complexity per class or method).

Reuse: By reuse we mean the amount of code which is reused in a future release of the software. Although it may sound simple, reuse cannot be counted in a straightforward manner, because it is difficult to define what we mean by code reuse. There are thus different notions of reuse that take into account the extent of reuse [FP97]: straight reuse (copy and paste of the code) and modified reuse (taking a module and changing the appropriate lines in order to implement new features). In addition, in object oriented programming, reuse extends to the reuse or inheritance of certain classes. Reuse also affects size measurement of successive releases: if the present release of a piece of software contains a large amount of code identical to the previous release, what is its actual size? For example, IBM uses a metric called shipped source instructions (SSI) [Kan03], which is expressed as:

SSI (current) = SSI (previous) + CSI (new and changed code for current release) − deleted code − changed code

The final term adjusts for changed code, which would otherwise be counted twice. This metric encapsulates reuse in its definition and is rather useful.

1.2.1 Structure Metrics

Apart from size, there are other internal product attributes that are useful to software engineering measurement practice. Since the early stages of the science of software metrics, researchers have pointed out a link between the structure of the product (i.e. the code) and certain quality aspects. The resulting metrics are called structural metrics, and we present them here, believing that they will be useful for our research.

McCabe's Complexity Metrics: One of the first and most widely used complexity metrics is McCabe's cyclomatic complexity [McC76]. McCabe proposed that a program's cyclomatic complexity can be measured by applying principles of graph theory, representing the program structure as a graph G. For a program with graph G, the cyclomatic complexity is:

v(G) = e − n + 1

where e is the number of edges of G and n the number of nodes. In addition, McCabe gave some other definitions, such as the cyclomatic number:

v(G) = e − n + 2

and the essential cyclomatic complexity:

ev(G) = v(G) − m

where m is the number of sub-flowgraphs, i.e. the number of connected components of the graph. In the literature one also finds the definition:

v(G) = e − n + p
where e is the number of edges, n the number of nodes and p the number of exit points (last instruction, exit, return, etc.). So for the graph in Figure 1 [2], the cyclomatic complexity is v(G) = 3. Although the cyclomatic complexity metric was developed in the mid '70s, it has evolved and been calibrated over the years, and it has become a mature, objective and useful measure of a program's complexity. It is also considered to be a good maintainability metric.

The above metrics (LOC, McCabe's cyclomatic complexity and Halstead's Software Science) treat each module separately. The metrics below try to take into account the interaction between modules and to quantify this interaction.

Coupling: The notion of coupling was introduced by three IBM researchers in 1974: Stevens, Myers and Constantine proposed a metric that measures the quality of a program's design [SMC74]. Coupling between two modules of a piece of software is the degree of interaction between them. By combining the coupling between all the system's modules, one can compute the whole system's global coupling. There are no standard measures of coupling; however, there are six basic types of coupling, expressed as a relation between two modules x and y [FP97] (listed from least dependent to most):

• No coupling relation: x and y have no communication and are totally independent of each other.
• Data coupling relation: x and y communicate by parameters. This type of coupling is necessary for the communication of x and y.
[2] Courtesy of: http://www.dacs.dtic.mil/techs/baselines/complexity.html


Figure 1: A program’s flowchart. The cyclomatic complexity of this program is v(G) = 3.

• Stamp coupling relation: x and y accept the same record type (e.g. in database systems) as a parameter, which may cause interdependency between otherwise unrelated modules.
• Control coupling relation: x passes a parameter to y with the intention of controlling its behaviour.
• Common coupling relation: x and y refer to the same global data. This is a kind of coupling we do not want to have.
• Content coupling relation: x refers to the inside of y (i.e. it branches into, changes data in, or alters a statement in y).

Of the above coupling relations, common coupling has been explored in the case of the Linux kernel in order to study its maintainability [YSCO04].

Henry and Kafura's Information Flow Complexity: Another complexity metric that is common in software engineering measurement is Henry and Kafura's information flow complexity [HK76]. This metric is based on the information passing between the modules (or functions) of a program, and particularly on the fan-in and fan-out of a module. By the fan-in of a module m, we mean the number of modules that call m plus the number of data structures that are retrieved by m; by fan-out, we mean the number of modules that are called by m plus the number of data structures that are updated by m. The metric for a module m is defined as:

Information Flow Complexity(m) = length(m) × (Fan In(m) × Fan Out(m))²

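The definition above can be sketched in code; a minimal illustration, assuming the module's length (in LOC), fan in and fan out have already been counted (the function name is ours, not from the paper):

```python
# Henry and Kafura's Information Flow Complexity for a single module.
# Inputs are assumed to be pre-computed: length in LOC, fan-in and fan-out.
def information_flow_complexity(length: int, fan_in: int, fan_out: int) -> int:
    # IFC(m) = length(m) * (fan_in(m) * fan_out(m))^2
    return length * (fan_in * fan_out) ** 2

# A 50-line module called by 3 modules that itself calls 4 others:
print(information_flow_complexity(50, 3, 4))  # 50 * (3*4)**2 = 7200
```

Dropping the length factor, as some researchers propose, amounts to returning `(fan_in * fan_out) ** 2` instead.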
Other researchers have proposed to omit the length factor and thus simplify the metric. Since its introduction, Henry and Kafura's metric has been validated and connected with maintainability [FP97], [Kan03]. Modules with high information flow complexity tend to be error prone, while low values of the metric correlate with fewer errors.

Object Oriented Complexity Metrics: With the rise of object oriented programming, software metrics researchers tried to find ways to measure the complexity of such applications. One of the most widely used sets of complexity metrics for object oriented systems is Chidamber and Kemerer's metrics suite [CK76]:

• Metric 1: Weighted Methods per Class (WMC) WMC is the sum of the complexities of the methods, where complexity is measured by cyclomatic complexity:

WMC = Σ_{i=1}^{n} c_i

where n is the number of methods and c_i is the complexity of the i-th method. We have to stress here that measuring the complexity is difficult in practice because, due to inheritance, not all methods are assessable in the class hierarchy. Therefore, in empirical studies, WMC is often just the number of methods in a class, and the average WMC is the average number of methods per class [Kan03].

• Metric 2: Depth of Inheritance Tree (DIT) This metric is the length of the maximum path from a node to the root of the inheritance tree.

• Metric 3: Number of Children (NOC) This is the number of immediate successors (subclasses) of a class in the hierarchy of the inheritance tree.

• Metric 4: Coupling between Object Classes (CBO) An object class is coupled to another if it invokes the other's member functions or instance variables. CBO is the number of such other classes.

• Metric 5: Response for Class (RFC) This metric is the number of methods that can be executed in response to a message received by an object of the class. It equals the number of local methods plus the number of methods called by local methods.


• Metric 6: Lack of Cohesion in Methods (LCOM) The cohesion of a class is indicated by how closely its local methods are related to its local instance variables. LCOM equals the number of disjoint sets of local methods. Several studies show that the CK metrics suite assists in measuring and predicting an object oriented system's maintainability [FP97], [Kan03]. In particular, studies show that certain CK metrics are linked to faulty classes and help predict them [Kan03].

1.2.2 Design Metrics

Along with object oriented programming came the notion of object oriented design. The programmer first has to model the application with classes and objects; only after the design is complete does coding begin. One question programmers ask themselves is whether their design is of good quality. An experienced programmer can answer that question by applying to the design a number of rules based on experience, looking for bad choices or violations of intuitive rules. If the design passes these checks, it is considered of good quality and coding continues. Of course, with big applications, inspection of a design by a person is rather difficult, so a tool is needed. These intuitive rules are called “design heuristics”. They are based on experience and resemble design patterns, but rather than proposing a certain design for certain problems, heuristics are rules that help designers check the validity of their designs. Design heuristics validate an object oriented design and warn the programmer about certain design mistakes. These warnings should be taken into account by the programmer, who has to investigate and correct things. Of course, a heuristic violation does not always indicate a design mistake, but it is a point for further investigation by the development team. A well known set of such object oriented design heuristics was first introduced by Arthur Riel, who in his seminal work [Rie96] defined more than 60 design heuristics drawn from his experience. His work has helped many people improve their designs and the way they program. Before Riel, other researchers had addressed similar issues, including Coad and Yourdon [YC91]. Additionally, there is ongoing research in the field of design heuristics.
Researchers are investigating the impact of applying object oriented design heuristics, as well as the evaluation and validation of these heuristics [DSRS03, DSA+ 04]. As examples, consider the object oriented design heuristics listed below, taken from Riel [Rie96].

1. The inheritance hierarchy should not be deeper than six levels.

2. Do not use global data. Class variables or methods should be used instead.

3. All data should be hidden within its class.


4. All data in a class should be private.

5. No method in a class should have more than six parameters.

6. A class should not have zero methods.

7. A class should not have only one or two methods.

8. A class should not be converted to an object of another class.

9. A class should not contain more than six objects.

10. The number of public methods of a class should be no more than seven.

11. The number of classes with which a class collaborates should not be more than four.

12. Classes that concentrate too much information should be avoided. We consider that a class fits this description when it associates with more than four classes, has more than seven methods and more than seven attributes.

13. The fan out of a class should be minimised. The fan out is the product of the number of methods defined by the class and the number of messages they send; it should be no more than nineteen.

14. All abstract classes must be base classes.

15. All base classes should be abstract classes.

16. Do not use multiple inheritance.

17. A class should not have only methods named set, get or print.

18. If a class contains objects of another class, the containing class should be sending messages to the contained objects; if it does not, the heuristic is violated.

19. If a class contains objects of other classes, these objects should not be associated with each other.

20. If a class has only one method apart from set, get and print, the heuristic is violated.

21. The number of messages between a class and its collaborators should be minimal; if it is more than fifteen, the heuristic is violated.

One should note that the above heuristics can be checked automatically with the use of a tool. Before Riel, Lorenz [LK94] proposed similar rules derived from industrial experience (including metrics for the development process):


1. Average method size should be less than 24 LOC for C++.

2. The average number of methods per class should be less than 20.

3. The average number of instance variables per class should be less than 6.

4. The class hierarchy nesting level (the DIT of the CK metrics) should be less than 6.

5. The number of subsystem-subsystem relationships should be less than the number of class-class relationships (see the next rule).

6. The number of class-class relationships within each subsystem should be relatively high.

7. Instance variable usage: if groups of methods in a class use different sets of instance variables, look closely to see if the class should be split into multiple classes along those “service” lines.

8. The average number of comment lines per method should be greater than 1.

9. The number of problem reports should be low.

10. The number of times a class is reused: a class should be reused in other projects, otherwise it might need redesign.

11. The number of classes and methods thrown away: this should occur at a steady rate.

As mentioned before, all these “rules of thumb” derive from the experience gained during multiple development processes and reflect practical knowledge. For example, a large average method size may indicate poor OO design and function oriented coding [Kan03]. A class with too much responsibility (too many methods) indicates that some of its methods should be moved to a separate class. The list goes on, reflecting the practical knowledge mentioned.

1.2.3 Product Quality Metrics

The previous sections discussed development and design quality metrics. These are the quality metrics that can be applied to a software product early in its lifecycle: they can already be calculated before the product is released. The following metrics are post-release metrics and apply to a finished software product.


Maintainability: When a software product is complete and released, it enters the maintenance phase. During this phase defects are corrected, re-engineering occurs and new features are added. Here we look at four types of software maintenance:

• Corrective maintenance, the main maintenance task, involves correcting defects reported by users.

• Adaptive maintenance involves adapting the system to changes in its environment.

• Preventive maintenance is defect fixing done by the development team before defects reach the user.

• Perfective maintenance involves adding new functionality, along with re-engineering and redesigning tasks.

For the maintenance process we mainly have four maintainability metrics.

Average Code Lines per Module: This very simple metric is the average number of lines of code per module (e.g. function or class). It shows how easily the code can be maintained, or how easily someone can understand part of the code and correct it. With this metric there are some considerations regarding comment lines, considerations that also apply later to the Maintainability Index metric: for instance, how much of the comments actually reflect the code (are there useless comment lines?), and whether the comment lines consist of copyright notices and other legal boilerplate.

Mean Time To Repair: Mean Time To Repair (MTTR) is an external measure; it concerns the delivered product from the user's point of view, not the code. MTTR is the average time to fix a defect, from the moment it was reported to the moment the development team corrected it. Sometimes MTTR is referred to as “fix response time.”

Backlog Management Index: Backlog Management Index (BMI) is also an external measure of maintainability, concerning both defect fixing and defect arrival [Kan03]. The BMI is expressed as

BMI = (Number of Problems Closed / Number of Problem Arrivals) · 100%
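As a minimal sketch (the function name is ours), the BMI for one period could be computed as:

```python
# Backlog Management Index over a fixed period (usually a month),
# as a percentage of closed problems against newly arrived ones.
def backlog_management_index(closed: int, arrived: int) -> float:
    return closed * 100 / arrived

# 95 problems closed against 100 arrivals: the team is falling behind.
print(backlog_management_index(95, 100))   # 95.0
# 110 closed against 100 arrivals: the backlog is shrinking.
print(backlog_management_index(110, 100))  # 110.0
```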
The numbers of problems that arrive or are closed are counted over some fixed time period, usually a month, though the period can equally be a week or any other fixed number of days. A BMI greater than 100% means that the development team


is efficient and closes problems faster than they arrive. A BMI below 100% means that the team has efficiency problems and is falling behind on defect fixing.

Maintainability Index: Several metrics have been proposed as internal measures of maintainability [FP97]. Most of them try to correlate the structural metrics presented earlier with maintainability, and in certain cases such a link has been established. For example, McCabe placed programs in maintenance risk categories, stating that any program with a cyclomatic complexity larger than 20 has a high risk of causing problems. One interesting model, derived from regression analysis and based on metrics presented earlier, is the Maintainability Index (MI) proposed by Welker and Oman [WO95]. The MI combines the Halstead Software Science metrics, McCabe's cyclomatic complexity, LOC, and the number of comment lines, which together correlate strongly with maintainability. There are two expressions of MI, one using three of these metrics and another using all four:

Three-metric MI equation:

MI = 171 − 5.2 ln(aveV) − 0.23 aveV(g) − 16.2 ln(aveLOC)

where aveV is the average Halstead Volume per module, aveV(g) is the average extended cyclomatic complexity per module, and aveLOC is the average lines of code per module.

Four-metric MI equation:

MI = 171 − 5.2 ln(aveV) − 0.23 aveV(g) − 16.2 ln(aveLOC) + 50 sin(√(2.4 · perCM))

where aveV, aveV(g) and aveLOC are as before, and perCM is the average percentage of comment lines per module.

In their article, Welker and Oman proposed three criteria for choosing between the two equations [WO95]: if any one of them holds, it is better to use the three-metric equation, otherwise the four-metric one. The criteria are:

• The comments do not accurately match the code. Unless considerable attention is paid to comments, they can fall out of synchronisation with the code and thereby make the code less maintainable. The comments could be so far off as to be of dubious value.

• There are large, company-standard comment header blocks, copyrights, and disclaimers. These types of comments provide minimal benefit to software maintainability. As such, the four-metric MI will be skewed and will provide an overly optimistic maintainability picture.


• There are large sections of code that have been commented out. Code that has been commented out creates maintenance difficulties.

Calculating MI is simple, because tools exist (we examine such tools in Section 2) that measure the metrics it builds on. As the authors suggest, MI is useful for periodic assessment of the code in order to track its maintainability.
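The two MI equations can be sketched together; a hedged illustration in which passing `per_cm` selects the four-metric form (the function and argument names are ours):

```python
import math

# Maintainability Index after Welker and Oman: ave_v is the average Halstead
# Volume, ave_vg the average extended cyclomatic complexity, ave_loc the
# average LOC per module; per_cm (average percent comment lines) switches
# on the four-metric variant.
def maintainability_index(ave_v, ave_vg, ave_loc, per_cm=None):
    mi = 171 - 5.2 * math.log(ave_v) - 0.23 * ave_vg - 16.2 * math.log(ave_loc)
    if per_cm is not None:
        mi += 50 * math.sin(math.sqrt(2.4 * per_cm))
    return mi

# A system averaging Halstead Volume 1000, complexity 5 and 50 LOC per module:
print(round(maintainability_index(1000, 5, 50), 1))  # 70.6
```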

1.3 Productivity Metrics

Software productivity is a rather complex metric, used mainly in effort and cost estimation. However, we shall use productivity as a quality metric in order to evaluate the health of a software project. Generally, productivity is expressed as

Productivity = Number of Things Implemented / Person-Months

The term “things” refers to size measurements, which can be expressed as lines of code, function points or, in the case of object oriented development, the number of classes. Similarly, person-months can be replaced by any fixed time period. We must note that the metric proposed here is a very simple one; more complex metrics exist (such as metrics derived from regression techniques) but they are beyond the scope of our research.

1.4 Open Source Development Metrics

Apart from the metrics presented in the previous sections, there are metrics that can be applied directly to the Open Source development process and have been used in the past to perform Open Source software evaluation and success measurement [lD05], [CHA06]. Additionally, we present some metrics used by Open Source hosting services like Freshmeat.net.

Number of releases in the past 12 months: This measures the activity of a software project, particularly its productivity and also its reliability (and defect density). Because of this dual nature, the metric has no nominal scale: small values may indicate low productivity, or merely that only minor improvements and bug fixes were needed. Thus, this metric has to be measured along with others, like the number of contributors and/or the number of downloads. Furthermore, the metric can be refined into the number of major releases and the number of minor releases or patches. With this distinction the previous ambiguity can be overcome: a high number of minor releases is an indicator of problematic software (but also of a fast fix response time). The number of minor releases or patches can be used along with the defect removal effectiveness metric.


Volume of mailing lists: This metric is rather useful for evaluating the health of a project and the support it provides [SSA06]. It is a direct measurement of the number of messages sent to a project's list in a month (or another fixed time period). A healthy project has an active mailing list, while a soon to be abandoned one shows lower activity. The volume of the users' mailing list is also an indicator of how well the project is supported and documented.

Volume of available documentation: Along with the previous metric, this one is an indicator of the available support. By volume of available documentation we mean the available documents, like the installation guide or the administrator's guide.

Number of contributors: A direct repository measurement, which represents how big the community of a project is. A high number of contributors means fast bug fixing and availability of support, and it is of course a prerequisite for a project to evolve. We have to stress that many projects, like Apache, have a small core group that produces the majority of the code and a larger group that contributes less [MFH02]. Thus, this metric has to be evaluated further and used along with other metrics.

Repository Checkouts: From a project's repository, one can directly extract some other interesting metrics, particularly productivity metrics: the number of commits per committer, the number of commits of a specific committer over a fixed period (for example, a month) and the total number of commits over a fixed period. All of these are productivity metrics and can also be indicators of defect removal effectiveness. And, as these metrics measure activity, they reflect the health of the project.

Number of downloads: A direct measurement of the number of downloads of an Open Source project. This metric can show a project's popularity, and is thus an indicator of its health and end user quality.
However, one must keep in mind that someone downloading a piece of software does not necessarily use it and, even if they do, we do not know whether they were satisfied with it.

Freshmeat User Rating: The Freshmeat.net hosting service uses a user rating metric which works as follows, according to its website3: every registered user of Freshmeat may rate a project featured on the website. Based on these ratings, a top 20 list is built, and users may sort their search results by rating as well. Unless a project has received 20 ratings or more, it will not be
3 http://freshmeat.net/faq/view/31/


considered for inclusion in the top 20. The formula gives a true Bayesian estimate, the weighted rank (WR):

WR = (v / (v + m)) · R + (m / (v + m)) · C

where:
R = average rating for the project
v = number of votes for the project
m = minimum votes required to be listed in the top 20 (currently 20)
C = the mean vote across the whole report
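The weighted rank can be read as a small sketch (names mirror the symbols above):

```python
# Freshmeat-style weighted rank: a Bayesian blend of a project's own average
# rating R (from v votes) with the site-wide mean C, where m is the minimum
# number of votes required for the top 20.
def weighted_rank(R: float, v: int, m: int, C: float) -> float:
    return (v / (v + m)) * R + (m / (v + m)) * C

# 30 votes averaging 9.0, with m = 20 and a site-wide mean of 7.0:
print(weighted_rank(9.0, 30, 20, 7.0))  # about 8.2
```

Note how a project with few votes is pulled towards the site-wide mean C, which prevents a single enthusiastic vote from topping the list.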

Freshmeat Vitality: The second metric that Freshmeat uses is the project's vitality. Again, according to Freshmeat4, the vitality score for a project is calculated thus:

vitality = (announcements · age) / (days since last announcement)

that is, the number of announcements multiplied by the number of days the application has existed, divided by the days passed since the last release. This way, applications with many announcements that have been around for a long time and have recently come out with a new release earn a high vitality score; old applications that have only been announced once get a low vitality score. The vitality score is available through the project page and can be used as a sort key for the search results (definable in the user preferences).

Freshmeat Popularity: From the Freshmeat site5: the popularity score superseded the old counters for record hits, URL hits and subscriptions. Popularity is calculated as

popularity = (record hits + URL hits) · (subscriptions + 1)

Again, we have to stress that these metrics are as used by Freshmeat, and of course they need further investigation and validation.
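As an illustration, the vitality score described above could be computed as follows (argument names are ours):

```python
# Freshmeat-style vitality: announcements multiplied by age in days,
# divided by the number of days since the last announcement.
def vitality(announcements: int, age_days: int, days_since_last: int) -> float:
    return announcements * age_days / days_since_last

# An old, frequently announced project with a recent release: high vitality.
print(vitality(40, 1000, 10))  # 4000.0
# A project announced once, long ago: low vitality.
print(vitality(1, 1000, 900))  # about 1.1
```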

1.5 Software Metrics Validation

The metrics presented in this chapter try to measure a wide range of attributes of software. For each attribute (e.g. length) there are various metrics to measure it, which raises the question of whether a particular metric is suitable for measuring a given attribute. The suitability of a specific metric is a very important area in
4 http://freshmeat.net/faq/view/27/
5 http://freshmeat.net/faq/view/30/


software engineering research, and it is the reason why metrics are questioned and widely discussed by researchers. According to Fenton [FP97], the way we validate metrics depends on whether we just want to measure an attribute or we want to measure in order to predict. Prediction of an attribute of a system (e.g. code quality or cost) is a core issue in software engineering. So, in order to perform metrics validation, we must distinguish between two types of measurement:

• Measurement performed in order to assess an existing entity by numerically characterising one or more of its attributes, for example size.

• Measurement performed in order to predict some attribute of a future entity, like the quality of the code.

The validation procedure can likewise be distinguished into two types [BEM95]:

• Theoretical validation. This kind of validation employs a mathematical formalism and a model. It is usually done by setting up mathematical relations for each attribute and trying to validate these relations.

• Empirical validation. As Briand et al. state [BEM95], empirical validation answers the question “is the measure useful in a particular development environment, considering a given purpose in defining the measure and a modeler's viewpoint?”

Of these two approaches, empirical validation is the more widely used. In practice it tries to correlate a measure with some external attribute of the software, for example complexity with defects.

1.5.1 Validation of prediction measurement

As Fenton states, validating a measurement conducted for prediction is the process of establishing the accuracy of the prediction by empirical means, that is, by comparing model performance with known data. In other words, a prediction measurement is valid if it makes accurate predictions. This kind of validation is widely used in software engineering research for cost estimation and for quality and reliability prediction. With this method, researchers form and test hypotheses in order to predict certain attributes of software, or conduct formal experiments. Then they use mathematical (statistical) techniques to test their results, for example whether a particular metric such as size is an accurate cost estimator. Other kinds of predictions are quality and fault proneness detection. Mathematical techniques used include regression analysis, logistic regression and also more sophisticated methods such as decision trees and neural networks. Examples of such metric validations are [GFS05] and [BBM96].
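A toy sketch of this kind of empirical validation: correlate a metric with observed defect counts. All numbers below are made up for illustration; a real study would use project data and statistical significance tests.

```python
# Pearson correlation between per-module metric values and defect counts;
# a strong positive correlation supports using the metric predictively.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

complexity = [2, 5, 9, 14, 21]  # hypothetical cyclomatic complexities
defects = [0, 1, 3, 4, 7]       # hypothetical defect counts, same modules
print(round(pearson(complexity, defects), 2))  # 0.99: complexity tracks defects here
```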


1.5.2 Validation of measures

Again, according to Fenton [FP97], validating a software measure is the process of ensuring that the measure is a proper numerical characterisation of the claimed attribute, by showing that the representation condition is satisfied. As implied, this kind of validation relies on theoretical validation. For example, for a metric that measures size, we form a model to represent a program and relate that model to the notion of length. Call the program P and the measure m(P). In order to validate m as a length measure, we can use the following: if a program P1 is of length m(P1) and a program P2 of length m(P2), then the equation

m(P1 + P2) = m(P1) + m(P2)
should hold. If, additionally,

P1 < P2 ⇒ m(P1) < m(P2)
holds, then our relation, our metric, is valid. Although we are not going to discuss metrics validation in depth here, we are going to perform validation throughout our project, especially when we present new metrics for Open Source software development. Good places to start studying metrics validation are [BEM95] and [Sch92]; both papers provide a lot of insight and also present mathematical techniques for both theoretical and empirical validation. A more recent study that discusses metrics is that of Kaner and Bond [KB04b]. Another interesting paper, which discusses how empirical research in software engineering should be conducted and contains much about validation, is that of Kitchenham et al. [KPP+ 02]. Two good examples of the application of metrics validation are those of Briand et al. [BDPW98] and Basili et al. [BBM96]. A rather complete publication list on software metrics validation can be found at http://irb.cs.uni-magdeburg.de/sw-eng/us/bibliography/bib_10.shtml
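The two conditions above can be checked mechanically for a concrete length measure; a sketch using LOC as m:

```python
# LOC as a candidate length measure m; we check the representation
# condition on a small example.
def loc(program: str) -> int:
    return len(program.splitlines())

p1 = "a = 1\nb = 2\n"   # a two-line program
p2 = "print(a + b)\n"   # a one-line program

assert loc(p1 + p2) == loc(p1) + loc(p2)  # m(P1 + P2) = m(P1) + m(P2)
assert loc(p2) < loc(p1)                  # shorter program, smaller measure
print("LOC behaves as a length measure on this example")
```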


2 Tools

Many publications mention measurement tool support and automation as important success factors for software measurement efforts and quality assurance [KSPR01], providing frameworks and general approaches [KRSZ00] or giving more specific solution architectures [JL99]. There is a great variety of research tools to support software metric creation, handling and analysis; an overview of the different types of software metrics tools is given in [Irb].

Wasserman [A.I89] introduces the concepts of tools with vertical and horizontal architecture, the former supporting activities in a single life cycle phase, such as UML design tools or change request databases, and the latter supporting activities over several life cycle phases, such as project management and version control tools. Fuggetta [Fug93], on the other hand, classifies tools into single tools, workbenches supporting a few development activities, and environments supporting a large part of the development process. These ideas certainly affected the functionality that commercial tools offer, but the most popular categorisation still classifies metrics tools as either product metrics tools or process metrics tools. Product metrics tools measure the software product at any stage of its development, from requirements to installed system; they may measure the complexity of the software design, the size of the final program (either source or object code), or the number of pages of documentation produced. Process metrics tools, on the other hand, measure the software development process, such as overall development time, the type of methodology used, or the average level of experience of the programming staff.

In this chapter we present tools, both Open Source and commercial, that support and automate the measurement process.

2.1 Process Analysis Tools

The process of Open Source software development depends heavily on a repository responsible for version control. The majority of projects use one of two version control systems for their repositories, CVS6 and Subversion7. Much of the information needed to extract the various metrics is contained in these repositories:

• The code itself, along with historical data (changes, additions, etc.).

• Information regarding the programmers (committers): their number, usernames, etc.

• Historical data about the productivity of committers (the number of commits, which part of the code is committed by whom, etc.).
6 http://www.nongnu.org/CVS/
7 http://subversion.tigris.org/


Figure 2: CVSAnalY Web Interface, Main Page

All these data are stored in the repository, and Open Source tools are available to extract the useful information from it.

2.1.1 CVSAnalY

CVSAnalY8 (CVS Analysis) is one of the first tools to access a repository in order to extract information about an Open Source project. It has been developed by the Libresoft Group at the Universidad Rey Juan Carlos in Spain and has already produced results used in research on Open Source software [RKGB04]. The tool is licensed under the GNU General Public Licence. Specifically, CVSAnalY extracts statistical information out of CVS and Subversion repository logs and transforms it into an SQL database. The main tool runs on the command line; the results are then presented through a web interface, CVSAnalYweb, where they can be retrieved and analysed in an easy way. The tool produces various results and statistics on the evolution of a project over time. A general view of the tool is shown in Figure 2. The tool stores historical data such as:

• First commit logged in the versioning system.

• Last commit (up to the date we want to examine).

• Number of days examined.
8 http://cvsanaly.tigris.org/


• Total number of modules in the versioning system.
• Commiters.
• Commits.
• Files.
• Aggregated Lines.
• Removed Lines.
• Changed Lines.
• Final Lines.

File type statistics for all modules:

• File type.
• Modules.
• Commits.
• Files.
• Lines Changed.
• Lines Added.
• Lines Removed.
• Removed files.
• External.
• CVS flag.
• First commit.
• Last commit.

The tool also logs the inactivity rate for modules and commiters, commiters per module, and the Herfindahl-Hirschman Index for modules, and, as mentioned before, it produces helpful graphs. Examples of graphs produced are:

• Evolution of the number of modules.
• Modules by Commiters (log-log).


Figure 3: CVSAnalY Web Interface, Evolution of the number of modules

• Modules by Commits (log-log).
• Modules by Files (log-log).
• Commiter by Changes (log-log).

Examples of the graphs are shown in Figure 3 and Figure 4. CVSAnalY is a rather useful tool: it helps gather data about the Open Source development process, as well as data needed to compute other metrics, especially process metrics. Another very important feature of CVSAnalY is its ability to reconstruct the repository at specific points in time.

2.1.2 GlueTheos

GlueTheos [RGBG04] has been developed to coordinate other tools used to analyse Open Source repositories. It is a set of scripts that download data (source code) from Open Source repositories, analyse it with external tools (developed by third parties) and store the results in a database for further investigation. The parts which comprise GlueTheos are:

• The core scripts, which act as a user interface and handle details like repository configuration, periods of analysis (the periodic snapshots from a repository), storage details, and third party tool details and parameters.


Figure 4: CVSAnalY Web Interface, Commiter by Changes

• The downloading module, which is responsible for downloading source code snapshots at specific dates and storing them locally.

• The analysis module. Here the user describes further details of the external tools used for source code analysis, including how to invoke each tool, its parameters and its output format. The module is also responsible for running these external tools.

• The storage module. This module is responsible for storing the results created by the previous module. It takes the output of an analysis tool and formats it into an appropriate SQL command, suitable for storing the result in a database.

Generally, the tool runs like this:

1. The user chooses which project to analyse (e.g. GNOME) and which periods to analyse (e.g. every month from December 2003 until September 2005).

2. The user then chooses an analysis tool (e.g. sloccount, which counts physical source lines of code9). Integrating the tool with the main set of scripts includes describing how to call it, how parameters are passed and what its output looks like.
9 http://www.dwheeler.com/sloccount/


Figure 5: GlueTheos, Table that contains analysis of a project

3. The program retrieves the code of the project for the configured dates, analyses it with the external tool and stores the output in a database. The resulting database table has a column for the output of the external tool. Figure 5 shows a table created by GlueTheos, which contains the output of sloccount (SLOC, source lines of code, and language type) for the files of the gnome core project at a specific date. GlueTheos is released under the GNU General Public Licence.

2.1.3 MailingListStats

MailingListStats10 analyses Mailman archives (and, in the future, those of other mailing list manager software) in order to extract statistical data from them. The statistical data is transformed into XML and SQL to allow further analysis and research. The tool also includes a web interface.
10 http://libresoft.urjc.es/Tools/MLStats
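As an illustration of the kind of per-sender statistics such a tool extracts, the following sketch (not MailingListStats' actual code; the archive file name is invented) counts messages per sender in an mbox archive using only the Python standard library:

```python
# Illustrative sketch of mailing list statistics extraction: count how many
# messages each sender posted to a Mailman mbox archive. This is an invented
# example, not MailingListStats' own implementation.
import mailbox
from collections import Counter

def messages_per_sender(mbox_path):
    """Return a Counter mapping the From: header to a message count."""
    counts = Counter()
    for msg in mailbox.mbox(mbox_path):
        counts[msg.get("From", "(unknown)")] += 1
    return counts
```

A tool like MailingListStats would additionally normalise sender addresses and emit the aggregates as XML or SQL rows for further analysis.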


2.2 Metrics Collection Tools

2.2.1 ckjm

ckjm11 calculates Chidamber and Kemerer object-oriented metrics by processing the bytecode of compiled Java files. For each class, the program calculates the six metrics proposed by Chidamber and Kemerer, as well as afferent couplings and the number of public methods. This application was developed by Professor Diomidis Spinellis, the coordinator of the SQO-OSS project.

2.2.2 The Byte Code Metric Library

The Byte Code Metric Library12 (BCML) is a collection of tools that calculate metrics from Java byte code classes or JAR files in directories, output the results into XML files, and report the results in HTML format.

2.2.3 C and C++ Code Counter

CCCC is a tool which analyses C/C++ files and generates a report on various metrics. The tool13 was developed as an MSc thesis by Tim Littlefair and is copyrighted by him. It is a command-line tool that analyses a list of input files and generates HTML and XML reports containing the results. The metrics measured are the most common ones, specifically:

• A summary table of high-level metrics summed over all files processed in the current run.
• A table of procedural metrics (i.e. lines of code, lines of comment, and McCabe's cyclomatic complexity) summed over each module.
• A table of four of the six metrics proposed by Chidamber and Kemerer.
• Structural metrics based on the relationships of each module with others. These include fan-out (the number of other modules the current module uses), fan-in (the number of other modules which use the current module), and the Information Flow measure suggested by Henry and Kafura, which combines these to give a measure of coupling for the module.
• Lexical counts for parts of submitted source files which the analyser was unable to assign to a module. Each record in this table relates either to a part of the code which triggered a parse failure, or to the residual lexical counts for parts of a file not associated with a specific module.
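The Henry and Kafura Information Flow measure mentioned in the structural metrics above can be illustrated with a short sketch. In its classic formulation the complexity of a module is its length (e.g. lines of code) multiplied by the square of the product of fan-in and fan-out; the module data below is invented:

```python
# Henry-Kafura information flow measure for a single module:
# complexity = length * (fan_in * fan_out) ** 2
# A module that is both heavily used and heavily dependent on others
# therefore scores very high, flagging it as a coupling hot spot.
def henry_kafura(length, fan_in, fan_out):
    return length * (fan_in * fan_out) ** 2
```

For example, a 100-line module with fan-in 2 and fan-out 3 scores 100 × (2 × 3)² = 3600, while an equally long module with no callers scores 0.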
11 http://www.spinellis.gr/sw/ckjm
12 http://csdl.ics.hawaii.edu/Tools/BCML
13 http://cccc.sourceforge.net/


Figure 6: CCCC, Report for Procedural Metrics

Figure 6 shows the report for procedural metrics for an Open Source project, while Figure 7 shows the report for object-oriented metrics of the same project.

2.2.4 Software Metrics Plug-In for the Eclipse IDE

The Software Metrics plug-in14 for the Eclipse IDE is a powerful add-on for the popular Open Source IDE Eclipse. As its name denotes, it is installed as a plug-in to Eclipse and is distributed under the same licence as the Eclipse IDE itself. The tool measures Java code against a long list of metrics:

• Lines of Code (LOC): Total lines of code in the selected scope. Only counts non-blank and non-comment lines inside method bodies.
• Number of Static Methods (NSM): Total number of static methods in the selected scope.
• Afferent Coupling (CA): The number of classes outside a package that depend on classes inside the package.
• Normalised Distance (RMD): RMA + RMI − 1; this number should be small, close to zero, for good packaging design.
• Number of Classes (NOC): Total number of classes in the selected scope.
• Specialisation Index (SIX): Average of the specialisation index, defined as NORM * DIT / NOM. This is a class level metric.
14 http://metrics.sourceforge.net/


Figure 7: CCCC, Report for Object Oriented Metrics

• Instability (RMI): CE / (CA + CE).
• Number of Attributes (NOF): Total number of attributes in the selected scope.
• Number of Packages (NOP): Total number of packages in the selected scope.
• Method Lines of Code (MLOC): Total number of lines of code inside method bodies, excluding blank lines and comments.
• Weighted Methods per Class (WMC): Sum of the McCabe cyclomatic complexity for all methods in a class.
• Number of Overridden Methods (NORM): Total number of methods in the selected scope that are overridden from an ancestor class.
• Number of Static Attributes (NSF): Total number of static attributes in the selected scope.
• Nested Block Depth (NBD): The depth of nested blocks of code.
• Number of Methods (NOM): Total number of methods defined in the selected scope.
• Lack of Cohesion of Methods (LCOM): A measure of the cohesiveness of a class, calculated with the Henderson-Sellers method: if m(A) is the number of methods accessing an attribute A, calculate the average of m(A) over all attributes, subtract the number of methods m and divide the result by (1 − m). A low value indicates a class with a high degree of cohesion; a value close to 1 indicates a lack of cohesion and suggests the class might better be split into a number of (sub)classes.
• McCabe Cyclomatic Complexity (VG): Counts the number of flows through a piece of code. Each time a branch occurs (if, for, while, do, case, catch and the ?: ternary operator, as well as the && and || conditional logic operators in expressions) this metric is incremented by one. It is calculated for methods only. For a full treatment of this metric see McCabe [McC76].
• Number of Parameters (PAR): Total number of parameters in the selected scope.
• Abstractness (RMA): The number of abstract classes (and interfaces) divided by the total number of types in a package.
• Number of Interfaces (NOI): Total number of interfaces in the selected scope.
• Efferent Coupling (CE): The number of classes inside a package that depend on classes outside the package.
• Number of Children (NSC): Total number of direct subclasses of a class.
• Depth of Inheritance Tree (DIT): Distance from class Object in the inheritance hierarchy.

The user can also set ranges and thresholds for each metric in order to track code quality. Examples of such ranges are:

• Lines of Code (Method Level): Max 50. If a method is over 50 lines of code, it is suggested that the method be broken up for readability and maintainability.
• Nested Block Depth (Method Level): Max 5. If a block of code has more than 5 nested levels, break up the method.
• Lines of Code (Class Level): Max 750. If a class has over 750 lines of code, split up the class and delegate its responsibilities.
• McCabe Cyclomatic Complexity (Method Level): Max 10. If a method has more than 10 branch points, break up the method.
• Number of Parameters (Method Level): Max 5. A method should have no more than 5 parameters; if it does, create an object and pass the object to the method.
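The Henderson-Sellers LCOM calculation described above can be sketched in a few lines. The input format, a list giving for each attribute the number of methods that access it, is an illustrative choice:

```python
# Henderson-Sellers LCOM, as described above: average the number of methods
# m(A) accessing each attribute, subtract the method count m, and divide
# by (1 - m). 0 means perfect cohesion; values near 1 suggest splitting
# the class.
def lcom_hs(accesses_per_attribute, num_methods):
    avg = sum(accesses_per_attribute) / len(accesses_per_attribute)
    return (avg - num_methods) / (1 - num_methods)
```

For a class whose every method touches every attribute the value is 0; when each attribute is touched by only one of several methods the value approaches 1.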


Figure 8: Metrics, List of metrics

As can be seen from this list, the tool is rather extensive and the set of metrics measured is exhaustive. A view of the plug-in displaying the results of a measurement is shown in Figure 8. The tool also displays the dependency connections among the various packages and classes of an analysed project as a connected graph. An example of this graph is shown in Figure 9.
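The package coupling metrics in the list above (CA, CE and the instability RMI = CE / (CA + CE)) can all be derived from such a dependency graph. The following sketch uses an invented dependency map to show the relationship:

```python
# Derive afferent coupling (CA), efferent coupling (CE) and instability
# (RMI = CE / (CA + CE)) from a package dependency graph. The dependency
# map format {package: set of packages it depends on} is an invented
# illustration, not the plug-in's internal representation.
from collections import defaultdict

def coupling_metrics(depends_on):
    ca = defaultdict(int)              # afferent: how many packages depend on me
    for pkg, targets in depends_on.items():
        for t in targets:
            ca[t] += 1
    metrics = {}
    for pkg, targets in depends_on.items():
        ce = len(targets)              # efferent: how many packages I depend on
        a = ca[pkg]
        rmi = ce / (a + ce) if (a + ce) else 0.0
        metrics[pkg] = {"CA": a, "CE": ce, "RMI": rmi}
    return metrics
```

A package nothing depends on has RMI 1.0 (maximally unstable, free to change), while a package that only others depend on has RMI 0.0 (maximally stable).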

2.3 Static Analysis Tools

These tools analyse a program's source code and locate bugs and problematic constructions. If a tool simply collects metrics, it is listed under metrics collection tools instead. This section is limited to tools that are Open Source and candidates for SQO-OSS data generation; Wikipedia maintains an exhaustive list of such tools.

2.3.1 FindBugs

FindBugs15 looks for bugs in Java programs. It is based on the concept of bug patterns.
15 http://findbugs.sourceforge.net


Figure 9: Metrics, Dependency Graph

2.3.2 PMD

PMD16 scans source code and looks for potential problems: possible bugs, unused and suboptimal code, over-complicated expressions and duplicate code.

2.3.3 QJ-Pro

QJ-Pro17 is a tool-set for static analysis of Java source code: a combination of automatic code review and automatic coding standards enforcement.

2.3.4 Bugle

Bugle18 uses Google code search queries to locate security vulnerabilities.

2.4 Hybrid Tools

Hybrid tools analyse both process and project data.
16 http://pmd.sourceforge.net
17 http://qjpro.sourceforge.net
18 http://www.cipher.org.uk/index.html?p=projects/bugle.project


2.4.1 The Empirical Project Monitor

The Empirical Project Monitor19 (EPM) provides automated collection and analysis of project data. The current version uses CVS, GNATS, and Mailman as data sources.

2.4.2 Hackystat

Hackystat20 is a framework for the automated collection and analysis of software engineering product and process data. Hackystat uses sensors to unobtrusively collect data from development environment tools; there is no chronic overhead on developers to collect product and process data. Hackystat does not tie the user to a particular tool, environment, process, or application. It is intended to provide in-process project management support.

2.4.3 QSOS

QSOS21 is a method designed to qualify, select and compare free and Open Source software in an objective, traceable and argued way. It is publicly available under the terms of the GNU Free Documentation License.

2.5 Commercial Metrics Tools

This section documents several popular commercial software metrics tools. Where possible, we assess the properties of these tools in a highly heterogeneous and ever-changing software development environment. The tools were therefore chosen because they can generate and store metric data consistently and in a structured way, and provide some degree of customisation with development-specific parameters. Following the common categorisation of metrics tools mentioned earlier, both product and process metrics tools are documented.

2.6 Process metrics tools

This section documents several software process metrics tools. Apart from presenting the tools, we assess their capabilities where possible. The evaluation is based on three basic criteria indicated by other studies [A.I89]: platform independence, input/output functions and automation.
19 http://www.empirical.jp
20 http://www.hackystat.org
21 http://www.qsos.org


Platform. The first step in utilising any tool is to install it on an operating system. In the worst case, a tool's platform requirements cannot be fulfilled by the existing environment, which means a new OS would have to be added, i.e., bought, installed and maintained. Another platform issue is database support: some tools are based on a metric repository and have to rely on some sort of relational database. The range of supported databases affects a tool's platform interoperability. As some of the tools have both server and client components (for data storage and collection/reporting purposes, respectively), the platform interoperability of these components has to be considered separately.

Input/output. Software project quality tracking and estimation tools rely heavily on data from external sources such as UML modelling tools, source code analysers, work effort or change request databases, etc. The ease of connecting to these applications through interfaces or file input substantially influences a metric tool's efficiency and error-proneness. On the other hand, data often has to be exported for further processing in spreadsheets, project management tools or slide presentations. Reports and graphs have to be created and possibly viewed, posted on the Web, or printed.

Automation. A key aspect of metric data processing is automatic data collection. This can range from simple alerts sent to project managers under certain conditions and periodic extraction of metric information from external tools, to advanced scripting and programming capabilities. Missing automation usually requires tedious and expensive manual data input and makes measurement inconsistencies more likely, as measurements are performed by different persons.

2.6.1 MetriFlame

MetriFlame22, a tool for managing software measurement data, is strongly based on the GQM approach [BCR94]. A goal is defined, then corresponding questions and metrics are determined to assess whether the goal has been reached. Metrics can only be accessed through such a GQM structure; it is not possible to collect metrics without formulating goals and questions. The main elements of the MetriFlame tool environment are the actual MetriFlame tool, data collectors and converters, and components for viewing the results. MetriFlame does not feature a project database; it stores all its data in different files with proprietary formats. The functionality that the tool offers is summarised in Figure 10. MetriFlame supports 32-bit Microsoft Windows environments (Windows 95 and later versions). The database converter requires the Borland Database Engine (BDE) in order to access the different types of databases. BDE is installed during the MetriFlame installation procedure. Data can be imported to MetriFlame by using so-called data converters,
22 http://www.virtual.vtt.fi


Figure 10: MetriFlame functionality

which are not part of the MetriFlame tool, but separate programs. These programs convert the data and generate structured text files, which can then be imported into MetriFlame. New data can also be entered manually. The process of data collection cannot be automated. Project data can only be saved in a MetriFlame project file; no other file format is available. Reports (graphs) can be saved as WMF, EMF, BMP, JPEG or structured text. MetriFlame does not feature an estimation model.

2.6.2 Estimate Professional

Estimate Professional23 is a tool to manage project data, create project estimates based on different models or on historical project data, and visualise these estimates. Different scenarios can be created by changing project factors. Estimate Professional is an extended and improved version of "Estimate", a freely available program which can perform only basic size-based estimates, does not feature reporting and does not consider risk factors. Estimate Professional does not feature a project database; it stores all project information in a single file. Initially, project data is entered by creating a new project and starting the estimate wizard. After specifying project-related information such as the type of project, its current phase, the maximum schedule and the priority of a short schedule, one has to choose between size-based estimation, which focuses on artifact metrics (LOC, number of classes, function points), and effort-based estimation, which focuses on effort metrics (staff-months). Estimates in Estimate Professional are based on three models: the Putnam Methodology, COCOMO II and Monte Carlo Simulation. Estimates can be calibrated in three ways: using the outcome of historical projects from the project database, altering the project type
23 http://www.workflowdownload.com/


Figure 11: Estimate Professional

by choosing subtypes for parts of the project, or tuning the estimation by changing productivity drivers like database size or programmer capability. A screenshot of the tool is presented in Figure 11. Estimate Professional supports MS Windows 95/98, NT 4.0 and 2000. For installation on NT systems, administrator rights are required. Project data can be imported from a Microsoft Project file or from a CSV file. The process of data collection cannot be automated. Project data can be exported to a Microsoft Project file; project metrics can be exported to a CSV file.

2.6.3 CostXpert

The software cost estimation tool CostXpert24 produces estimates of project duration, costs, staff effort, labour costs etc. using software size, labour costs, risk factors and other input variables. The tool features mappings of source lines of code equivalents for more than 600 different programming languages. The main menu of the tool is presented in Figure 12. Import of project data is limited to manual entry; data connectors to tools processing software artifacts do not exist. Data can be exchanged between different copies of CostXpert via CostXpert project files. The process of data collection cannot be automated. Regarding the estimation process, CostXpert integrates multiple software sizing methods and is compliant with COCOMO and more than 32 lifecycles and standards. CostXpert is designed to aid project control, facilitate process improvement and achieve a greater return on investment (ROI). Especially for COTS products, the tool is able to estimate the portion of the package that needs no modification but should be configured and parameterised, the portion of the package that needs to be modified, and the amount of functionality
24 http://www.costxpert.com/


Figure 12: CostXpert main menu

that should be added to the system. Project data in a work breakdown structure can be exported to Microsoft Project or Primavera TeamPlay. The expected labour distribution can be exported to a CSV file. Customised project types, standards and lifecycles can be exported to so-called customised data files. Reports can be printed or exported as PDF, RTF or HTML files. Graphs can be exported as BMP, WMF or JPEG files. CostXpert integrates more than 40 different estimation models based on data from over 25,000 software projects. CostXpert supports MS Windows 95 and all later versions. CostXpert does not feature a project database; project data is stored in a project file in a proprietary format.

2.6.4 ProjectConsole

ProjectConsole25 is a Web-based tool for project control that offers project reporting capabilities to software development teams. Project information can be extracted from Rational tools or other third-party tools, is stored in a database and can be accessed through a Web site. Rational ProjectConsole makes it easy to monitor the status of development projects and to utilise objective metrics to improve project predictability. It greatly simplifies the process of gathering metrics and reporting project status by creating a project metrics Web site based on data collected from the development environment. This Web site, which ProjectConsole updates on demand or on schedule, gives all team members a complete, up-to-date view of the project environment. ProjectConsole collects metrics from the Rational Suite development platform and from third-party products, and presents the results graphically in a customisable format to help the assessment
25 http://www-128.ibm.com


Figure 13: Rational ProjectConsole

of the progress and quality. Rational ProjectConsole supports MS Windows XP; Windows NT 4.0 Server or Workstation, SP6a or later; and Windows 2000 Server or Professional, SP1 or later. All data is stored in a database, the so-called metric data warehouse. Supported databases include SQL Server, Oracle and IBM DB2. ProjectConsole needs a Web server (IIS or Apache Tomcat) to publish its data over a network (local network or the Internet). The project Web site can be viewed with any browser. ProjectConsole can extract metrics directly from Rational ClearQuest, RequisitePro, Rose, and Microsoft Project repositories. In addition, ProjectConsole provides so-called collection agents that can parse Rational Purify, Quantify, Coverage, and ClearCase data files. Automatic collection tasks can be scheduled to run daily, weekly or monthly at a specified date and time. The data is extracted from the source programs and stored in the metric data warehouse. The project Web site is automatically updated. Graphs are stored as PNG files. Data can be published in tables and exported into HTML format. MS Excel 2000 or later can be used to import the HTML table format. ProjectConsole does not feature an estimation model. Figure 13 depicts the multi-chart display of ProjectConsole.

2.6.5 CA-Estimacs

Rubin has developed a proprietary software estimating model26 that utilises gross business specifications for its calculations. The model provides estimates of total development effort, staff requirements, cost, risk involved, and portfolio effects. The ESTIMACS model addresses three important aspects of software management: estimation, planning, and control. The ESTIMACS system includes five modules.
26 http://www.ca.com/products/estimacs.htm


The first module is the system development effort estimator. This module requires responses to 25 questions regarding the system to be developed, the development environment, etc. It uses a database of previous project data to calculate an estimate of the development effort. The staffing and cost estimator is the second module. Its inputs are the effort estimate from above, data on employee productivity, and the salary for each skill level. Again, a database of project information is used to compute the estimate of project duration, cost and staffing required. The hardware configuration estimator requires as inputs information on the operating environment for the software product, the total expected transaction volume, the generic application type, etc. Its output is an estimate of the required hardware configuration. The risk estimator module calculates risk using answers to some 60 questions on project size, structure and technology. Some of the answers are computed automatically from other information already available. Finally, the portfolio analyser provides information on the effect of the project on the total operations of the development organisation, giving the user some understanding of the total resource demands of the projects.

2.6.6 Discussion

The tools evaluated provide a broad variety of analysis capabilities and different degrees of explicit estimation support. However, they all allow storing and comparing project measures in a structured way. Certain conclusions can be drawn on whether the tools can integrate seamlessly into an existing, heterogeneous software development environment. All of the evaluated tools are available on only one operating system (MS Windows). This is particularly problematic for server components, as a dedicated server would often have to be added to an otherwise Unix-based server farm. Some tools only work with particular database engines, for example ProjectConsole. In addition to manual data entry, the tools are generally restricted to a few input file formats (e.g. Estimate Professional only reads Microsoft Project and CSV files). While communication with spreadsheet applications is usually supported, few tools can access development tools like integrated development environments (IDEs) or requirement databases directly. Tools with advanced metric data collection capabilities (like MetricCenter) offer only a limited set of connectors to specific development tools, which have to be purchased separately, and their communication protocols are not disclosed. Automation support is either not available (MetriFlame, Estimate Professional, CostXpert) or limited to pull operations (MetricCenter). The degree of flexibility with respect to defining new metrics and changing reports differs greatly; however, all tools provide only basic reporting flexibility. This would not be a problem in itself if the tools allowed unrestricted data access for online analytical processing (OLAP) reporting tools, but this is not possible with most of the tools either. Data output for further processing is sometimes limited to CSV files and a proprietary file format (MetriFlame). Tools often do not support common reporting file formats like PDF. Output automation is supported by few of the evaluated tools (MetricCenter, ProjectConsole). Some tools, instead of supporting integration, seem to duplicate features which are normally already available in medium and large-scale IT environments: some introduce a proprietary file format (MetriFlame), or are limited to a particular database system instead of accessing the company's existing database infrastructure. Some basic graphical reporting and Web-publishing features are provided, instead of feeding advanced OLAP reporting tools, whose use would also eliminate the need to duplicate features for handling user access rights. Finally, the difficulties in getting access to some tools present an additional cost barrier to integrating them into existing IT environments, and seem to indicate that at least some of these tools do not provide user interfaces with a low learning curve. Altogether, process engineers and portfolio managers operating in highly dynamic environments must still expect substantial costs when evaluating, integrating, customising, operating and continuously adapting planning and monitoring tools. Even tools with advanced architectures like MetricCenter offer a limited set of supported development tools, restricted customisation capabilities due to undisclosed data protocols, and platform restrictions. Proprietary approaches to security and user access concerns further complicate integration. Much work needs to be done to lower the technological barrier for collecting software metrics in a varying and changing environment. Possible approaches to some of the current problems are likely to embrace the support of modern file formats like XML and lightweight data communication using, for example, the SOAP protocol.

2.7 Product metrics tools

The initial target of product metrics tools was the assessment of objective measures of software source code regarding size and complexity. As experience has been gained with metrics and models, it has become increasingly apparent that metric information available earlier in the development cycle can be of greater value in controlling the process and its results. Along with calculating several metric values, the tools also attempt to support testing procedures, taking into consideration the information coming from the metric values. In this section a number of product metrics tools are discussed. These tools were chosen because of their wide use or because they represent a particularly interesting point of view. The tools presented reflect the areas where most work on product metrics has been done. References have been provided for readers who are interested in further examining a tool.


2.7.1 CTC++, CMT++ and CTB

CTC++, CMT++ and CTB27 are tools developed by the Finnish company Testwell and available from Verifysoft for Microsoft Windows, Solaris, HP-UX and Linux. They focus on test coverage (CTC++), metric analysis (CMT++) and unit testing (CTB) for C/C++ source code. CTC++ is a coverage tool supporting the testing and tuning of programs written in C and C++. This coverage analyser supports function, decision, statement, condition and multi-condition coverage, presenting the results in a text or HTML report. The analyser is available for coverage measurement on the host as well as for embedded systems, and is integrated with Microsoft Visual C++, the Borland compiler and WindRiver Tornado. CMT++ is a tool for assessing code complexity. Code complexity affects how difficult it is to test and maintain an application, and complex code is likely to contain errors. Metrics like McCabe's cyclomatic complexity, Halstead's software metrics and lines-of-code measures are supported by the tool, which can be customised by the user for company coding standards. CMT++ identifies complex and error-prone code. As there is usually too little time to inspect all the code carefully, selecting the most error-prone modules is an important step. CMT++ also gives an estimate of the number of test cases needed to test all paths of a function, and an idea of how many bugs should be found before the code can be considered "clean". CTB is a module testing tool for the C programming language that allows code to be tested at a very early development stage, helping to prevent bugs. As soon as a module compiles, a test bed can be generated for it without any additional programming. The tool supports a specification-based (black-box) testing approach, from "ad-hoc" trials to systematic script-based regression tests. Tests can run in an interactive mode with a C-like command interface, as well as script- or file-based and automated.
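CMT++'s estimate of the number of test cases relates to McCabe's observation that cyclomatic complexity bounds the number of linearly independent paths through a function. A crude sketch of the idea for C-like code follows; counting decision keywords with a regular expression is a rough approximation for illustration only, not CMT++'s actual algorithm:

```python
# Rough cyclomatic-complexity-based test case estimate for C-like source:
# V(G) = 1 + number of decision points. Keyword/operator matching is a
# simplification (it ignores comments, strings and preprocessor tricks).
import re

DECISIONS = re.compile(r"\b(if|for|while|case)\b|&&|\|\||\?")

def estimated_test_cases(source):
    return 1 + len(DECISIONS.findall(source))
```

For a function with one `if` containing one `&&` plus one `for` loop, the estimate is 4 independent paths to cover.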
In test-based execution, the test driver appears to read the test main program and immediately execute it command by command, showing what happens. CTB works together with coverage analysis tools such as CTC++.

2.7.2 Cantata++

Cantata++28 is a commercial tool for unit and integration testing, coverage and static analysis. The tool is built on the Eclipse v3.2 Open Source development platform, including the C Development Tools (CDT). The unit and integration testing capabilities of the environment support automated test script generation by parsing source code to derive parameter and data information, with stubs and wrappers automatically generated into the test script. Stubs provide programmable dummy versions of external software, while wrappers are used for establishing programmable interceptions of the real external software. The building and the execution of tests, black and white
27 http://www.verifysoft.com
28 http://www.ipl.com/products/tools/pt400.uk.php


Figure 14: Cantata++ V5, a fully integrated Test Development and Analysis Environment

box, is supported both by the tool and via the developer's build system. Verification of the code is also supported by providing sequential execution of test cases based on wrappers and stubs. The test cases defined in verification can be reused for inherited classes and template instantiation. Figures 14 and 15 present the environment of the tool. Coverage analysis provides a measurement of how effective testing has been in executing the source code. Configurable coverage requirements are defined in rule sets that are integrated into dynamic tests, resulting in a Pass/Fail for coverage requirements. The coverage metrics used by the tool are the following:

• Entry points
• Call returns
• Statements
• Basic blocks
• Decisions (branches)
• Conditions MC/DC (for DO-178B)

Cantata has certain features that support coverage especially for applications developed in Java, such as the reuse of JUnit tests with coverage by test case, and builds with ANT. Static analysis generates over 300 source code metrics. The results of these metrics are stored in reports that can be used to help enforce code quality standards. The metrics defined are both procedural and product metrics. Procedural

D2 / IST-2005-33331

SQO-OSS

22nd January 2007

Figure 15: Automated Test Script, Stub and Wrapper generation metrics involve code lines, comments, functions and counts of code constructs. Product metrics calculate Myers, MOOSE, McCabe, MOOD, Halstead, QMOOD, Hansen, Robert Martin, McCabe, Object Oriented, Bansiya’s Class Entropy metrics. Cantata++ can be integrated with many development tools including debuggers, simulators/emulators, UML modelling, Project Management and Code execution profilers. 2.7.3 TAU/Logiscope

Logiscope (http://www.telelogic.com/products/logiscope/index.cfm) supports automated detection of error-prone modules and code reviews for bug detection. This is enabled by the use of quality metrics and coding rules to identify the modules that are most likely to contain bugs. The tool also points directly to the faulty constructs and offers improvement recommendations. A set of predefined coding and naming rules and quality metrics can be customised to comply with specific project types and organisational guidelines, along with industry reuse standards. The main aim of the tool is the establishment of best coding practices, which are used both to test the existing code and to train developers.

Logiscope provides three basic functions: RuleChecker, Audit and TestChecker. RuleChecker checks code against a set of programming rules, preventing language traps and code misunderstandings. Over 220 coding and naming rules are supplied with the tool, and further rules can be added. Logiscope Audit locates error-prone modules and produces quantitative information, based on software metrics and graphs, that is used to analyse problems and decide on corrective action. The decision may involve either rewriting the module or testing it more thoroughly.

Figure 16: Results presented in Logiscope

The software metric templates used to evaluate the code are ISO 9126 compliant and, as mentioned, can be customised to fit project-specific requirements. Logiscope TestChecker measures structural code coverage and shows uncovered source code paths, helping to uncover bugs hidden in untested source code. TestChecker is based on a source code instrumentation technique that is adaptable to test environment constraints. Figure 16 shows the way results are presented by Logiscope.

All three functions of the tool are based on internationally recognised standards and models such as SEI/CMM, DO-178B and ISO/IEC 9126 and 9001. The tool supports several techniques that methodically track software quality for organisations at SEI/CMM Level 2 (repeatable) that want to reach Level 3 (defined) and above. "Reviews and Analysis of the Source Code" and "Structural Coverage Analysis", as required by the avionics standard DO-178B for software systems from Levels E to A, are partially supported by Logiscope, as are the "Quality Characteristics" defined by ISO/IEC 9126. The Logiscope product line is available for both UNIX and Windows.

2.7.4 McCabe IQ

McCabe IQ (http://www.mccabe.com/iq.htm) manages software quality through advanced static analysis based on McCabe's research in software quality measurement, and tracks the system's metric values over time to document the progress made in improving the overall stability and quality of the project. The tool identifies error-prone code by using several metrics:

• McCabe cyclomatic complexity
• McCabe essential complexity
• Module design complexity
• Integration complexity
• Lines of code
• Halstead metrics

Using the above metrics, complex code is identified. Figure 17 shows an example of how complex code identification is presented to the user. The Battlemap uses colour coding to show which sections of code are simple (green), somewhat complex (yellow), and very complex (red). Figure 18 presents the metric statistics that the tool calculates.

Figure 17: Battlemap in McCabe IQ

Figure 18: Presentation of the metrics statistics

Another supported function is the tracking of redundant code using a module comparison tool. This tool allows the selection of predefined search criteria or the definition of new criteria for finding similar modules. After the search criteria have been selected, the process is as follows: select the modules to be used for matching, specify the programs or repositories to be searched, and finally locate the modules that are similar to those used for matching, based on the selected search criteria. It is then determined whether any redundant code exists; if so, it is evaluated and, if needed, re-engineered.

The tool also provides a series of data metrics. The parser analyses the data declarations and parameters in the code, and produces metrics based on these data. There are two kinds of data-related metrics: global data and specified data. Global data refers to those data variables that are declared as global in the code. Based on the parser's data analysis, reports are produced that show how global data variables are tied to the cyclomatic complexity of each module in the code. As cyclomatic complexity and global data complexity increase, so does the likelihood that the code contains errors. Specified data refers to the data variables that are defined as a specified data set in the data dictionary. In general, when a data set is specified in the data dictionary, one or more variables have to be located in the code in order to analyse their association with the complexity of the modules in which they appear. The tool includes a host of utilities and reports for locating, tracking, and testing code containing specified data, as well as for enforcing naming conventions. The tool is platform independent and supports Ada, ASM86, C, C++, C++ .NET, COBOL, FORTRAN, Java, JSP, Perl, PL/1, VB and VB.NET.

2.7.5 Rational Functional Tester (RFT)

Rational Functional Tester (http://www-306.ibm.com/software/awdtools/tester/functional/features/index.html) is an automated functional and regression testing tool for Java, Visual Studio .NET and Web-based applications. It provides automated capabilities for activities such as data-driven testing, and it includes pattern-matching capabilities that keep test scripts resilient in the face of frequent application user interface changes. RFT incorporates support for version control to enable parallel development of test scripts and concurrent usage by geographically distributed teams.

The tool includes several components. IBM Rational Functional Tester Extension for Siebel Test Automation provides automated functional and regression testing for Siebel 7.7 applications. Combining advanced test development techniques with the simplification and automation of basic test needs, it accelerates the process of system test creation, execution and analysis to ensure the early capture and repair of application errors. IBM Rational Manual Tester is a manual test authoring and execution tool for testers and business analysts. It enables test step reuse to reduce the impact of software change on manual test maintenance activities, and supports data entry and verification during test execution to reduce human error. IBM Rational TestManager is a tool for managing all aspects of manual and automated testing from iteration to iteration. It is the central console for test activity management, execution and reporting, supporting manual test approaches and various automated paradigms including unit testing, functional regression testing and performance testing. Rational TestManager is meant to be accessed by all members of a project team, ensuring high visibility of test coverage information, defect trends and application readiness. IBM Rational Functional Tester Extension for Terminal-based Applications allows testers to apply their expertise to the mainframe environment while continuing to use the same testing tool used for Java, VS.NET and Web applications.

2.7.6 Safire

SAFIRE Professional (http://www.safire-team.com/products/index.htm) is a fully integrated development and run-time environment optimised for the implementation, validation and observation of signalling systems. It is used for a wide range of applications, such as gateways, signalling testers and protocol analysers. The tool is based on international standards such as UML, SDL, MSC, ASN.1 and TTCN (ITU-T, ETSI, ANSI, ISO). SAFIRE supports testing features for signalling systems that can be validated to various levels of confidence, from top-level tests to detailed conformance tests according to international standards. The tests generated are automated, deterministic, reproducible and documented. The tool has a modular architecture comprising the following components:

• SAFIRE Designer - graphical editor, viewer, compiler
• SAFIRE Campaigner - test execution and report generator
• SAFIRE Animator - slow motion replay (actions, events, behaviour)
• SAFIRE Tracer - protocol analyser
• SAFIRE Organiser - version control and project management
• SAFIRE VM Virtual Machine - high performance virtual machine

The component most involved in quality assurance is the Campaigner, which supports automated execution of tests. This component creates, edits, manages and executes test campaigns, allowing the configuration of parameters. Campaigner also produces test reports in the form of pass or fail verdicts per module, and the tool allows automated repetition of certain tests. The quality rules used during the design and testing of the code are the following:

• System structure
• Naming conventions - existence
• Naming conventions - properties
• SDL simplicity
• Uniqueness
• Modularity
• Proper functionality
• Comments
• Communication
• Events
• Behaviour

2.7.7 Metrics 4C

Metrics4C (http://www.plus-one.com) calculates software metrics for individual modules or for the entire project. These tools run interactively or in the background on a daily, weekly or monthly basis. The software metrics calculated for an individual module include:

• Lines of code
• Number of embedded SQL lines
• Number of blank lines
• Number of comment lines
• Total number of lines
• Number of decision elements
• Cyclomatic complexity
• Fan out
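The per-module measurements of this kind can be sketched in a few lines. The following is an illustrative sketch only, not the Metrics4C implementation: the sample source, the function names, and the naive keyword-based decision count are all assumptions made for the example, in place of the real parser such tools use.

```python
# Illustrative sketch (not Metrics4C itself): computing a few of the
# per-module metrics listed above for C-like source text, and summing
# them into project-level totals as such tools do.
import re

SAMPLE_SOURCE = """\
int sign(int x) {
    // return -1, 0 or 1
    if (x < 0) return -1;
    if (x > 0) return 1;
    return 0;
}
"""

def module_metrics(source: str) -> dict:
    lines = source.splitlines()
    blank = sum(1 for ln in lines if not ln.strip())
    comment = sum(1 for ln in lines
                  if ln.strip().startswith(("//", "/*", "*")))
    # Decision elements: branching keywords plus short-circuit operators
    # (a rough approximation of what a real parser would count).
    decisions = len(re.findall(r"\b(?:if|for|while|case)\b", source))
    decisions += source.count("&&") + source.count("||")
    return {
        "lines_of_code": len(lines) - blank - comment,
        "blank_lines": blank,
        "comment_lines": comment,
        "total_lines": len(lines),
        "decision_elements": decisions,
        # McCabe's cyclomatic complexity: decision count + 1.
        "cyclomatic_complexity": decisions + 1,
    }

def project_metrics(modules: dict) -> dict:
    """Sum the per-module values into the respective project metrics."""
    totals: dict = {}
    for metrics in modules.values():
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0) + value
    return totals

print(module_metrics(SAMPLE_SOURCE))
```

For the sample function above this yields two decision elements (the two `if` statements) and hence a cyclomatic complexity of 3.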


The above values are then summed to provide the respective project metrics. In addition, other project metrics are calculated, including:

• Average project cyclomatic complexity
• Project fan out metric (with and without leaf nodes)
• Total number of procedures and functions
• Total number of source code and header files
• Lines of code in source code and header files
• Total number of source code files unit tested
• Number of embedded SQL statements
• Lines of code unit tested
• Percentage of files unit tested
• Integration Test Percentage

The Integration Test Percentage (ITP) provides a numeric value indicating how much of the project's source code has been tested, and can be used to better prepare for Formal Qualification Testing (FQT). Output from Metrics4C can easily be imported into a spreadsheet program to display the data graphically. Metrics4C can also flag warnings if the lines of code or the cyclomatic complexity of a module exceed a specified maximum.

2.7.8 Resource Standard Metrics

Resource Standard Metrics (http://msquaredtechnologies.com/) generates source code metrics for C, C++ and Java on any operating system. The tool measures source code quality metrics and complexity directly from the written source code, with the aim of evaluating a project's performance. Source code metric differentials between baselines can be determined using RSM code differential work files, and the source code metrics produced (SLOC, KSLOC, LLOC) can provide line-of-code derived function point metrics. RSM supports ISO 9001, CMMI and AS9100 compliance. Typical RSM functionality enables:

• Determination of source code LOC, SLOC and KSLOC for C, C++ and Java
• Measurement of software metrics for each baseline and determination of metric differentials between baselines


• Capture of baseline code metrics independently of metric differentials, in order to preserve history
• Reporting of CMMI and ISO metrics for code compliance audits
• Static analysis of source code, best used for code peer reviews
• Removal of tabs and conversion from DOS to UNIX format
• Measurement and analysis of source code for outsourced or subcontracted code
• Measurement of cyclomatic code complexity and analysis of interface complexity for maintenance
• Creation of user-defined code quality notices with regular expressions, or use of the 40 predefined code quality notices

2.7.9 Discussion

Most of the testing and product metrics tools provide the online capability to record defect information including severity, class, origin, phase of detection, and phase introduced. Several tools automate the testing procedure by estimating which code is error prone and by automatically generating results and reports. Metrics tools provide a variety of metrics reports, or transport data into spreadsheets or report generators. Query and search capabilities are also provided. Users can customise the tools to meet their organisation's unique requirements; for example, they can customise quality rules, workflow, queries, reports, and access controls. Other common features of the tools studied include:

• Graphical user interface
• Integration with databases, spreadsheets, version control tools, configuration management systems, test tools, and e-mail systems
• Support for ad hoc queries and reports
• Support for standards, i.e. CMMI, DoD-STD-2167A and ISO 9000
• Support for distributed development
• Ability to link defects and track duplicate defect reports

The metrics capabilities of the tools in most cases involve:

• Data gathering
• Measurement analysis
• Data reporting
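The three-stage pattern shared by the tools studied can be sketched as a minimal pipeline. This is not modelled on any particular tool; the function names, the file-suffix filter and the single LOC metric are illustrative assumptions.

```python
# A minimal sketch (not any particular tool) of the three-stage pattern
# common to metrics tools: data gathering, measurement analysis, and
# data reporting. The LOC metric and names are illustrative choices.
from pathlib import Path

def gather(root: str, suffixes=(".c", ".h", ".java")) -> dict:
    """Data gathering: collect the source text of each file under root."""
    return {str(p): p.read_text(errors="replace")
            for p in Path(root).rglob("*") if p.suffix in suffixes}

def analyse(sources: dict) -> dict:
    """Measurement analysis: one illustrative metric, non-blank LOC."""
    return {path: sum(1 for ln in text.splitlines() if ln.strip())
            for path, text in sources.items()}

def report(measurements: dict) -> str:
    """Data reporting: CSV output suitable for a spreadsheet program."""
    rows = ["file,loc"]
    rows += [f"{path},{loc}" for path, loc in sorted(measurements.items())]
    return "\n".join(rows)
```

The CSV output corresponds to the spreadsheet-import facility that several of the tools above advertise.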


3 Empirical OSS Studies

3.1 Evolutionary Studies

3.1.1 Historical Perspectives

Back in 1971, in his book "The Psychology of Computer Programming," Gerald M. Weinberg was probably the first to analyse so-called "egoless programming," meaning non-selfish, altruistic programming. The term describes a software development environment in which volunteers participate actively by discovering and fixing bugs, contributing new code, expressing ideas and so on, without any direct material reward. Weinberg observed that when developers are not territorial about their code and encourage other people to look for bugs and potential improvements, improvement happens much faster [Wei71].

Several years later, Frederick P. Brooks, in his classic "The Mythical Man-Month: Essays on Software Engineering," predicted that OSS developers would play a significant role in software engineering in the future. In addition, he claimed that maintaining a widely used program typically accounts for 40% or more of the cost of developing it. This cost is strongly affected by the number of users and developers of the specific project: as more people find more bugs and other flaws, the overall cost of the software is reduced. Brooks concluded [Bro75] that this is why OSS can be competitive with, and sometimes even better than, conventionally built software.

In his influential article "The Cathedral and the Bazaar," Eric Steven Raymond gathered and presented the main features of OSS development. Starting with an analysis of his own OSS project, Fetchmail, he distinguished the classical "Cathedral-like" way of developing commercial software from the new, "Bazaar-like" world of Linux and other FOSS projects. Eventually, he came up with a series of lessons to be learned, which can serve as principles that make a FOSS project successful [Ray99]. According to the OSS history written by Peter H. Salus [Sal], there are indications that OSS development has its roots in the 1980s or even earlier.

Raymond's article was, however, the first attempt at a systematic approach to OSS and its methods. His work met with considerable opposition, both in the FOSS community [DOS99] and in academic circles [Beza, Bezb], as being too simplistic and shallow. However controversial Raymond's article may be, its main contribution is that it raised widespread interest in OSS empirical studies. Since the dawn of the new millennium, a considerable number of research papers on this subject have been published. Some of their findings are described below, to give us a deeper understanding of the evolution of several famous OSS projects.


3.1.2 Linux

The Linux operating system kernel is the best-known FOSS project worldwide, and is therefore a case worthy of closer study. The Linux project started in 1991 as a private research project by a 22-year-old Finnish student named Linus Torvalds. Dissatisfied with the existing operating systems, he started programming a kernel himself, based on code and ideas from Minix, a tiny Unix-like operating system. Linux's first official release, version 1.0, appeared in March 1994. Today, Linux is one of the dominant computer operating systems, enjoying worldwide acceptance. It is a large system, containing over four million lines of code, and new versions are released very often. Hundreds of developers have willingly dedicated much of their time to fixing bugs, developing new code and contributing ideas for its evolution. According to the relevant Wikipedia article, it is estimated that Linus Torvalds himself has contributed only about 2 per cent of Linux's code, but he remains the ultimate authority on what new code is incorporated into the Linux kernel. Such a case is a fine example of how a FOSS community can work successfully by harnessing the efforts of a large, geographically distributed community of software specialists.

The growth of the Linux operating system has followed two parallel paths: the stable and the development releases. A stable release contains features that have already been tested, showing proven stability, ease of use and lack of bugs. A development release contains additional features that are still in an experimental phase, and therefore lacks stability and contains more bugs. As one would expect, there are more development releases than stable ones. Features of the development releases that have been adequately tested are incorporated into the next stable release.
This development concept has played a big part in the project's success, as it provides conventional users with a reliable operating system (the stable releases), while at the same time giving software developers the freedom to experiment and try new features (the development releases).

Following Raymond's analysis of the Linux development method, Godfrey and Tu presented a study of Linux's evolution over the years 1994 to 1999 [GT00]. As they note, one might expect that as Linux got bigger and more complex, its growth would slow down. This is also what the well-known Lehman's laws of software evolution suggest: "as systems grow in size and complexity, it becomes increasingly difficult to insert new code" [LRW+ 97]. In the same context, Turski analysed several large software systems, all created and maintained by small, predefined teams of developers using traditional management techniques. From his study, Turski posits that system growth is usually sub-linear: a software system's growth slows down as the system grows in volume and complexity [Tur96]. Parnas also touched on this subject, comparing software aging with human aging [Par94]. But the findings of Godfrey and Tu, after studying the evolution of Linux, indicated


a different trend. Their methodology was to examine Linux both at the overall system level and at the level of each of the major subsystems. In this way, they were able to study not just the evolution of the whole system's size, but the volume of each major subsystem as well. This approach provides more information, since each subsystem does not necessarily follow the same evolution pattern as the overall system. A sample of 96 kernel versions was selected, including 34 stable releases and 62 development releases. Two main metrics were used in this research: the size of the tar files and the number of lines of code (LOC). A tar file includes all the source artifacts of the kernel, such as documentation and scripts, but no binary files. LOC was counted in two ways: with the Unix command wc -l (which includes blank lines and comments) and with an awk script (which ignores blank lines and comments).

Regarding the overall system's growth, the results of this research show that the development releases grew at a super-linear rate over time, while the stable releases grew at a much slower rate (Figures 19 and 20). These tendencies are common to both metrics used. It is therefore clear that Linux's development releases follow an evolution pattern that differs from Lehman's laws of software evolution. We can argue that this happens because of the way development releases are built: they attract capable developers who are willing to contribute to the system's growth. As the project's popularity rises, more developers are attracted to it and more code is contributed. The stable releases, which follow a more conservative development path and do not accept new contributions easily, show a slower rate of size growth.

Figure 19: Growth of the compressed tar file for the full Linux kernel source release, ([GT00], p.135).

As for the growth of the major subsystems, Godfrey and Tu selected 10 of these


subsystems:

• drivers: contains the drivers for various hardware devices
• arch: contains the kernel code that is specific to particular hardware architectures/CPUs
• include: contains most of the system's include (header) source files
• net: contains the main networking code
• fs: contains support for various kinds of file systems
• init: contains the initialisation code for the kernel
• ipc: contains the code for inter-process communications
• kernel: contains the main kernel code that is architecture independent
• lib: contains the library code
• mm: contains the memory management code

Figure 20: Growth in the number of lines of code measured using two methods: the Unix command wc -l, and an awk script that removes comments and blank lines, ([GT00], p.135).

Figure 21 shows the evolution of each of these subsystems in terms of LOC. We notice that the drivers subsystem is both the biggest subsystem and the one with the fastest growth. In Figure 22, a comparative analysis of each subsystem's LOC versus the overall system's LOC is presented. We can see that drivers occupy more than 60


per cent of the total system's size, and that this percentage is continuously growing. This fact can be explained as a result of Linux's rising popularity: more users wish to run it with many different types of devices, so the corresponding drivers have to be included in the system.

Figure 21: Growth of SLOC of the major subsystems of Linux (development releases), ([GT00], p.138).

A more recent study of Linux's evolution was published by Robles [Rob05]. He employed a methodology similar to that of Godfrey and Tu, but examined all the available releases of Linux (both stable and development) up to December 2004, instead of picking a sample. The measurements in this study were obtained with the SLOCCount tool, which counts source lines of code in identified source code files. The kernel had grown considerably since the previous survey: the number of SLOC and the size of the tar file had more than doubled. This trend is visible in Figures 23 and 24: the super-linearity of Linux's evolution is even more remarkable in recent years. Like Godfrey and Tu, Robles also examined the evolution of Linux's major subsystems, as we can see in Figures 25 and 26. The results were similar: drivers is still the biggest subsystem, though its share of the total Linux kernel has decreased, mainly due to the removal of the sound subsystem in early 2002.

All in all, we conclude that the power of OSS communities can push a project to super-linear growth, in contrast to the typical software evolution rules. Voluntary participation in a software's development ensures that the participants are genuinely interested in it, both as developers and as users. In this case, software is not treated merely as a commercial product, but as a means of improving people's lives. Linux is a very good example of such a case.
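The two LOC-counting conventions used in these evolution studies (a raw line count, as given by wc -l, versus a count that ignores blank lines and comments) can be sketched as follows. This is an illustrative reimplementation, not the original scripts, and its comment-handling rules are a simplification.

```python
# Illustrative reimplementation (not the original scripts) of the two
# LOC-counting conventions used in the studies above: a raw count, as
# given by `wc -l`, and a count that ignores blanks and comments.
def loc_raw(source: str) -> int:
    """Equivalent of `wc -l`: counts every line, blank or not."""
    return len(source.splitlines())

def loc_stripped(source: str) -> int:
    """Counts only lines carrying code: blank lines and whole-line
    C-style comments are ignored (a simplification; the awk script's
    exact rules are not reproduced here)."""
    count = 0
    in_block_comment = False
    for line in source.splitlines():
        text = line.strip()
        if in_block_comment:
            if "*/" in text:
                in_block_comment = False
            continue
        if not text or text.startswith("//"):
            continue
        if text.startswith("/*"):
            if "*/" not in text:
                in_block_comment = True
            continue
        count += 1
    return count

sample = """\
/* memory management stub */

static int pages;   // resident pages

int mm_init(void)
{
    pages = 0;
    return 0;
}
"""
print(loc_raw(sample), loc_stripped(sample))
```

For the sample fragment the two conventions already diverge (9 lines versus 6), which is why the studies report both numbers.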


Figure 22: Percentage of SLOC for each major subsystem of Linux (development releases), ([GT00], p.138).

Figure 23: Growth of SLOC of Linux for all the stable and development releases, ([Rob05], p.89).


Figure 24: Growth of the tar file (right) and the number of files (left) for the full Linux kernel source release, ([Rob05], p.90).

Figure 25: Growth of SLOC of the major subsystems of Linux (development releases), ([Rob05], p.91).


Figure 26: Percentage of SLOC for each major subsystem of Linux (development releases), ([Rob05], p.93).

3.1.3 Apache

Another famous OSS project is the Apache web server. It was started in early 1995 by Rob McCool, a software developer and architect who was 22 years old at the time. Apache was initially an effort to coordinate improvements to the NCSA (National Center for Supercomputing Applications) HTTPd program, by creating patches and adding new features. This was in fact the initial explanation of the project's name: it was "a patchy" server. Later, though, the project's official website claimed that the Apache name was given as a sign of respect to the native American Apache tribe. Apache quickly attracted the attention of an initial core team of developers, who formed the "Apache Group," and it was first released in early 1996 as Apache HTTP version 1.0. At that time, it was actually the only workable Open Source alternative to the Netscape web server. Since April 1996, it has reportedly been the most popular HTTP server on the internet, hosting over half of all websites globally.

One of the most comprehensive studies of the Apache server was conducted by Audris Mockus, Roy T. Fielding and James Herbsleb in 2002 [MFH02]. In this study, they discuss the way Apache development occurred and present some quantitative results on the evolution of Apache's development. The following information is based on this article. As mentioned earlier, the "Apache Group" was formed at the initial stage of the project and was charged with the project's coordination. It was an informal organisation of people, consisting entirely of volunteers, who all had other full-time jobs. They therefore decided to employ a decentralised, distributed development concept that supported asynchronous communication. This was achieved through the


use of mailing lists, newsgroups and the problem reporting system (BUGDB). Any developer may take part in the project and submit contributions, and the "Apache Group" then decides on the inclusion of any code change. Apache core developers are free to choose the area of the project that most attracts them, and to leave it when they are no longer interested in it.

Mockus, Fielding and Herbsleb studied several aspects of Apache's development. Firstly, they examined the participation of the project's development community, which counts almost 400 individuals, in the two main parts of the software's development: code generation and bug fixing. In Figure 27, we can see the cumulative proportion of code changes (on the vertical axis) versus the top N contributors to the code base (on the horizontal axis), ordered by number of Modification Requests (MRs) from largest to smallest.

Figure 27: The cumulative distribution of contributions to the code base, ([MFH02], p.321).

Code contribution is measured by four factors: MRs, deltas, lines added and lines deleted. The figure shows that the top 15 developers contributed more than 83 per cent of MRs and deltas, 88 per cent of lines added, and 91 per cent of deleted lines. Similarly, Figure 28 shows the cumulative proportion of bug fixes (vertical axis) versus the top N contributors to bug fixing. This time, the core of 15 developers produced only 66 per cent of the fixes. These two figures show that the participation of a wide development community is more important in defect repair than in new code submission. We notice that, despite the broad overall participation in the project, almost all new functionality is created by the core developers. A broad developer community, though, is essential for bug fixing. Mockus, Fielding and Herbsleb made a comparative analysis of these findings against data from several commercial projects.
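Cumulative-distribution curves of this kind are computed straightforwardly from raw per-developer counts. The sketch below shows the computation; the developer names and counts are invented for illustration and are not the Apache data.

```python
# Sketch of how cumulative contribution curves such as those in
# Figures 27 and 28 are computed from per-developer counts. The
# counts below are invented for illustration, not the Apache data.
def cumulative_share(contributions: dict) -> list:
    """Cumulative proportion of all contributions covered by the
    top 1, 2, 3, ... contributors, ordered from largest to smallest."""
    counts = sorted(contributions.values(), reverse=True)
    total = sum(counts)
    shares, running = [], 0
    for count in counts:
        running += count
        shares.append(running / total)
    return shares

mrs = {"dev_a": 50, "dev_b": 25, "dev_c": 15, "dev_d": 10}
print(cumulative_share(mrs))  # the top contributor alone covers half
```

A steeply rising curve, as in the Apache code-contribution data, means contributions are concentrated in a small core of developers; a flatter curve, as in the bug-fix data, means broader participation.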
This study’s outcome was that in commercial projects, core developers’ contribution in the project’s evolution Revision: final 65


was significantly lower than in Apache. In an attempt to interpret these findings, we can argue that the Apache core developers appear to be very productive compared to developers of commercial software. This conclusion is strengthened by the fact that participation in Apache's development is a voluntary, part-time activity.

Figure 28: The cumulative distribution of fixes, ([MFH02], p.322).

3.1.4 Mozilla

Mockus, Fielding and Herbsleb [MFH02] also present an analysis of another OSS project, the Mozilla web browser. Mozilla was initially created as a commercial project by Netscape Corporation, which in January 1998 decided to distribute its Communicator free of charge and to give free access to the source code as well, thereby turning it into an OSS project. Netscape was so impressed by Linux's evolution that they were attracted by the idea of developing an Open Source web browser. The project's management was assigned to the "Mozilla Organisation," now named the "Mozilla Foundation." Nowadays, the foundation coordinates and maintains the Mozilla Firefox browser and the Mozilla Thunderbird e-mail application, among others.

Mockus, Fielding and Herbsleb investigated the size of Mozilla's development community. By examining the project's repository, they found 486 code contributors and 412 bug fix contributors. In Figure 29, we can see the project's external participation over time. The vertical axis represents the fraction of external developers and the horizontal axis represents time. It is clear that participation gradually increases over time, as a result of widespread interest and improved documentation. As an example, it is mentioned that 95 per cent of the people who created problem reports were external, and they committed 53 per cent of the total number of problem reports.

Figure 29: Trends of external participation in Mozilla project, ([MFH02], p.335)

Figure 30 shows the cumulative distribution of code contribution for seven Mozilla modules. In this case, developer contribution does not seem to vary as much as in the Apache project. Mozilla represents a way in which commercial and Open Source development approaches can be combined. The interdependence among Mozilla modules is high, as is the effort dedicated to code inspections. Therefore, Mozilla's core teams are bigger than Apache's, employing more formal means of coordinating the project. The fact remains that, despite its commercial development roots, Mozilla managed to leverage the OSS community, achieve high participation and result in a high-quality product.

3.1.5 GNOME

GNOME is also one of the biggest and most famous OSS projects. It is a desktop environment for Unix systems and its name was formed as an acronym of “GNU Network Object Model Environment.” In 2004, Daniel M. German published a study of GNOME that examines how global software development can lead to success [Ger04b]. The discussion below is based on that article. The GNOME project was started by Miguel de Icaza, a Mexican software programmer. Its first version was released in 1997 and contained one simple application and a set of libraries. Today, GNOME has turned into a large project, with more than two million LOC and hundreds of developers worldwide. In 2000, the GNOME Foundation (similar to Apache’s Software Foundation) was established. It is composed of four entities: the Board of Directors, the Advisory Board, the Executive Director and the members. Many of the participants in the Board of Directors are fully employed by private companies. The Advisory Board is composed of corporate and non-profit organisations. Membership can be granted to any current contributor to the project, including non-programmers. By October 2003, the Foundation counted 320 members. The GNOME Foundation is also responsible for organising sub-committees that run some of the project’s administrative tasks, such as the Foundation Membership Committee, the Fund-raising Committee, the Sysadmin Committee and the Release Team.

Figure 30: The cumulative distribution of contributions to the code base for seven Mozilla modules ([MFH02], p. 336)

German reaches several interesting conclusions by examining the contributions and the project’s overall evolution. First of all, an important factor in GNOME’s success is the wide participation in the decision-making process. Developers are treated as equal partners in the project and are inspired by its goals, which explains their motivation to work. Secondly, an essential feature of GNOME is the use of multiple types of communication, like mailing lists, IRC and reports on the project’s current state of development. There are scheduled meetings about GNOME’s evolution that boost collaboration and team spirit among contributors. Moreover, the creation of task forces makes their members accountable and committed to GNOME’s improvement. Finally, there are clear procedures and policies for conflict management, as well as a strong culture of creating documentation, so that contributors know what others are working on.

3.1.6 FreeBSD

FreeBSD is an open-source operating system derived from BSD, the version of Unix developed at the University of California, Berkeley. The project started in 1993 and its current (end of 2006) stable version is 6.1. It is run by the FreeBSD developers who have commit access to the project’s CVS repository. As it is considered a successful OSS project, it has attracted scientific interest in its evolutionary process. The most recent publication on FreeBSD’s evolution is by Clemente Izurieta and James Bieman [IB06]. Building on an earlier study by Trung Dinh-Trong and James Bieman [DTB04] that praised the system’s organisational structure, Izurieta and Bieman focused on the growth rate of FreeBSD stable releases since the project’s inception, employing metrics such as LOC, number of directories, total size in Kbytes, average and median LOC for header (dot-h) and source (dot-c) files, and number of modules for each sub-system and for the system as a whole. The study indicates that FreeBSD follows a linear (and sometimes sub-linear) rate of growth, as demonstrated in Figures 31 to 35. We observe that dot-c and dot-h files (Figure 34) show only a very slight growth in size, which, as Izurieta and Bieman explain, is due to the fact that the system does not evolve in an uncontrolled manner. It should also be clarified that in Figure 35 the contrib sub-system contains software contributed by users, while the sys sub-system is the system’s kernel. As one would expect, sys is smaller in size and grows at a slower pace than contrib, because its content goes through a stricter validation process before inclusion in the system.

Figure 31: FreeBSD stable release growth by release number ([IB06], p. 207)

Figure 32: FreeBSD cumulative growth rate ([IB06], p. 207)

Figure 33: FreeBSD release sizes by development branch ([IB06], p. 208)

Figure 34: FreeBSD average and median values of dot-c and dot-h files ([IB06], p. 209)

Figure 35: FreeBSD contrib and sys sub-systems ([IB06], p. 210)

3.1.7 Other Studies

In recent years, some horizontal studies of OSS projects have been published, in which several projects are examined collectively. One example is an article by Andrea Capiluppi, Patricia Lago and Maurizio Morisio [CLM04], who selected 12 projects from the Freshmeat Open Source portal. These projects were all “alive,” meaning that they had shown significant growth over time and still had developers working on them at the date of the research. In fact, the authors report that during their survey of the Freshmeat portal they discovered that a significant percentage of the hundreds of accessible OSS projects were no longer evolving, having had no developers and no growth for a considerable amount of time; they concluded that the mortality of OSS projects is quite high. After an initial observation of the sample, they clustered the 12 projects into three categories:

Large projects: Mutt, ARLA
Medium projects: GNUPARTED, Weasel, Disc-cover, XAutolock, Motion, Bubblemon
Small projects: Dailystrips, Calamaris, Edna, Rblcheck

The authors analysed some basic attributes of these projects, such as size, modules and number of developers. According to their findings, all projects had grown at a linear rate over time, both in size and in number of developers. Some periodic fluctuations in code size were noticed, mainly caused by internal redesigns of the software, but the long-term trend was upward in all cases. In large and medium projects the core teams had grown as well, but only to a limited extent, which suggests that there is a ceiling to the expansion of core project teams. The same patterns of linear or sub-linear growth were observed for the number of modules. In a later study, Andrea Capiluppi, Maurizio Morisio and Juan Ramil proceeded to a further examination of the ARLA project, reaching similar conclusions [CMR04]. Finally, another interesting study has been carried out by James W. Paulson, Giancarlo Succi and Armin Eberlein [PSE04].
To test the effectiveness of the OSS development process, they investigated the evolutionary patterns of three major OSS projects (Linux, GCC and Apache) in comparison to three closed-source software projects, whose names were kept confidential. According to their findings, the OSS development structure fosters creativity and constructive communication among developers more effectively than traditional ways of developing software, because the new functions and features added to the OSS projects were more numerous and larger than those added to the closed-source projects. In addition, OSS projects fix bugs and other defects faster, because of the greater number of developers and testers contributing to them. However, the evidence presented in this research does not support the arguments that OSS systems are more modular or grow faster than their closed-source competitors.

3.1.8 Simulation of the temporal evolution of OSS projects

A generic structure for F/OSS simulation modeling

The authors in [ASAB02], and later in [ASS+ 05], described a general framework for F/OSS dynamical simulation models and the extra difficulties that have to be confronted relative to analogous models of the closed-source process. It is a framework for discrete-event simulation models, which the authors presented as follows:

1. Much unlike closed-source projects, in F/OSS projects the number of contributors varies greatly in time and depends on the interest that the specific F/OSS project attracts. It cannot be directly controlled or predetermined by project coordinators. Therefore, an F/OSS model should a) contain an explicit mechanism for determining the flow of new contributors as a function of time and b) relate this mechanism to specific project-dependent factors that affect the overall “interest” in the project.

2. In any F/OSS project, any particular task at any particular moment in time can be performed either by a new contributor or by an old one. In addition, almost all F/OSS projects have a dedicated team of “core” programmers that perform most of the contributions, while their interest in the project stays approximately constant. Therefore, the F/OSS simulation model must contain a mechanism that determines the number of contributions undertaken per category of contributors (e.g. new, old or core contributors) in each time interval.

3. In F/OSS projects, there is also no direct central control over the number of contributions per task type or per project module. Anyone may choose any task (e.g. code writing, defect correction, etc.) and any project module to work on. The allocation of contributions per task type and per project module depends on the following sets of factors:

(a) Programmer profile (e.g. some programmers may prefer code testing to defect correction). These factors can be further categorised as: i. constant in time (e.g. a programmer’s preference for code writing) and ii. variable with time (e.g. a programmer’s interest in contributing to any task or module may vary with the frequency of past contributions).

(b) Project-specific factors (e.g. a contributor may wish to write code for a specific module, but there may be nothing interesting left to write for that module).

Therefore, the F/OSS model should (a) identify and parameterise the dependence of a programmer’s interest in contributing to a specific task/module on (i) programmer profile and (ii) project evolution, and (b) contain a quantitative mechanism to allocate contributions per task type and per project module.


4. In F/OSS projects, because there is no strict plan or task-assignment mechanism, the total number of Lines of Code (LOC) written by each contributor varies significantly per contributor and per time period, again in an uncontrolled manner. Therefore, project outputs such as LOC added, number of defects or number of reported defects are expected to have much larger statistical variance than in closed-source projects. The F/OSS simulation model should determine the delivered results of particular contributions in a stochastic manner, i.e. by drawing from probability distributions. This is similar to the practice used in closed-source simulation models, the difference being that the probability distributions here are expected to have a much larger variance.

5. In F/OSS projects there is no specific time plan for project deliverables. Therefore, the number of calendar days for the completion of a task varies greatly. Delivery times should also depend on project-specific factors such as the amount of work needed to complete the task. Therefore, task delivery times should be determined in a stochastic manner, while average delivery times should follow certain deterministic rules.

The authors concluded that the core of any F/OSS simulation model should be a specific behavioural model, properly quantified, that captures how project contributors decide a) whether to contribute to the project or not, b) which task to perform, c) which module to contribute to and d) how often to contribute. The behavioural model should then define how these four aspects depend on a) programmer profile and b) project-specific factors. The formulation of a behavioural model must be based on a set of qualitative rules. Fortunately, previous case studies have already pinpointed such rules, either by questioning large samples of F/OSS contributors or by analysing publicly available data in F/OSS project repositories. As previous case studies identified many common features across several types of F/OSS project, one can certainly devise a behavioural model general enough to describe at least a large class of F/OSS projects. Selecting a suitable equation to describe a specific qualitative rule is largely arbitrary at first; a particular choice may subsequently be justified by the model’s demonstrated ability to fit actual results. Once the behavioural model’s equations and intrinsic parameters are validated, the model may be applied to other F/OSS projects.

Application of an F/OSS simulation model

General procedure

Figure 36 shows the structure of a generic F/OSS dynamic simulation model: project-specific inputs (probability-distribution parameters, project-specific behavioural-model parameters, and initial conditions of the dynamic variables) feed, together with the behavioural model’s fixed parameters, into the behavioural model and its probability distributions, which produce task delivery times and task deliverables for the OSS simulation; the output is the time evolution of the dynamic variables. As in any simulation model of a dynamical system, the user must specify as input a) values for the project-specific time-constant parameters and b) initial conditions for the project’s dynamic variables. These values are not precisely known at project start.

Figure 36: Structure of a generic F/OSS dynamic simulation model. Figure was reproduced from [ASS+ 05].

One may attempt to provide rough estimates for these values based on the results of other (similar) real-world F/OSS projects. However, these values may be readjusted in the course of the evolution of the simulated project as real data becomes available. If, despite this continuous re-adjustment of parameters, the simulation does not become more accurate in predicting the future evolution of the project, this means that either a) some of the behavioural model’s qualitative rules are based on wrong assumptions for the specific type of project studied, or b) the project-independent values of the behavioural model must be re-adjusted.

Calibration of the model

The adjustment of the behavioural model’s intrinsic parameters is the calibration procedure of the model. One may introduce arbitrary values for these parameters as reasonable “initial guesses”, then run the simulation model, re-adjusting the parameter values until the simulation results satisfactorily fit the results of a real-world F/OSS project in each time-window of the project’s evolution. More than one F/OSS project of a similar type may be used in the calibration process.

Validation of the model

Once the project-independent parameters of the behavioural model are properly calibrated, the model may be used to simulate other F/OSS projects.

Practical use of F/OSS simulation models


• Prediction of F/OSS project evolution. Project coordinators may obtain a picture of plausible evolution scenarios for the project they are about to initiate. Software users may also be interested in such predictions, as they would indicate when the software will most likely be available for use. This also applies to organisations, especially those interested in pursuing a specific business model based on the software.

• F/OSS project risk management. F/OSS projects are risky, in the sense that many factors that are not easily anticipated may negatively affect their evolution. Simulation models may help quantify the impact of such factors, taking into account their probability of occurrence and the effect they would have should they occur.

• What-if analysis. F/OSS coordinators may try different development processes, coordination schemes (e.g. core programming team), tool usage, etc. to identify the best possible approach for initiating and managing their project.

• F/OSS process evaluation. The nature of F/OSS guarantees that in the future we will observe new types of project organisation and evolution patterns. Researchers may be particularly interested in understanding the dynamics of F/OSS development, and simulation models may provide a suitable tool for that purpose.

Simulation studies and results

Based on the general framework described earlier, the authors of [ASAB02] presented a formal mathematical model based on findings of F/OSS case studies. The simulation model was applied to the Apache project and simulation outputs were compared to real data. The model was further refined in [ASS+ 05] and similarly applied to the gtk+ module of the GNOME project. Simulation outputs included the temporal evolution of LOC, active programmers, residual defect density, number of reported defects, etc. Figures 37 and 38 compare simulation results and real data for LOC versus time for the Apache project and gtk+, respectively.

Conclusions

In conclusion, the authors of both [ASAB02] and [ASS+ 05] acknowledged that existing case studies do not contain the complete set of data necessary for a full-scale calibration and validation of their simulation model. Despite this, the simulation results qualitatively demonstrated the super-linear project growth at the initial stages, the saturation of project growth at later stages where a project reaches a level of functional completion (Apache), and effective defect correction, all of which agree with known studies.


Figure 37: Simulation results for the Apache project: cumulative LOC difference vs. time. The bold line is the average of the 100 runs. The gray lines are one standard deviation above and below the average. The dashed vertical line shows the end of the time period for which data was collected in the Apache case study [MFH02]. Figure was reproduced from [ASS+ 05].

Figure 38: LOC evolution in gtk+ module of GNOME project: Cumulative LOC difference vs. time. The bold line is the expectation (average) value of LOC evolution. The gray lines are one standard deviation above and below the average. The dashed vertical line shows approximately the end of the time period for which data was collected in the GNOME case study. Figure was reproduced from [ASS+ 05].


One of the most evident intrinsic limitations of the F/OSS simulation models, the authors noted, comes from the very large variances of the probability distributions used. On output, this leads to large variances in the evolution of key project variables, which naturally limits the predictive power of the model. Finally, the authors concluded that despite the aforementioned intrinsic and extrinsic limitations, their “first attempt” simulation runs demonstrated the model’s ability to capture reported qualitative and quantitative features of F/OSS evolution.
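As a rough illustration of the framework’s points 1, 4 and 5 (uncontrolled contributor inflow and high-variance stochastic outputs), consider the following minimal sketch. All rates and distribution parameters here are invented for illustration; they are not the calibrated values of [ASAB02] or [ASS+ 05]:

```python
import random

random.seed(42)  # reproducible runs

def simulate(days=100, interest=0.5, core_team=5):
    """Toy discrete-event loop. Each day, new contributors arrive at a
    rate driven by the project's 'interest' (framework point 1), and
    every active contributor may deliver output whose size is drawn
    from a high-variance lognormal distribution (points 4 and 5)."""
    active = core_team
    total_loc = 0.0
    history = []
    for _ in range(days):
        # inflow of new contributors depends on current interest
        active += sum(random.random() < interest for _ in range(3))
        # each contributor delivers stochastically sized output
        for _ in range(active):
            if random.random() < 0.2:  # chance of contributing today
                total_loc += random.lognormvariate(3.0, 1.5)
        history.append(total_loc)
    return history

history = simulate()
print(f"simulated cumulative LOC after 100 days: {history[-1]:.0f}")
```

Running the sketch repeatedly with different seeds shows exactly the behaviour the authors warn about: the large variance of the output distributions produces widely spread LOC trajectories, which is what limits the predictive power of such models.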

3.2 Code Quality Studies
There are few studies regarding the code quality of Open Source software. Many early studies focus on evolutionary aspects of Open Source software and study the evolution laws of Open Source software development. It is only recently that Open Source code quality studies have appeared in highly ranked journals (not white papers by consulting firms or subjective articles, but peer-reviewed research), which explains the small number of available Open Source code quality studies. One of the first studies to examine code quality in Open Source software was conducted by Stamelos et al. [SAOB02]. In this study the authors measured the modularity and the structural quality of the code of 100 Open Source applications and tried to correlate the size of the application components with user satisfaction. The measurement was conducted with a commercial tool (Telelogic Logiscope) and quality was assessed against a quality standard very similar to that of ISO/IEC 9126. The standard was the one proposed by the tool itself and, as the authors indicate, is used by more than 70 multinational companies in various areas. The model employed metrics that are a mixture of size, structural and complexity metrics, which can be found in the metrics section of this document. The paper also grounds its findings on statistical foundations. The tool measured each module of all applications and evaluated it against the built-in model. For each criterion the tool outputs a recommendation level, namely ACCEPT, COMMENT, INSPECT, TEST or REWRITE. The results of the measurement are depicted in Table 1. As the authors note, the table shows that the mean value of acceptable components is about 50%, a value that is neither good nor bad and can be interpreted both ways. It suggests either that the code quality of the Open Source applications is higher than one might expect, taking into account the nature of Open Source software development and the time of the study, or that the quality is lower than the industrial code standard implied by the tool.

Table 1: Percentage of modules per recommendation level, as studied by Stamelos et al.

%          ACCEPTED  COMMENT  INSPECT  TEST   REWRITE  UNDEFINED
Minimum    0         0        0        0      0        0
Maximum    100       66.66    50       16     100      7.69
Mean       50.18     30.95    8.55     4.31   5.57     0.42
SD         18.65     14.09    8.5      4.14   10.73    1.29
Median     48.37     31.83    7.65     3.55   3.2      0

Regarding the second part of their study, component size and metrics versus user satisfaction, the authors did not find any relationship between the majority of the metrics considered and user satisfaction. However, they detected an indication of a relationship between the size of a project’s components and user satisfaction (the “external quality” of a project). The two size metrics that relate to satisfaction are “Number of statements” and “Program Length”. The relation is negative, i.e. the bigger a component is, the worse its “external quality”. The authors conclude that Open Source performs no worse than a standard implied by an industrial tool, and they emphasise the need for more empirical studies in order to clarify Open Source quality performance. The authors also suggest (in 2002) that in an Open Source project programmers should follow a programming standard and have a quality assurance plan, leading to high-quality code. This suggestion has recently been adopted by large Open Source projects like KDE (see http://www.englishbreakfastnetwork.org/). Another study from the same group, assessing the maintainability of Open Source software, is that of Samoladas et al. [SSAO04]. In this paper the authors studied the maintainability of five Open Source software projects and one closed-source project, a comparison that is not frequent in the Open Source literature. The measurement was conducted on successive versions, allowing the study of how maintainability evolves over time. Maintainability was measured using the Maintainability Index described in section 1.3.3, with the help of a metrics package found in the Debian r3.0 distribution, which contains tools from Chris Lott’s page, and a set of Perl scripts to coordinate the whole process. The projects under study had certain characteristics: two of them were pure Open Source projects (initiated as Open Source and continuing to evolve as such); one was an academic project that gave birth to an Open Source project; one was a closed-source project that opened its code and continued as Open Source; one was an Open Source project that was forked into a commercial one while itself continuing as Open Source; and the last one is the latter closed-source fork, whose code is available under a commercial, non-modifiable licence.
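For reference, a widely cited three-parameter form of the Maintainability Index can be computed as follows; whether this exact variant is the one used in section 1.3.3 should be checked against that section, and the module averages below are invented for illustration:

```python
import math

def maintainability_index(avg_halstead_volume, avg_cyclomatic, avg_loc):
    """MI = 171 - 5.2*ln(aveV) - 0.23*aveG - 16.2*ln(aveLOC),
    where aveV is the average Halstead volume, aveG the average
    cyclomatic complexity and aveLOC the average LOC per module.
    Higher values indicate more maintainable code."""
    return (171
            - 5.2 * math.log(avg_halstead_volume)
            - 0.23 * avg_cyclomatic
            - 16.2 * math.log(avg_loc))

# Invented module averages: Halstead volume 1000, cyclomatic
# complexity 5, 100 LOC
print(round(maintainability_index(1000, 5, 100), 1))  # → 59.3
```

Because the size terms enter logarithmically, the index deteriorates slowly but steadily as modules grow, which is consistent with the gradual decline the study reports across successive versions.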
The result of the study was that in all cases the maintainability of the projects deteriorates over time. When the evolution of the maintainability of the closed-source project was compared with that of its Open Source counterpart, the closed-source project performed worse. The authors conclude that Open Source code quality, as expressed by maintainability, suffers from the same problems that have been observed in closed-source software studies. They also point out that further empirical studies are needed in order to produce safe conclusions about Open Source code quality.

Figure 39: Maintainability Index evolution for an Open Source project and its closed source “fork” (Samoladas et al.)

Another study of the maintainability of Open Source projects, and particularly of the Linux kernel, was conducted by Yu et al. [YSCO04]. Here the authors study the number of instances of common coupling between the 26 kernel modules and all the other non-kernel modules. By coupling they mean the degree of interaction between modules and thus the dependency between them; coupling was also explained in section 1.3.1 of this document. Additionally, for kernel-based software, they also consider couplings between kernel and non-kernel modules. The reason they studied coupling as a measure of maintainability is that, as the authors explain, common coupling is connected to fault proneness and thus to maintainability. This study is a follow-up to previous ones conducted by the same team, in which they examined 400 successive versions of the Linux kernel and tried to find relations between size, as expressed by lines of code, and the number of instances of coupling. Their findings showed that the number of lines of code in each kernel module increases linearly with the version number, but that the number of instances of common coupling between kernel modules and all others grows exponentially. In the new study they perform an in-depth analysis of the notion of coupling in the Linux kernel.

Figure 40: Maintainability Index evolution for three Open Source projects (Samoladas et al.)

To perform the new study, the authors first refined the definition of coupling and defined different expressions of it (e.g. global variables inside the Linux kernel, global variables outside the kernel, etc.), separating coupling into five categories characterised as “safe” or “unsafe”. They then constructed an analysis technique and a metric for evaluating coupling, and applied it to analyse the maintainability of the Linux kernel. Applying this classification of coupling to the Linux 2.4.20 kernel showed that for a total of 99 global variables (the common expression of coupling) there are 15,110 instances, of which 1,908 are characterised as “unsafe”. Together with the results from their previous study (the exponential growth of instances), they conclude that the maintainability of the Linux kernel will face serious problems in the long term. A more recent paper from the same group compares the maintainability, as expressed by coupling, of the Linux kernel with that of the FreeBSD, OpenBSD and NetBSD kernels [YSC+ 06]. They applied a similar analysis to that in [YSCO04] and compared the performance of Linux against the BSD family (as formal statistical hypotheses). Results showed that Linux contains considerably more instances of common coupling than the BSD family kernels, making it more difficult to maintain and more fault-prone under changes. The authors suggest that the big difference between Linux and the BSD family kernels indicates that it is possible to design a kernel without a lot of global variables and, thus, that the Linux kernel development team does not pay much attention to maintainability. A more recent study is that of Güneş Koru and Jeff Tian [KT05]. Here the two authors try to correlate change proneness and structural metrics, such as size, coupling, cohesion and inheritance metrics. They suggest, based on previous studies, that change-prone modules are also defect-prone and that these modules can be spotted by measuring their structural characteristics. In short, the authors measured two large Open Source projects, namely Mozilla and OpenOffice, using a large set of structural measures which fit into the categories mentioned before. The measurement was done with the Columbus tool (http://www.frontendart.com). In addition, with the help of custom-made Perl scripts, they counted the differences of each application from its immediately preceding revision. As the smallest software unit they considered the class. This measurement involved 800 KLOC and 51 measures for Mozilla, and 2,700 KLOC and 46 measures for OpenOffice. With the results obtained, they questioned whether the high-change modules were the same as the modules with the highest measurement values for each metric considered individually. They also compared the results with an older, similar study of their own, conducted on six large-scale projects in industry (IBM and Nortel). To answer these questions they created appropriate formal statistical hypotheses and tests. The results showed strong evidence that the modules with the most changes did not have the highest measurement values, which was also true in the previous industrial study. The authors also performed a similar analysis with clustering techniques. This second analysis led to the same conclusion, but also pointed out that the high-change modules were not the modules with the highest measurement values but those with fairly high measurement values. The latter was the main outcome of the paper and, as the authors indicate, the same holds for the six industrial applications.
Authors, trying to explain this, suggest that this fact holds because expert programmers in Open Source take on the difficult tasks and novice ones the easier ones. This might result in modules with the highest structural measures, which solve complex tasks, not to be the most problematic ones. Of course as they suggest this needs further investigation and is a central issue in their future studies. A very intersting paper, although not directly an Open Source code quality study, is that of Gyimóthy, Ferenc and Siket [GFS05]. The study has as its main goal, the validation of the Object Oriented Metrics Suite of Chidamber and Kemerer ( CK suite - as described in section 1.3.1) with the help of open source software, not the assessment of the quality of an Open Source software per se. Particularly they validated the CK metrics suite with the help of a framework-metrics collection tool named Columbus, which was mentioned previously, on an Open Source project, Mozilla. In order to perform their analysis, except from using Columbus to extract the metrics, they also collected information about bugs in Mozilla from the bugzilla database, the system that Mozilla uses for bug reporting and tracking. The validation of the
36 http://www.frontendart.com

Revision: final

82

D2 / IST-2005-33331

SQO-OSS

22nd January 2007

metrics was done with statistical methods such as logistic and linear regression, but also with machine learning techniques such as decision trees and neural networks. The latter techniques were used to predict the fault proneness of the code. The methodology followed can be summarised as:

1. Analysis and calculation of metrics from the Mozilla code.

2. Application of the four techniques (logistic and linear regression, decision trees and neural networks) to predict the fault proneness of the code.

3. Analysis of the changes in the fault proneness of Mozilla through seven versions using the results.

The methodology is well described in the paper. As the authors admit, the challenge of the whole process was to associate the bugs from the Bugzilla database with the classes found in the source code. This association was complicated and demanded a lot of iterative work; it is described in the paper. From the "pure" software engineering part of the study, the validation of the metrics and the models' predictiveness, the most interesting result is that the CBO metric (Coupling Between Object classes) seems to be the best at predicting the fault proneness of classes. It is easy to notice that, once again, the notion of coupling is strongly related to bugs and, thus, to maintainability. This fact demands further investigation and has to be on our project's research agenda. Regarding the "Open Source" part of the study, the authors observed a significant growth in the values of 5 out of 7 CK metrics (the seventh is LCOMN - Lack of Cohesion on Methods allowing Negative values, a metric not included in the CK metrics suite). The authors assume that this growth happened because of a big reorganisation of the Mozilla source code in version 1.2. Of course this justification needs further investigation. Figure 41 shows the changes of the metrics for the seven versions of the Mozilla suite.

Figure 41: Changes in the mean value of CK metrics for 7 versions of Mozilla (Gyimóthy et al.)
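As a rough illustration of how such a regression model is applied, the sketch below scores a class's fault proneness from its CBO value with a logistic function. The intercept and coefficient are invented placeholders for illustration, not values fitted in the study.

```python
import math

def fault_proneness(cbo, intercept=-2.0, coef=0.15):
    """Logistic model: probability that a class is faulty given its CBO.
    The coefficients here are illustrative placeholders only."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * cbo)))

low = fault_proneness(2)    # lightly coupled class
high = fault_proneness(30)  # heavily coupled class
```

With a positive coefficient the model assigns higher fault probability to more tightly coupled classes, mirroring the reported predictive power of CBO.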
To conclude, we could say that, although this study does not directly assess


Open Source, it is a very good example of applying empirical software engineering research.

3.3 F/OSS Community Studies in Mailing Lists

3.3.1 Introduction

Free and Open Source Software (F/OSS) development not only exemplifies a viable software development approach, but is also a model for the creation of self-learning and self-organising communities in which geographically distributed individuals collaborate to build a particular piece of software. The Bazaar model [Ray99], as opposed to the Cathedral model of developing F/OSS, has produced a number of successful applications (e.g. Linux, Apache, Mozilla, MySQL). However, the initial phase of most F/OSS projects does not operate at the Bazaar level, and only successful projects make the transition from the Cathedral to the Bazaar style of software development [Mar04]. Participants who are motivated by a combination of intrinsic and extrinsic motives congregate in projects to develop software on-line, relying on extensive peer collaboration. Some project participants augment their knowledge of coding techniques by having access to a large code base. In many projects, epistemic communities of volunteers provide support services [BR03], act as distributing agents and help newcomers or users. The F/OSS windfall is such that there is increased motivation to understand the nature of community participation in F/OSS projects. Substantial research on Open Source software projects has focused on software repositories such as mailing lists to study developer communities, with the ultimate aim of informing our understanding of core software development activities. Mundane project activities which are not explicit in most developer lists have also received attention [SSA06], [LK03a]. Many researchers focus on mailing lists in conjunction with other software repositories [KSL03], [Gho04], [LK03a], [HM05]. These studies have provided great insight into the collaborative software development process that characterises F/OSS projects. F/OSS community studies in mailing lists are important because, on the one hand, mailing lists are one major piece of technical infrastructure that F/OSS projects require.
On the other hand, F/OSS projects are symbiotic cognitive systems where ongoing interactions among project participants generate valuable software knowledge - a collection of shared and publicly reusable knowledge - that is worth archiving [SSA06]. One form of knowledge repository where archiving of public knowledge takes place is the project's mailing list.

3.3.2 Mailing Lists

Lists are active and complex living repositories of public discussions among F/OSS participants on issues relating to project development and software use. They contain 'software trails', pieces left behind by the contributors of a software project, and are very important in educating future developers [GM03b] and non-developers [SSA06] on the characteristics and evolution of the project and software. Generally, a project will host many lists, each addressing a specific area of need. For example, software developers will consult developer lists, participants needing help on documentation will seek links from lists associated with project documentation, beginners or newbies will confer with mentors' lists, etc. Fundamentally, two forms of activities are addressed in lists:

• core activities, typified by developing, debugging, and improving software. Developer mailing lists are usually the avenues for such activities;

• mundane activities [KSL03], [MFH02], [LK03a]. Documentation, testing, localisation, and field support exemplify these activities, and they take place predominantly in non-developer lists [SSA06].

However, expert software developers and project and package maintainers also take part in mundane activities in non-developer mailing lists. They interact with participants and help answer questions others have posted. Sometimes they encounter useful issues which help them to further plan and improve the code or the overall software quality and functionality. In addition, although mundane activities display a low level of innovativeness, they are fundamental for the adoption of F/OSS [BR03].

3.3.3 Studying Community Participation in Mailing Lists: Research Methodology

Compared to the traditional way of developing proprietary software, F/OSS development has provided researchers with an unprecedented abundance of easily accessible data for research and analysis. It is now possible for researchers to obtain large sets of data for analysis or to carry out what [Gho04] referred to as 'Internet archaeology' in F/OSS development. However, [Con06] remarked that collecting and analysing F/OSS data has become a problem of abundance and reliability, in terms of storage, sharing, aggregation, and filtering of the data. F/OSS projects employ different kinds of repositories for software development and collaboration. From these repositories community activities can be analysed and studied. Figure 42 shows a methodology by which community participation in mailing lists may be studied. The methodology covers F/OSS project selection, the choice of software repository and lists to analyse, the data extraction schema, and the data cleaning procedure used to extract results for analysing community participation in developer and non-developer mailing lists. Mailing list participants interact by exchanging email messages. A participant posts a message to a list and may get a reply from another participant. This kind of interaction represents a cycle where posters are continuously internalising and


Figure 42: Methodological Outline to Extract Data from Mailing Lists Archives. Modified from [SSA06] (p.1027).


externalising knowledge into the mailing lists. In any project's mailing list, these posters can assume the role of knowledge seekers and/or knowledge providers [SSA06]. The posting and replying activities of the participants are two variables that can be compared, measured and quantified. The affiliation an individual participant has with others, as a result of the email messages they exchange within the same list or across lists in different projects, can be mapped and visualised using Social Network Analysis (SNA). For the construction of such an affiliation network or 'mailing list network' see [SSA06] (pp. 130-131).
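The core of such a network construction can be sketched as a weighted edge count over (poster, replier) pairs. The participants and pairs below are invented for illustration; a real study would derive them from the In-Reply-To headers of archived messages.

```python
from collections import defaultdict

# Hypothetical (poster, replier) pairs extracted from a list archive.
threads = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "alice"),
    ("dave", "bob"), ("alice", "bob"),
]

def affiliation_network(pairs):
    """Weighted edges of a 'mailing list network': how often one
    participant's posts were answered by another participant."""
    edges = defaultdict(int)
    for poster, replier in pairs:
        edges[(poster, replier)] += 1
    return dict(edges)

net = affiliation_network(threads)
```

The resulting weighted edges can then be fed to any SNA toolkit for visualisation or centrality analysis.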


4 Data Mining in Software Engineering

4.1 Introduction to Data Mining and Knowledge Discovery

The recent explosive growth of our ability to generate and store data has created a need for new, scalable and efficient tools for data analysis. The main focus of the discipline of knowledge discovery in databases is to address this need. Knowledge discovery in databases is the fusion of many areas that are concerned with different aspects of data handling and data analysis, including databases, machine learning, statistics, and algorithms. The term Data Mining is used as a synonym for Knowledge Discovery in Databases, as well as to refer to the techniques used for the analysis and extraction of knowledge from large data repositories. Formally, data mining has been defined as the process of inducing previously unknown and potentially useful information from databases.

4.1.1 Data Mining Process

The two main goals of data mining are prediction and description. Prediction aims at estimating the future value or predicting the behaviour of some interesting variables based on the behaviour of other variables. Description concentrates on discovering patterns that represent the data of a complicated database in a comprehensible and exploitable way. A good description may suggest a good explanation of the data's behaviour. The relative importance of prediction and description varies between data mining applications. In knowledge discovery, description tends to be more important than prediction, in contrast to pattern recognition and machine learning applications, for which prediction is more important. A number of data mining methods have been proposed to satisfy the requirements of different applications. However, all of them accomplish a set of data mining tasks to identify and describe interesting patterns of knowledge extracted from a data set. The main data mining tasks are as follows:

• Unsupervised learning (Clustering). Clustering is one of the most useful tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data. The clustering problem is about partitioning a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in different clusters [JD88, KR90]. In the clustering process there are no predefined classes and no examples that would show what kind of desirable relations should be valid among the data, which is why it is perceived as an unsupervised process [BL96].

• Supervised learning (Classification). The classification problem has been studied extensively in the statistics, pattern recognition and machine learning communities as a possible solution to the knowledge acquisition or knowledge extraction problem [DH73] [WK91]. It is one of the main tasks in the data mining procedure: assigning a data item to one of a predefined set of classes. According to [FPSSR96], classification can be described as a function that maps (classifies) a data item into one of several predefined classes. Classification is characterised by a well-defined set of classes and a training set of pre-classified examples; in contrast, the clustering process does not rely on predefined classes or examples [BL96]. The goal of the classification process is to induce a model that can be used to classify future data items whose classification is unknown.

• Association rules extraction. Mining association rules is one of the main tasks in the data mining process. It has attracted considerable interest because the rules provide a concise way to state potentially useful information that is easily understood by end-users. Association rules reveal underlying "correlations" between the attributes in the data set. These correlations are presented in the form A → B, where A and B refer to sets of attributes in the underlying data.

• Visualisation of Data. This is the task of describing complex information through visual data displays. Generally, visualisation is based on the premise that a good description of an entity (data resource, process, patterns) will improve a domain expert's understanding of this entity and its behaviour.
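For the association-rule task, the standard quality measures of a rule A → B are its support and confidence. A minimal sketch over toy attribute sets (the attribute names are invented for illustration):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, a, b):
    """Confidence of the rule A -> B: support(A and B) / support(A)."""
    return support(transactions, a | b) / support(transactions, a)

# Toy transactions: each is the set of attributes observed together.
data = [
    {"high_size", "high_coupling"},
    {"high_size", "high_coupling", "defect"},
    {"high_size"},
    {"low_size"},
]

conf = confidence(data, {"high_size"}, {"high_coupling"})
```

Here the rule {high_size} → {high_coupling} holds in 2 of the 3 transactions containing high_size, i.e. with confidence 2/3.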

4.2 Data mining application in software engineering: Overview

A large amount of data is produced in software development, which software organisations collect in the hope of extracting useful information and thus better understanding their processes and products. However, it is widely believed that a large amount of useful information remains hidden in software engineering databases. Specifically, the data in software development can refer to versions of programs, execution traces, error/bug reports, and Open Source packages. Mailing lists, discussion forums and newsletters can also provide useful information about software. Data mining provides the techniques to analyse such data and extract novel, interesting patterns from it. It assists software engineering tasks through a better understanding of software artifacts and processes. Based on data mining techniques we can extract relations among software projects and the extracted knowledge; we can then exploit the extracted information to evaluate software projects and/or predict software behaviour. Below we briefly describe the main tasks of data mining and how they can be used in the context of software engineering [MN99].

• Clustering in software engineering
Clustering produces a view of the data distribution. It can also be used to


automatically identify data outliers. An example of its use in software engineering is to define groups of similar modules based on the number of modifications and the cyclomatic number metric (the number of linearly independent paths through a program's source code).

• Classification
Classification is a function that maps (classifies) a data item into one of several predefined classes. One of the most widely used classification techniques is decision trees. They can be used to discover classification rules for a chosen attribute of a dataset by systematically subdividing the information contained in that data set. Decision trees have been one of the tools of choice for building classification models in the software engineering field. Figure 43 shows a classification tree that has been built to provide a mechanism for identifying risky software modules based on attributes of the module and its system. From the given decision tree we can extract the following rule, which assists with making decisions about errors in a module:

IF (# of data bindings > 10) AND (the module is part of a non-real-time system) THEN the module is unlikely to have errors
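A rule like the one above encodes a single path through the tree and can be written directly as a predicate. The fallback label for all other branches is an assumption made here for illustration; the full tree in [MN99] has more nodes.

```python
def module_risk(data_bindings, real_time):
    """Direct encoding of the quoted rule. The 'potentially risky'
    default for the remaining branches is an illustrative assumption."""
    if data_bindings > 10 and not real_time:
        return "unlikely to have errors"
    return "potentially risky"
```

In practice such rules are not hand-written but read off the leaves of a tree induced from historical module data.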

• Association rules in software engineering
Association discovery techniques discover correlations or co-occurrences of events in a given environment, and can thus be used to extract information from coincidences in a dataset. Analysing, for instance, the logs of errors discovered in the software modules of a system, we can extract relations between inducing events, based on the software module features, and error categories. Such a rule could be the following:

(large/small size, large/small complexity, number of revisions) → (interface error, missing or wrong functionality, algorithm or data structure error, etc.)

A number of approaches have been proposed in the literature which, based on the above data mining techniques, aim to assist with some of the main software engineering tasks, namely software maintenance and testing. We provide an overview of these approaches in the following section. Table 2 summarises their main features.

4.2.1 Using Data mining in software maintenance

Data mining, due to its capability to deal with large volumes of data and its efficiency in identifying hidden patterns of knowledge, has been proposed in a number of research works as a means to support industrial-scale software maintenance.


Figure 43: Classification tree for identifying risky software modules [MN99]

Analysing source code repositories. Data mining approaches have been used extensively to analyse source code version repositories and thus assist with software maintenance and enhancement. Many of these repositories are examined and managed by tools such as CVS (Concurrent Versions System). These tools store difference information across document versions and identify and express changes in terms of physical attributes, i.e., file and line numbers. However, CVS does not identify, maintain or provide any change-control information, such as the grouping of several changes in multiple files into a single logical change. Moreover, it does not provide high-level semantics of the nature of corrective maintenance (e.g. bug fixes). Recently, the interest of researchers has focused on techniques that aim to identify relationships and trends at a syntactic level of granularity and further associate high-level semantics with the information available in repositories. Thus a wide array of approaches that perform mining of software repositories (MSR) have emerged. They are based on data mining techniques and aim to extract relevant information from the repositories, analyse it and derive conclusions within the context of a particular interest. According to [KCM05], these approaches can be classified based on:

• The entity type and granularity they use (e.g. file, function, statement, etc).

• The expression and definition of software changes (e.g. modification, addition, deletion, etc).

• The type of question (e.g. market-basket, frequency of a type of change, etc).
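A market-basket style analysis over such change data can be sketched as a pair count over change transactions. The (filename, type, id) entity triples and the transactions below are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transactions: each is the set of (filename, type, id)
# entities committed together in one logical change.
transactions = [
    {("core.c", "func", "A"), ("core.c", "func", "B")},
    {("core.c", "func", "A"), ("core.c", "func", "B"), ("ui.c", "func", "C")},
    {("ui.c", "func", "C")},
]

def co_change_counts(txns):
    """Count how often each pair of entities changes in the same
    transaction (market-basket style co-change mining)."""
    counts = defaultdict(int)
    for t in txns:
        for x, y in combinations(sorted(t), 2):
            counts[(x, y)] += 1
    return counts

counts = co_change_counts(transactions)
pair = (("core.c", "func", "A"), ("core.c", "func", "B"))
```

Frequent pairs are candidate answers to "if A changes, what else changes?"; here functions A and B co-change in two of the three transactions.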


Technique                | Approach | Input                                                                 | Output
Classification           | [FLMP04] | execution profiles & result (success/failure)                         | decision tree of failed executions
Clustering               | [KDTM06] | source code entities (behavioural or structural), attributes, metrics | groups of similar classes, methods, data; significant patterns extracted from the system source code
Association rules        | [ZWDZ04] | software entities, e.g. functions                                     | prediction of failures, correlations between entities; identification of additions, modifications, deletions of syntactic entities
Neural networks          | [LFK05]  | input/output variables of the software system                         | a network producing sets for function testing
Pattern extraction       | [WH05]   | source code repositories                                              | track of bugs
Differencing (analysis of semantic graph) | [RRP04] | source code, change history                           | syntactic and semantic changes
CVS annotations          | [GHJ98]  | version history of source code, classes                               | syntactic & semantic hidden dependencies
Semantic analysis        | [GM03a]  | files & comments                                                      | syntactic & semantic file coupling
Heuristic                | [HH04]   | CVS annotations                                                       | candidate entities for change

Table 2: Mining approaches in software engineering

In the sequel, we introduce the main concepts used in MSR and then briefly present some of the best known MSR approaches proposed in the literature.

Fundamental Concepts in MSR. The basic concepts with respect to MSR involve


the level of granularity of the software entity investigated, the changes, and the underlying nature of a change. The most widely used concepts can be summarised as follows:

• An entity, e, is a physical, textual or syntactic element in software: for example, a file, line, function, class, comment, if-statement, etc.

• A change is a modification, addition or deletion, to or of an entity. A change describes which entities are changed and where the change occurs.

• The syntax of a change is a concise and specific description of the syntactic changes to the entity. This description is based on the grammar of the entity's language. For instance: a condition was added to an if-statement; a parameter was renamed; an assignment statement was added inside a loop.

• The semantics of a change is a high-level, yet concise, description of the change in the entity's semantics or feature space. For instance: a class interface change, a bug fix, a new feature added to the GUI.

MSR via CVS annotations. One approach is to utilise CVS annotation information. Gall et al. [GHJ98] propose an approach for detecting common semantic (logical and hidden) dependencies between classes on account of the addition or modification of a particular class. The approach is based on the version history of the source code, where for each class the sequence of release numbers in which it changed is recorded. Classes that have been changed in the same release are compared in order to identify common change patterns, based on the author name and time stamp from the CVS annotations. Classes that are changed with the same time stamp are inferred to have dependencies. Specifically, this approach can assist in answering questions such as: which classes change together? How many times was a particular class changed? How many class changes occurred in a subsystem (the files in a particular directory)? An approach that studies the file-level changes in software is presented in [Ger04a].
The CVS annotations are utilised to group subsequent changes into what is termed a modification request (MR). Specifically, this approach focuses on studying bug MRs and comment MRs to address issues regarding the new functionality that may be added or the bugs that may be fixed by MRs, the different stages of evolution to which MRs correspond, or the relation between developers and the modification of files.

MSR via Data Mining. Data mining provides a variety of techniques with potential application to MSR. One of these techniques is association rules. The work of Zimmermann et al. [ZWDZ04] exploits the association rule extraction technique to identify co-occurring changes in a software system. For instance, suppose we want to discover relations between the modifications of software entities. We then aim


to answer the question: when a particular source-code entity (e.g. a function A) is modified, what other entities are also modified (e.g. the functions with names B and C)? Specifically, a tool is proposed that parses the source code and maps the line numbers to syntactic or physical-level entities. These entities are represented as a triple (filename, type, id). Subsequent entity changes in the repository are grouped as a transaction. An association rule mining technique is then applied to determine rules of the form B, C → A. This technique has been applied to open-source projects with the goal of utilising earlier versions to predict changes in later versions. In general terms, it enables the identification of additions, modifications and deletions of syntactic entities without utilising any other external information. It can handle various programming languages and assists in detecting hidden dependencies that cannot be identified by source code analysis.

MSR via Heuristics. CVS annotation analysis can be extended by applying heuristics that include information from source code or source code models. Hassan et al. [HH04] proposed a variety of heuristics (developer-based, history-based, code-layout-based (file-based)) which are then used to predict the entities that are candidates for a change on account of a given entity being changed. CVS annotations are lexically analysed to derive the set of changed entities from the source-code repositories. The research in both [ZWDZ04] and [HH04] uses source-code version history to identify and predict software changes. The questions they answer are quite interesting with respect to testing and impact analysis.

MSR via Differencing. Source-code repositories contain the differences between versions of source code, so MSR can be performed by analysing the actual source-code differences.
Such an approach, which aims to detect syntactic and semantic changes from a version history of C code, is presented by Raghavan [RRP04]. According to this approach, each version is converted to an abstract semantic graph (ASG) representation. This graph is a data structure used to represent or derive the semantics of an expression in a programming language. A top-down or bottom-up heuristics-based differencing algorithm is applied to each pair of in-memory ASGs. The differencing algorithm produces an edit script describing the nodes that are added, deleted, modified or moved in order to transform one ASG into the other. The edit scripts produced for each pair of ASGs are analysed to answer questions ranging from entity-level changes, such as how many functions and function calls are inserted, added or modified, to specific changes, such as how many if-statement conditions are changed. In [CH04] a syntactic-differencing approach called meta-differencing is introduced. It allows us to ask syntax-specific questions about differences. According to this approach, the abstract syntax tree (AST) information is directly encoded into the source code in XML format. The added, deleted or modified syntactic elements are then computed based on the encoded AST. The types and prevalence of syntactic changes can be easily computed. Specifically, the approach supports questions such as the following:

i Are new methods added to an existing class?

ii Are there changes to pre-processor directives?

iii Was the condition in an if-statement modified?

From the above discussion of MSR we can conclude that the types of questions MSR can answer fall into two categories:

• Market-basket questions. These are formulated as: IF A happens, then what ELSE happens on a regular basis? The answer to such a question is a set of rules or guidelines describing situations of trends or relationships. This can be expressed as follows: if A happens, then B and C happen X amount of the time.

• Questions dealing with the prevalence or lack of a particular type of change. This type of question often addresses finding hidden dependencies or relationships, which can be very important for impact analysis. MSR aims to identify the actual impact set after an actual change. However, MSR techniques often give a "best guess" for the change: the change may not be explicitly documented, so it must sometimes be inferred.

A clustering approach for semi-automated software maintenance. The work in [KDTM06] presents a framework for knowledge acquisition from source code in order to comprehend an object-oriented system and evaluate its maintainability. Specifically, clustering techniques are used to assist engineers in understanding the structure of the source code and assessing its maintainability. The proposed approach is applied to a set of elements collected from the source code, including:

• Entities that belong either to the behavioural domain (classes, member methods) or the structural domain (member data).

• Attributes that describe the entities (such as class name, superclass, method name, etc).
• Metrics, used as additional attributes, that help the software maintainer comprehend the system under maintenance more thoroughly.

The above elements specify the data input model of the framework. Another part of the framework is an extraction process which aims to extract the elements and metrics from the source code. The extracted information is then stored in a relational


database so that the data mining techniques can be applied. In this approach, clustering techniques are used to analyse the input data and provide the maintenance engineer with a rough grasp of the software system. Clustering produces overviews of systems by creating mutually exclusive groups of classes, member data and methods based on their similarities. Moreover, it can assist in discovering programming patterns and outlier cases (unusual cases) which may require attention. Another problem we have to tackle in software engineering is the corrective maintenance of software. It would be desirable to identify software defects before they cause failures. It is likely that many of the failures fall into small groups, each consisting of failures caused by the same software defect. Recent research has focused on data mining techniques which can simplify the problem of classifying failures according to their causes. Specifically, these approaches require that three types of information about executions are recorded and analysed: i) execution profiles reflecting the causes of the failures, ii) auditing information that can be used to confirm reported failures, and iii) diagnostic information that can be used in determining their causes.

Classification of software failures. A semi-automated strategy for classifying software failures is presented in [PMM+03]. This approach is based on the idea that if m failures are observed over some period during which the software is executed, it is likely that these failures are due to a substantially smaller number of distinct defects. Assume that F = {f1, f2, ..., fm} is the set of reported failures and that each failure is caused by just one defect. Then F can be partitioned into k < m subsets F1, F2, ..., Fk such that all of the failures in Fi are caused by the same defect di, 1 ≤ i ≤ k. This partitioning is called the true failure classification.
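Under these definitions, the true classification is simply a grouping of the failure set F by causal defect. The sketch below assumes the failure-to-defect mapping is known; in practice that mapping is unknown, and approximating the resulting partition is exactly the goal of the strategy.

```python
from collections import defaultdict

# Hypothetical mapping from reported failures to underlying defects.
cause = {"f1": "d1", "f2": "d2", "f3": "d1", "f4": "d1", "f5": "d2"}

def true_classification(cause_of):
    """Partition the failure set F into subsets F_i, one per defect d_i."""
    parts = defaultdict(set)
    for failure, defect in cause_of.items():
        parts[defect].add(failure)
    return dict(parts)

parts = true_classification(cause)  # here m = 5 failures, k = 2 subsets
```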
In the sequel, we describe the main phases of the strategy for approximating the true failure classification:

1. The software is instrumented to collect, and transmit to the developers, either execution profiles or captured executions, and is then deployed.

2. Execution profiles corresponding to reported failures are combined with a random sample of profiles of operational executions for which no failures were reported. This set of profiles is analysed to select a subset of all profile features to use in grouping related failures. A feature of an execution profile corresponds to an attribute or element of it; for instance, a function call profile contains an execution count for each function in a program, and each count is a feature of the profile. The feature selection strategy is then as follows:

• Generate candidate feature-sets and use each one to create and train a pattern classifier to distinguish failures from successful executions.

• Select the features of the classifier that give the best results.


Figure 44: A cluster hierarchy

3. The profiles of reported failures are analysed using cluster analysis, in order to group together failures whose profiles are similar with respect to the features selected in phase 2.

4. The resulting classification of failures into groups is explored in order to confirm or refine it.

The strategy described above provides an initial classification of software failures. Depending on the application and the user requirements, these initial classes can be merged or split so that the software failures are identified in an appropriate fashion. In [FLMP04], two tree-based techniques for refining an initial classification of failures are proposed; they are discussed below.

Refinement using dendrograms. A dendrogram is a tree-like diagram used to represent the results of a hierarchical clustering algorithm. One of the strategies proposed in the literature for refining an initial failure clustering relies on dendrograms. Specifically, it uses them to decide how non-homogeneous clusters should be split into two or more sub-clusters, and to decide which clusters should be considered for merging. A cluster in a dendrogram corresponds to a subtree that represents the relationships among the cluster's sub-clusters. The more similar two clusters are to each other, the farther from the dendrogram root their nearest common ancestor is. For instance, from the dendrogram presented in Figure 44 we can observe that clusters A and B are more similar than clusters C and D. A cluster's largest homogeneous subtree is the largest subtree consisting of failures with the same cause. If a clustering is too coarse, some clusters may have two or more

Revision: final

97


Figure 45: Merging two clusters. The new cluster A contains the clusters represented by the two homogeneous sub-trees A1 and A2

large homogeneous subtrees containing failures with different causes. Such a cluster should be split at the level where its large homogeneous subtrees are connected, so that these subtrees become siblings, as Figure 46 shows. If a clustering is too fine, siblings may be clusters containing failures with the same causes. Such sibling clusters should be merged at the level of their parent, as Figure 45 depicts. Based on these definitions, the strategy proposed for refining an initial classification of failures using dendrograms has three phases:

1. Select the number of clusters into which the dendrogram will be divided.

2. Examine the individual clusters for homogeneity by choosing the two executions in the cluster with maximally dissimilar profiles. If the selected executions have the same or related causes, it is likely that all of the other failures in the cluster do as well. If the selected executions do not have the same or related causes, the cluster is not homogeneous and should be split.

3. If neither the cluster nor its sibling is split by step 2, and the failures that were examined have the same cause, the two clusters are merged.

Clusters that have been generated by merging or splitting should be analysed in the same way, which allows for recursive splitting or merging.

Refinement using classification trees. The second technique proposed by Francis et al. relies on building a classification tree to recognise failed executions. A classification tree is a type of pattern classifier that takes the form of a binary decision tree. Each internal node in the tree is labelled with a relational expression that compares a numeric feature of the object being classified to a constant splitting value. Each leaf of the tree is labelled with a predicted value, indicating which class of interest the leaf represents.
Given the classification tree, an object is classified by traversing the tree from the root to a leaf. At each step of the traversal prior to reaching a leaf, we evaluate the expression at the current node. When the object reaches a leaf, the predicted value of that leaf is taken as the predicted class for that object.
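Such a tree and its traversal can be sketched directly. The tree below is a hypothetical example (feature 0 might be the call count of some function), not one taken from [FLMP04]:

```python
# A node is either a leaf {"predict": value} or an internal node
# {"feature": index, "split": constant, "left": ..., "right": ...};
# traversal goes left when profile[feature] <= split.

def classify(tree, profile):
    node = tree
    while "predict" not in node:
        branch = "left" if profile[node["feature"]] <= node["split"] else "right"
        node = node[branch]
    return node["predict"]

# Hypothetical tree: executions calling function 0 more than 3 times fail.
tree = {"feature": 0, "split": 3,
        "left": {"predict": "success"},
        "right": {"predict": "failure"}}
```

For example, `classify(tree, [5])` follows the right branch and yields the failure class.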


Figure 46: Splitting a cluster: The two new clusters (subtrees with roots A11 and A12) correspond to the large homogeneous subtrees in the old cluster.

In the case of the software failure classification problem, we consider two classes, namely success and failure. The Classification And Regression Tree (CART) algorithm was used to build the classification tree for software failures. Assume a training set of execution profiles

\( L = \{(x_1, j_1), \ldots, (x_N, j_N)\} \)

where each \( x_i \) represents an execution profile and \( j_i \) is the result (success/failure) associated with it. The steps of building the classification tree based on L are as follows:

• The deviance of a node \( t \subseteq L \) is defined as

\( d(t) = \frac{1}{N_t} \sum_{i \in t} (j_i - \bar{j}(t))^2 \)

where \( N_t \) is the size of t and \( \bar{j}(t) \) is the average value of j in t.

• Each node t is split into two children \( t_L \) and \( t_R \). The split is chosen that maximises the reduction in deviance. That is, from the set of possible splits S, the optimal split is found by:

\( s^* = \arg\max_{s \in S} \left[ d(t) - \frac{N_{t_L}}{N_t} d(t_L) - \frac{N_{t_R}}{N_t} d(t_R) \right] \)

• A node is declared a leaf node if \( d(t) \leq \beta \), for some threshold \( \beta \).

• The predicted value for a leaf is the average value of j among the executions in that leaf.
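The deviance and the choice of the optimal split can be coded directly from the definitions above. A minimal pure-Python sketch for a single scalar feature, with success/failure encoded as j = 0/1 (the example data is hypothetical):

```python
def deviance(js):
    # d(t) = (1/N_t) * sum over executions in t of (j_i - mean_j)^2
    n = len(js)
    m = sum(js) / n
    return sum((j - m) ** 2 for j in js) / n

def best_split(samples):
    # samples: list of (x, j) with a scalar feature x and outcome j.
    # Try every threshold between distinct x values; keep the split that
    # maximises d(t) - (N_L/N) d(t_L) - (N_R/N) d(t_R).
    n = len(samples)
    parent = deviance([j for _, j in samples])
    best_th, best_gain = None, -1.0
    xs = sorted({x for x, _ in samples})
    for lo, hi in zip(xs, xs[1:]):
        th = (lo + hi) / 2
        left = [j for x, j in samples if x <= th]
        right = [j for x, j in samples if x > th]
        gain = (parent
                - len(left) / n * deviance(left)
                - len(right) / n * deviance(right))
        if gain > best_gain:
            best_th, best_gain = th, gain
    return best_th, best_gain

# Hypothetical samples: the outcome flips between x = 2 and x = 10.
samples = [(1, 0), (2, 0), (10, 1), (11, 1)]
threshold, gain = best_split(samples)
```

On this data the split at x = 6 separates the classes perfectly, so the full parent deviance is recovered as the gain.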


Analysing Bug Repositories

Source code repositories store a wealth of information that is useful not only for managing and building source code, but also as a detailed log of how the source code has evolved during development. Evidence of source code refactoring is stored in the repository. As bugs are fixed, the changes made to correct the problem are recorded. As new APIs are added to the source code, the proper way to use them is implicitly explained in the source code. One of the challenges, then, is to develop tools and techniques to automatically extract and use this information. In [WH05], a method is proposed which uses data describing bug fixes mined from the source code repository to improve static analysis techniques used to find bugs. It is a two-step approach in which the source code change history of a software project helps to refine the search for bugs. The first step in the process is to identify the types of bugs that are being fixed in the software. The goal is to review the historical data stored for the software project, in order to gain an understanding of what data exists and how useful it may be in the task of bug finding. Many of the bugs found in the CVS history are good candidates for detection by static analysis, such as NULL pointer checks and function return value checks. The second step is to build a bug detector driven by these findings. The idea is to develop a function return value checker based on the knowledge that a specific type of bug has been fixed many times in the past. Briefly, this checker looks for instances where the return value from a function is used in the source code before being tested. Using a return value could mean passing it as an argument to a function, using it as part of a calculation, dereferencing the value if it is a pointer, or overwriting the value before it is tested. Cases where return values are never stored by the calling function are also checked.
Testing a return value means that some control flow decision relies on the value. The checker does a data flow analysis on the variable holding the returned value only to the point of determining if the value is used before being tested. It simply identifies the original variable the returned value is stored into and determines the next use of that variable. If the variable during its next use is an operand to a comparison in a control flow decision, the return value is deemed to be tested before being used. If the variable is used in any way before being used in a control flow decision, the value is deemed to be used before being tested. Also, a small amount of inter-procedural analysis is performed in order to improve the results. It is often the case that a return value will be immediately used as an argument in a call to a function. In these cases, the checker determines if that argument is tested before being used in the called function. Moreover, the checker categorises the warnings it finds into one of the following categories:


• Warnings are flagged for return values that are completely ignored, or if the return value is stored but never used.

• Warnings are also flagged for return values that are used in a calculation before being tested in a control flow statement. Any return value passed as an argument to a function before being tested is flagged, as well as any pointer return value that is dereferenced without being tested.

However, there are types of functions that lead the static analysis procedure to produce false positive warnings. Without previous knowledge, it is difficult to tell which functions do not need their return values checked. Mining techniques for source code repositories can assist with improving static analysis results. Specifically, the data mined from the source code repository and from the current version of the software is used to determine the actual usage pattern for each function. In general terms, it has been observed that the bugs catalogued in bug databases and those found by inspecting source code change histories differ in type and level of abstraction. Software repositories record all the bugs fixed at every step of the development process and thus provide much useful information. Therefore, bug finding techniques prove more effective when they automatically mine data from source code repositories.

Mining the Source Code Repository

Williams et al. [WH05] propose the use of an analysis tool to automatically mine data from the source code repository by inspecting every source code change in the repository. Specifically, they try to determine when a bug of the type they are concerned with is fixed. A source code checker is developed (as described above) which is used to determine when a potential bug has been fixed by a source code change. The checker is run over both versions of the source code.
If, for a particular function called in the changed file, the number of calls remains the same and the number of warnings produced by the tool decreases, the change is said to fix a likely bug. If we determine that a check has been added to the code, we flag the function that produces the return value as being involved in a potential bug fix in a CVS commit. The result of the mining is a list of functions that are involved in a potential bug fix in a CVS commit. The output of the function return value checker is a list of warnings denoting instances in the code where a return value from a function is used before being tested. A full description of each warning includes the source file, line number and category of the warning. Since there are many reasons that could lead a static analysis to produce a large number of false positive warnings, the proposed tool provides a ranking of the warnings, from least likely to most likely to be a false positive. The ranking is done in two parts. First, the functions are divided into those


that are involved in a potential bug fix in a CVS commit and those that are not. Next, within each group, the functions are ranked by how often their return values are tested before being used in the current version of the software.

4.2.2 A Data Mining approach to automated software testing

The evaluation of software is based on tests that are designed by software testers. The evaluation of test outputs thus involves considerable effort by human testers, who often have imperfect knowledge of the requirements specification. This manual approach to testing software is costly and error-prone. The interest of researchers has therefore focused on the development of automated techniques that induce functional requirements from execution data. Data mining approaches can be used for extracting useful information from the tested software which can assist with software testing. Specifically, the induced data mining models of tested software can be used for recovering missing and incomplete specifications, designing a set of regression tests and evaluating the correctness of software outputs when testing new releases of the system. In developing a large system, the stages of unit testing and integration testing are followed by the test of the entire application (system testing). The activities of system testing include function testing, performance testing, acceptance testing and installation testing. Function testing aims to verify that the system performs its functions as specified in the requirements and that there are no undiscovered errors left. A test set is considered adequate if it causes all incorrect versions of the program to fail. The selection of tests and the evaluation of their outputs are therefore crucial for improving the quality of the tested software at a lower cost. Assuming that requirements can be re-stated as logical relationships between inputs and outputs, test cases can be generated automatically by techniques such as cause-effect graphs [Pfl01] and decision tables [LK03b]. A software system, in order to stay useful, has to undergo continual changes.
The most common maintenance activities in the software life-cycle include bug fixes, minor modifications, improvements of basic functionality and the addition of brand new features. The purpose of regression testing is to identify new faults that may have been introduced into the basic features as a result of enhancing software functionality or correcting existing faults. A regression test library is a set of test cases that run automatically whenever a new version of software is submitted for testing. Such a library should include a minimal number of tests that cover all possible aspects of system functionality. A standard way to design a regression test library is to identify equivalence classes of every input and then use only one value from each edge (boundary) of every class. One of the main problems is the generation of a minimal test suite which covers as many cases as possible. Ideally such a test suite can be generated from a complete and up-to-date specification of functional requirements. However, frequent changes make the original requirements specifications hardly


Figure 47: An example of Info-Fuzzy Network structure [LFK05]

relevant to the new versions of the software. To ensure effective design of new regression test cases, one therefore has to recover the actual requirements of an existing system. A tester can analyse system specifications, perform structural analysis of the system's source code and observe the results of system execution in order to define input-output relationships in the tested software. An approach that aims to automate the input-output analysis of execution data based on a data mining methodology is proposed in [LFK05]. This methodology relies on the info-fuzzy network (IFN), which has an 'oblivious' tree-like structure. The network components include the root node, a changeable number of hidden layers (one layer for each selected input) and the target (output) layer representing the possible output values. The same input attribute is used across all nodes of a given layer (level), while each target node is associated with a value (class) in the domain of a target attribute. If the IFN model is aimed at predicting the values of a continuous target attribute, the target nodes represent disjoint intervals in the attribute range. A hidden layer l consists of nodes representing conjunctions of values of the first l input attributes, which is similar to the definition of an internal node in a standard decision tree. The final (terminal) nodes of the network represent non-redundant conjunctions of input values that produce distinct outputs. Considering that the network is induced from execution data of a software system, each interconnection between a terminal and a target node represents a possible output of a test case. Figure 47 presents an IFN structure where the internal nodes include the nodes (1,1), (1,2), 2, (3,1), (3,2); the connection (1, 1) → 1 implies that the expected output value for a test case where both input variables are equal to 1 is also 1.
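The input-output behaviour captured by the terminal nodes can be pictured as a simple lookup from non-redundant input conjunctions to expected outputs. In the toy sketch below, only the (1, 1) → 1 pair is taken from the Figure 47 example; the remaining mappings are hypothetical:

```python
# Terminal nodes of a toy IFN: each key is a non-redundant conjunction of
# input values; each value is the expected output for matching test cases.
terminal_nodes = {
    (1, 1): 1,   # from the Figure 47 example
    (1, 2): 0,   # hypothetical
    (2, 1): 0,   # hypothetical
    (2, 2): 1,   # hypothetical
}

def expected_output(x1, x2):
    # A test case is evaluated by looking up its terminal node.
    return terminal_nodes[(x1, x2)]
```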
The connectionist nature of IFN resembles the structure of a multi-layer neural network. Therefore, the IFN model is characterised as a network and not as a tree. A separate info-fuzzy network is constructed to represent each output variable. Thus we present below the algorithm for building an info-fuzzy network of a single


output variable.

Network Induction Algorithm. The induction procedure starts with defining the target layer (one node for each target interval or class) and the "root" node. The root node represents an empty set of input attributes; attributes are selected incrementally to maximise a global decrease in the conditional entropy of the target attribute. Unlike decision tree algorithms such as CART and C4.5, the IFN algorithm is based on the pre-pruning approach: when no attribute causes a statistically significant decrease in the entropy, the network construction is stopped. The algorithm performs discretisation of continuous input attributes "on-the-fly" by recursively finding a binary partition of an input attribute that minimises the conditional entropy of the target attribute [FI93]. The search for the best partition of an attribute is dynamic and is performed each time a candidate input attribute is considered. Each hidden node in the network is associated with an interval of a discretised input attribute. The estimated conditional mutual information between the partition of the interval S at the threshold Th and the target attribute T given the node z is defined as follows:

\( MI(Th; T \mid S, z) = \sum_{t=0}^{M_T - 1} \sum_{y=1,2} P(S_y; C_t; z) \cdot \log \frac{P(S_y; C_t \mid S, z)}{P(S_y \mid S, z) \cdot P(C_t \mid S, z)} \)

where

• \( P(S_y \mid S, z) \) is the estimated conditional probability of a sub-interval \( S_y \), given the interval S and the node z.

• \( P(C_t \mid S, z) \) is the estimated conditional probability of a value \( C_t \) of the target attribute T, given the interval S and the node z.

• \( P(S_y; C_t; z) \) is the estimated joint probability of a value \( C_t \) of the target attribute T, a sub-interval \( S_y \) and the node z.

Then the statistical significance of splitting the interval S by the threshold Th at the node z is evaluated using the likelihood-ratio statistic. A new input attribute is selected to maximise the total significant decrease in the conditional entropy as a result of splitting the nodes of the last layer. The nodes of the new hidden layer are defined as the Cartesian product of the split nodes of the previous layer and the discretised intervals of the new input variable. If there is no input variable that decreases the conditional entropy of the output variable, the network construction stops. The IFN induction procedure is a greedy algorithm which is not guaranteed to find the optimal ordering of input attributes. Though some functions are highly sensitive to this ordering, alternative orderings will still produce acceptable results in most cases.
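The on-the-fly discretisation step — finding the binary partition of a continuous attribute that minimises the conditional entropy of the target — can be sketched as follows. This is a simplified illustration with hypothetical data; the actual algorithm also applies the likelihood-ratio significance test described above:

```python
from math import log2

def entropy(labels):
    # Shannon entropy of the class distribution in a list of labels.
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def best_threshold(values, labels):
    # Binary partition of a continuous attribute that minimises the
    # conditional entropy of the target attribute.
    n = len(values)
    pairs = sorted(zip(values, labels))
    best_th, best_h = None, float("inf")
    xs = sorted(set(values))
    for lo, hi in zip(xs, xs[1:]):
        th = (lo + hi) / 2
        left = [l for v, l in pairs if v <= th]
        right = [l for v, l in pairs if v > th]
        h = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if h < best_h:
            best_th, best_h = th, h
    return best_th, best_h

# Hypothetical attribute values and target classes.
values = [1, 2, 8, 9]
classes = ["pass", "pass", "fail", "fail"]
th, h = best_threshold(values, classes)
```

Here the partition at 5.0 leaves both sub-intervals pure, so the conditional entropy drops to zero.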


An IFN-based environment for automated input-output analysis is presented in [LFK05]. The main modules of this environment are:

• Legacy system (LS). This module represents a program, a component or a system to be tested in subsequent versions of the software.

• Specification of Application Inputs and Outputs (SAIO). This module holds basic data on each input and output variable in the Legacy System.

• Random test generator (RTG). This module generates random combinations of values in the range of each input variable.

• Test bed (TB). This module feeds training cases generated by the RTG module to the LS.

The IFN algorithm is trained on inputs provided by the RTG and outputs obtained from the legacy system by means of the Test Bed module. A separate IFN model is built for each output variable. The information derived from each IFN model can be summarised as follows:

• A set of input attributes relevant to the corresponding output.

• Logical (if ... then ...) rules expressing the relationships between the selected input attributes and the corresponding output. The set of rules appearing at each terminal node represents the distribution of output values at that node.

• Discretisation intervals of each continuous input attribute included in the network. Each interval represents an "equivalence" class, since for all values of a given interval the output values conform to the same distribution.

• A set of test cases. The terminal nodes in the network are converted into test cases, each representing a non-redundant conjunction of input values / equivalence classes and the corresponding distribution of output values.

The IFN algorithm takes as input the training cases that are randomly generated by the RTG module and the outputs produced by the LS for each test case. The IFN algorithm runs repeatedly to find a subset of input variables relevant to each output and the corresponding set of non-redundant test cases.
Actual test cases are generated from the automatically detected equivalence classes by using an existing testing policy.

4.3 Text Mining and Software Engineering

Software engineering repositories consist of text documents containing source code, mailing lists, bug reports and execution logs. Thus the mining of textual artifacts is


requisite for many important activities in software engineering: tracing of requirements, retrieval of components from a repository, identification and prediction of software failures, software maintenance, testing, etc. This section describes the state of the art in text mining and the application of text mining techniques in software engineering. Furthermore, a comparative analysis of the text mining techniques applied in software engineering is provided, and future directions are discussed.

4.3.1 Text Mining - The State of the Art

Text mining is the process of extracting knowledge and patterns from unstructured document text. It is a young interdisciplinary research field in the wider area of data mining, drawing on information retrieval, machine learning and computational linguistics. The methods deployed in text mining, depending on the application, usually require the transformation of the texts into an intermediate structured representation, which can be, for example, the storage of the texts in a database management system according to a specific schema. In many approaches, though, there is also benefit in keeping a semi-structured intermediate form of the texts, as for example the representation of documents in a graph, where social analysis and graph techniques can be applied. Independently of the task objective, text mining requires preprocessing techniques, usually involving qualitative and quantitative analysis of the documents' features. The diagram in Figure 48 depicts the most important phases of the preprocessing analysis, as well as the most important text mining techniques. Preprocessing assumes a preselected document representation model, usually the vector space model, though the boolean and the probabilistic models are other options. According to the representation model, documents are parsed, and text terms are weighted according to weighting schemes like TF-IDF (Term Frequency - Inverse Document Frequency), which is based on the frequency of occurrence of terms in the text. Several other options are described in [Cha02, BYRN99]. Natural language processing techniques are also applied, the state of the art of which is well described in [Mit05, MS99]. Often, stop-word removal and stemming are applied.
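The TF-IDF weighting scheme mentioned above can be sketched as follows, using the common tf · log(N/df) variant (real implementations differ in normalisation details); the two-document corpus is a made-up example:

```python
from math import log

def tf_idf(docs):
    # docs: list of token lists; returns one {term: weight} dict per
    # document, using raw term frequency and idf = log(N / df).
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1      # raw term frequency
        weights.append({t: f * log(n / df[t]) for t, f in tf.items()})
    return weights

weights = tf_idf([["bug", "fix"], ["bug", "report"]])
```

A term appearing in every document ("bug") gets weight zero, while terms confined to one document ("fix", "report") are weighted up; this is precisely why TF-IDF favours discriminative terms.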
In favour of the use of natural language processing techniques in text mining, it has been shown in the past that the use of semantic linguistic features, mainly derived from a language knowledge base like the WordNet word thesaurus [Fel98], can help text retrieval [Voo93] and text classification [MTV+ 05]. Furthermore, the use of word sense disambiguation (WSD) techniques [IV98] is important in several natural language processing and text mining tasks, like machine translation, speech processing and information retrieval. Lately, state of the art approaches in unsupervised WSD [TVA07, MTF04] have pointed the way towards the use of semantic networks generated from texts, enhanced with semantic information derived from word thesauri. These approaches are to be applied in the text retrieval task,


Figure 48: Preprocessing, Storage and Processing of Texts in Text Mining

where it is expected that under certain circumstances the representation of texts as semantic networks can improve retrieval performance. Another important factor when dealing with unstructured text is the curse of dimensionality. When tackling millions or even billions of documents, the respective term space is huge and often prohibits applying any type of analysis or feature extraction. In this direction, techniques based on singular value decomposition, like latent semantic indexing, or the removal of features with low scores based on statistical weighting measures, are applied. Several examples of such techniques can be found in [DZ07]. Once feature extraction and natural language processing techniques have been applied to the document collection, storage takes place with the use of techniques like inverted indexing. Depending on the application of the text mining methods, a semi-structured representation of documents, as in [TVA07, MTF04], might be needed. In such cases, indexing of the respective information (i.e. node types, edge types, edge weights) is useful. The text mining techniques mentioned in Figure 48 are representative


and frequently used in many applications. For example, clustering has already been used in information retrieval and is applied in popular web search engines, like Vivisimo37. Text classification is widely used in spam filtering. Text retrieval is a core task with an unrestricted range of applications, varying from search engines to desktop search. Social analysis can be applied when any type of link between documents is available, for example publications and references, or posts in forums and replies, and is widely used for authority and hub detection (i.e. finding the most important people in the graph). Finally, domain ontology evolution is a task where, through the use of other text mining techniques like clustering or classification, an ontology describing a specific domain can be evolved and enhanced with term features of new documents pertaining to the domain. This is really important in cases where the respective domain evolves fast, prohibiting the manual update of the ontology with new concepts and instances.

4.3.2 Text Mining Approaches in Software Engineering

Applying text mining techniques in software engineering is a real challenge, mostly because of the complex nature of the unstructured text. Text mining in software engineering has employed a wide range of text repositories, like document hierarchies, code repositories, bug report databases, concurrent versioning system log repositories, newsgroups, mailing lists and several others. Since the aim is to define metrics which can lead to software assessment and evaluation, while the input data is unstructured and unrestricted text, the text mining processes in software engineering are hard to design and, moreover, to apply. The most challenging part is the selection and preprocessing of the input text sources, along with the design of a metric that uses one or more text mining techniques applied to these sources, while complying with the existing standards for software engineering metrics. A discussion of some of the most recent approaches within this scope follows, while Figure 49 summarises the methods and their use. In [BGD+ 06], the authors used the Apache developer mailing list as text input. Entity resolution was essential, since many individuals used more than one alias. After constructing the social graph arising from the interconnections between poster and replier, they performed a social network analysis and arrived at important findings, like the strong relationship between email activity and source code level activity. Furthermore, social network analysis at that level revealed the important nodes (individuals) in the discussions. Though graph and link analysis were employed in the method, the use of node ranking techniques, like PageRank, or other graph processing techniques like Spreading Activation, did not take place. In [CC05] another text source has been used with the aim of predicting parts of the source code that will be influenced by fixing future bugs. More precisely, for each
37 Publicly available at http://vivisimo.com/


Figure 49: Summary of Recent Text Mining Approaches in Software Engineering

Method   | Text Input Source                                                          | Text Mining Technique                                        | Output
[BGD+06] | E-mail archives of OSS software                                            | Entity Resolution, Social Network Analysis                   | Weighting of OSS participants; relationship of e-mail activity and commit activity
[VT06]   | CVS repositories                                                           | Text Clustering                                              | Patterns in the development of large software projects (history analysis, major contributions)
[CC05]   | CVS commit notes, set of fixed bugs                                        | Text Retrieval                                               | Similarity between new bug reports and source code files
[WK05]   | CVS repositories, source code                                              | Text analysis, retrieval, classification                     | Predictions of source bugs
[JS04]   | OSSD Web repositories (Web pages, mailing lists, process entity taxonomy)  | Text Extraction, Entity Resolution, Social Network Analysis  | Transformation of data into process events; ordering of processing events
[GM03]   | Mailing lists, CVS logs, Change Log files                                  | Text Summarization and Validation                            | Statistical measures for code changes and developers

source file they used the set of fixed bugs data and the respective CVS commit notes as descriptors. With the use of a probabilistic text retrieval model they measured the similarity between the descriptors of each source file and the new bug description. In this way they predict the parts of the code likely to be affected by future bug fixing. Still, the same method could have been viewed from a supervised learning perspective, and classification along with predictive modelling techniques would have been a good baseline for their predictions. Following the same goal, in [WH05] they mined CVS repositories to obtain categories of bug fixes. Using a static analysis tool, they inspected every source code change in the software repository and predicted whether a potential bug in the code had been fixed. These predictions are then ranked using contextual information in the source code (i.e. checking the percentage of the invocations of a particular function where the return value is tested before being used). The whole mining procedure is based on text analysis of the CVS commit changes. They conducted experiments on the Apache Web server source code and the Wine source code, showing that the data mined from the software repositories produced good precision, clearly better than a naive baseline technique. From another perspective, text mining has been used in software engineering to validate the data from mailing lists, CVS logs, and change log files of Open Source


software. In [GM03a] they created a set of tools, namely SoftChange38, that implements data validation from the aforementioned text sources of Open Source software. Their tools retrieve, summarise and validate these types of data for Open Source projects. Part of their analysis can mark out the most active developers of an Open Source project. The statistics and knowledge gathered by SoftChange analysis have not been exploited fully, though, since further predictive methods could be applied with regard to fragments of code that may change in the future, or associative analysis between the changes' importance and the individuals (i.e. were all the changes committed by the most active developer as important as the rest, in scale and in practice?). Text mining has also been applied in software engineering for discovering development processes. Software processes are composed of events such as relations of agents, tools, resources and activities organised by control flow structures dictating that sets of events execute serially, in parallel, iteratively, or that one of the set is selectively performed. Software process discovery takes as input artifacts of development (e.g. source code, communication transcripts, etc.) and aims to elicit the sequence of events characterising the tasks that led to their development. In [JS04] an innovative method for discovering software processes from open source software Web repositories is presented. Their method employs text extraction techniques, entity resolution and social network analysis, and it is based on process entity taxonomies for entity resolution. Automatic means of evolving the taxonomy using text mining tasks could have been employed, so that the method would not depend strictly on the taxonomy's actions, tools, resources and agents. An example could be text clustering on the open software text resources and extraction of new candidate items for the taxonomy arising from the clusters' labels.
Text clustering has also been used in software engineering to discover patterns in the history and the development process of large software projects. In [VT06] CVSgrab was used to analyse the ArgoUML and PostgreSQL repositories. By clustering the related resources, the authors visualised the evolution of the projects based on the clustered file types. Useful conclusions can be drawn by careful manual analysis of the generated visualisations of project development histories. For example, they discovered that in both projects there was only one author for each major initial contribution. Furthermore, they came to the conclusion that PostgreSQL did not start from scratch, but was built atop a previous project. An interesting evolution of this work could be a more automated way of drawing conclusions from the development history, for example extracting cluster labels, mapping them to a taxonomy of development processes and automatically extracting the development phases, with comments emerging from taxonomy concepts.

Revision: final

4.4 Future Directions of Data/Text Mining Applications in Software Engineering

Defining software engineering metrics with the use of text mining need be no different from following the existing standards for defining direct or indirect metrics for evaluating software using any background knowledge. The IEEE Standard 1061 [IEE98] defines a methodology for developing metrics for software quality attributes. A framework for evaluating proposed metrics in software engineering according to the IEEE 1061 Standard is discussed in [KB04a]; it lists ten questions that need to be answered when defining software evaluation measures. Although any design and implementation of a method using text mining for software evaluation must follow the aforementioned and/or related standards, there is considerable latitude in how text mining can be used alongside or on top of the techniques described so far. A short description of issues that would be interesting to address in the context of this project follows.

• Social network analysis, for the purpose of discovering the important cluster of individuals in a software project, using more sophisticated graph processing techniques such as PageRank or spreading activation. Social network analysis is in fact a set of long-established algorithms that have been applied in other contexts; the 'future direction' is to extend and apply them in the context of SQO-OSS, aiming at ranking relevant entities appearing in software development.

• Supervised learning approaches, such as text classification based on predictive modelling techniques, for the purpose of predicting future bugs and/or possibly affected parts of code. A measure of the future influence of bugs on the source code, associated with a weight and a prediction ranking, can reveal a lot about software quality.

• Text clustering of the bug reports, and cluster labelling, can be used to automatically create a taxonomy of bugs in the software.
Metrics in that taxonomy can be defined to show the influence of bugs belonging to one category of bugs on other categories. This can also be interpreted as a metric of bug influence across the software project.

• Graph mining techniques to detect hidden structures in an OSS (Open Source Software) project. A complex graph can be created based on the relations between functions, as defined by the function calls in a project; a program execution is then a path in this graph. Using graph mining techniques (link analysis algorithms, min-cut algorithms), we could derive correlations of paths leading to errors, predict software behaviour given the first k steps, and statistically analyse a large number of paths to make decisions.
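The path-statistics idea in the last bullet can be sketched concretely. The following toy example (all function names, paths and outcomes are invented for illustration; this is not SQO-OSS code) labels logged execution paths through a call graph with their run outcomes and ranks functions by how often they appear in failing runs:

```python
from collections import Counter

# Hypothetical logged execution paths through a function-call graph,
# each labelled with the run's outcome.
paths = [
    (["main", "parse", "render"], "ok"),
    (["main", "parse", "cache", "render"], "ok"),
    (["main", "parse", "cache", "flush"], "error"),
    (["main", "load", "cache", "flush"], "error"),
    (["main", "load", "render"], "ok"),
]

seen = Counter()    # number of runs each function appears in
failed = Counter()  # number of failing runs each function appears in
for path, outcome in paths:
    for fn in set(path):
        seen[fn] += 1
        if outcome == "error":
            failed[fn] += 1

# Conditional error rate per function: an estimate of P(error | fn on path).
error_rate = {fn: failed[fn] / seen[fn] for fn in seen}
suspects = sorted(error_rate, key=error_rate.get, reverse=True)
```

On this data the statistic correctly ranks the function present in every failing run (here `flush`) first, illustrating how simple path statistics can point at error-correlated code before any deeper link analysis is applied.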

We can also build graphs from the existing OSS software and the communication data. This implies a graph G(V, E), where each node in V represents a user and each edge in E represents an interaction, e.g. an email exchange. Applying mining techniques, we can extract useful information from the graph, predict individual actions (i.e. what and when the next action of a user will be) and calculate aggregate measures regarding software quality.
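As a rough illustration of such graph mining, the following sketch (with an invented miniature email graph; not part of any SQO-OSS tooling) ranks the users of a communication graph with a plain PageRank power iteration:

```python
# Hypothetical communication graph: a directed edge u -> v means
# u emailed v. A plain PageRank power iteration surfaces central users.
emails = [
    ("ana", "bob"), ("carl", "bob"), ("dan", "bob"),
    ("bob", "ana"), ("carl", "ana"), ("dan", "carl"),
]

nodes = sorted({u for edge in emails for u in edge})
out = {u: [] for u in nodes}
for u, v in emails:
    out[u].append(v)

d, n = 0.85, len(nodes)              # damping factor, node count
rank = {u: 1.0 / n for u in nodes}
for _ in range(50):                  # enough iterations to converge here
    new = {u: (1 - d) / n for u in nodes}
    for u in nodes:
        if out[u]:
            share = d * rank[u] / len(out[u])
            for v in out[u]:
                new[v] += share
        else:                        # dangling node: spread rank uniformly
            for v in nodes:
                new[v] += d * rank[u] / n
    rank = new

most_central = max(rank, key=rank.get)
```

The ranks sum to one by construction, and the user who receives mail from everyone else ends up on top; on real project mailing lists the same computation would highlight the core maintainers.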

5 Related IST Projects

This section contains information about related IST projects. The following list was taken from the draft agenda of the Software Technologies Concertation Meeting, 25 September 2006, Brussels. The projects are presented in alphabetical order.

5.1 CALIBRE

CALIBRE was an EU FP6 Co-ordination Action project that involved the leading authorities on libre/Open Source software. CALIBRE brought together an interdisciplinary consortium of 12 academic and industrial research teams from France, Ireland, Italy, the Netherlands, Poland, Spain, Sweden, the UK and China. The two-year project managed to:

• Establish a European industry Open Source software research policy forum

• Foster the effective transfer of Open Source best practice to European industry

• Integrate and coordinate European Open Source software research and practice

CALIBRE aimed to coordinate the study of the characteristics of open source software projects, products and processes; distributed development; and agile methods. This project integrated and coordinated these research activities to address key objectives for open platforms, such as transferring lessons derived from open source software development to conventional development and agile methods, and vice versa. CALIBRE also examined hybrid models and best practices to enable innovative reorganisation of both SMEs and large institutions, and aimed to construct a comprehensive research road-map to guide future Open Source software research. To secure long-term impact, an important goal of CALIBRE was to establish a European Open Source Industry Forum, CALIBRATION, to coordinate policy making into the future. The CALIBRATION Forum and the results of the CALIBRE project were disseminated through a series of workshops and international conferences in the various partner countries. The first public deliverable of CALIBRE presented an initial gap-analysis of the academic body of knowledge of Libre Software, as represented by 155 peer-reviewed research artifacts. The purpose of this work was to support the wider CALIBRE project goal of articulating a road-map for Libre Software research and exploitation in a European context.
For the gap-analysis, a representative collection of 155 peer-reviewed Libre Software research artifacts was examined, attempting to answer three broad questions about each:

• Who are we (the academic research community) looking at?

• What questions are we asking?

• How are we trying to find the answers?

The artifacts were predominantly research papers published in international journals or peer-reviewed anthologies, and/or presented at international conferences. The papers were discovered through citation indices (e.g. EBSCO, ScienceDirect, ACM Portal) and through recursion using the references cited within papers. Peer review was the key criterion for inclusion, as this represented the official body of knowledge; however, two particularly influential non-reviewed books [DOS99, Ray01] were also included. The second publicly available report of CALIBRE addressed the development model of Libre software. This report described what the research community has learnt about those models, and the implications of those lessons for future research lines. Among the different research approaches applied to understanding Libre software, there was a focus on the empirical study of Libre software development, based on quantitative data, usually available from the public repositories of the studied projects. The report also studied the peculiarities of Libre software development from a research perspective, concluding that it is quite an interesting field in which to apply the traditional scientific methodology, thanks to this wealth of public data covering large parts of the development activities and results. From this standpoint, the early and current research was reviewed, offering a sample of the most interesting and promising results, the tools, approaches and methodologies used to reach them, and the current trends in the research community. The report ended with two chapters summarising the most important implications of the current research for the main actors of Libre software development (Libre software developers themselves, companies interested in Libre software development, and the software industry in general), and a road-map for the future of this field.
This report was not intended as a set of proven recommendations and forecasts. On the contrary, it was meant to be a starting point for discussion, trying to highlight the aspects most relevant to its authors, while surely missing many others of equal (or greater) interest. The third deliverable of CALIBRE focused on complexity as a major driver of software quality and costs, both in the traditional sense of software complexity and in the sense of complexity theory. The analysis of a benchmark database of 10 large Libre and open-source projects suggested that:

• Risk evaluations could adequately supplement cost estimations of Libre software products

• Maintenance teamwork seems to be generally correlated with complexity metrics in large Libre software projects

• Libre software projects can be categorised, first, between small (I-Mode) and large (C-Mode) projects in the context of an entrepreneurial analysis of Libre software, and, second, thanks to a dynamic and open meta-maintenance forum which would provide a standard quality assessment model to all software-enabled industries, and especially to the secondary software sector

Another deliverable of CALIBRE presented an overview of the field of distributed development of software systems and applications (DD). Based on an analysis of the published literature, including its use in different industrial contexts, the document provided a preliminary analysis which established the basic characteristics of DD in practice. The analysis resulted in a framework that structured existing DD knowledge by focusing on threats to communication, coordination and control caused by temporal distance, geographical distance and socio-cultural distance. The purpose of this work was to support the wider CALIBRE project goal of articulating a road-map for DD in relation to Libre Software research and exploitation in a European context. Ultimately, this road-map would form a partial basis for the development of the next-generation software development paradigm, which would integrate DD, Libre software and agile methods. The next deliverable of this project provided an analysis of the process dimension of distributed software development. It included an investigation of a number of company case studies in various contexts, and presented a reference model for successful distributed development. This model was tailored for distributed scenarios in which time differences are low, as is the case in intra-EU collaborations. The study was broadened to consider strategies for successful Libre (Free/Open Source) software development, and then to consider the technology dimension of distributed development.
This deliverable was positioned with respect to a road-map for research in the domain of Libre software development. The establishment of this research road-map was the objective of the next deliverable. It started with a discussion of some of the tensions and paradoxes inherent in FOSS generally, which serve as the engine driving the phenomenon. Then the emergent OSS 2.0 was characterised in terms of the tensions and paradoxes associated with it. Furthermore, a number of business strategies that underpin OSS 2.0 were identified. To exemplify the industrial impact of the phenomenon, six interviews with leading industrial partners using Libre/OSS in different vertical domains were presented, forming a series of industrial viewpoints. Following this, the discussion of the impact of OSS 2.0 was presented for the IS development process, along with its wider implications for organisational and societal processes more generally. Finally, this document concluded with a road-map for European research on Libre/OSS, summarising and highlighting the history of Free/Libre/OSS, the current status and the areas where more research is needed. Agile Methods (AMs) were the focus of another public deliverable of CALIBRE. AMs have grown very popular in the last few years, and so has Libre Software. Both

AMs and Libre Software push for a less formal and hierarchical, and more human-centric development, with a major emphasis on the ultimate goal of development: producing the running system with the correct amount of functionality. This deliverable presented an attempt to deepen the understanding of the analogies between the two approaches and to identify how such analogies may help in getting a deeper understanding of both. The relationships were analysed theoretically and experimentally, with a final, concrete case study of a company adopting both the XP development process and Libre Software tools. Other deliverables of CALIBRE reported on the groundwork for future research within the CALIBRE project, leading towards the overall project goal of articulating a road-map for Libre Software in the European context. The research was shaped by the concerns expressed by the CALIBRE industry partners in the various CALIBRE events to date. Specifically, industry partners, notably Paul Everett of the Zope Europe Association (ZEA), identified that the primary challenge for Libre software businesses was effectively delivering the whole product in a manner that takes account of, and in fact leverages, the unique business model dynamics associated with Libre software licensing and processes. The document described a framework for analysing Libre software business models, an initial taxonomy of model categories, and a discussion of organisational and network agility based on ongoing research within the ZEA membership. Another deliverable of the CALIBRE project presented a selection of product and process metrics defined in various suites, frameworks and categorisations to date. Each metric was analysed for citations and applications to both agile and Libre development approaches. Opportunities for migration and knowledge transfer between these areas were stressed and outlined.
The document also summarised the product maturity models available for Open Source software and emphasised the need for alternative approaches to shaping Open Source process maturity models. The CALIBRE project also produced the CALIBRE Working Environment (CWE), and a deliverable described its first version. The requirements for the system were described, and the way in which the CWE addresses these requirements was identified. The CWE requirements were identified collaboratively, in consultation with its users, and the system as it stands largely meets the needs of the users. The software and hardware used to implement the CWE were described, and areas for further work were identified. The current CWE is located at http://hemswell.lincoln.ac.uk/calibre/ and allows registered members to prepare content with varying levels of dissemination (public, restricted to registered members, and private), upload documents and files, add events to a shared calendar and archive mailing list information. The last publicly available deliverable of CALIBRE focused on education and training on Libre (Free, Open Source) software. In this report, a scenario which could be considered the second generation in Libre software training was presented: the compendium of knowledge and experiences needed to deal with the many facets of the Libre software phenomenon. For this goal, higher education was considered the best possible framework, and the main guidelines of such a programme on Libre software were proposed. In summary, the studies designed in this report were aimed at providing students with the knowledge and expertise that would make them experts in Libre software. The programme provided capabilities and enhanced skills to the point that students could deal with problems ranging from the legal or economic areas to the more technically oriented ones. It did not (intentionally) focus on a set of technologies, but approached the Libre software phenomenon from a holistic point of view. However, it was also designed to provide practical and real-world knowledge. It could be offered jointly by several universities across Europe, within the framework of the ESHE, or adapted to the specific needs of a single one. In addition, it could also be adapted for non-formal training.

5.2 EDOS

EDOS stands for Environment for the Development and Distribution of Open Source software. This is a research project funded by the European Commission as a STREP project under the IST activities of the 6th Framework Programme. The project involves universities (Paris 7, Tel Aviv, Zurich and Geneva), research institutes (INRIA) and private companies (Caixa Magica, Nexedi, Nuxeo, Edge-IT and CSP Torino). The project aims to study and solve problems associated with the production, management and distribution of Open Source software packages. Software packages are files in the RPM or Debian packaging format that contain executable programs or libraries and their files, along with metadata describing what is in the package and what conditions are needed to use it. There are several problems associated with software packages.

• Dependencies: Software packages may need other software packages to run, and often they do not state exactly which other packages they need, leaving considerable room for choice. Also, some software packages cannot be installed at the same time. This makes the job of tools that automatically download required software packages difficult. Distribution maintainers want to make sure that there is always a way of selecting available packages to correctly install every piece of software they include, and that users can upgrade their systems without losing functionality. Work package 2 handled these issues. The stated goal of EDOS Work package 2 was: to build new-generation tools for managing large sets of software packages, like those found in Free software distributions, using formal methods.

The focus was mainly on issues related to dependency management for large sets of software packages, with particular attention to what must be done to maintain the consistency of a software distribution on the repository side, as opposed to maintaining a set of packages on a client machine. This choice is justified by the fact that maintaining the consistency of a distribution of software packages is essential to make sure that current distributions will scale up; yet it is also an invisible task, as the smooth working it ensures on the end-user side will tend to be considered as normal and obvious as the smooth working of routing on the Internet. In other words, the project was tackling an essential infrastructure problem, which was perfectly suited for a European Community funded action. Over the first year and a half of its existence, the Work Package 2 team of the EDOS project carried out an extensive analysis of the whole set of problems in its focus, ranging from upstream tracking to thinning, rebuilding and dependency management for F/OSS distributions.

• Downloading: Users need to download software packages from somewhere. This requires a lot of bandwidth and puts strain on the mirrors that host those packages. This problem would be better solved with peer-to-peer methods. Work package 4 handles these issues. The goal of this work package is to investigate scalable and secure solutions for improving the process of distributing data (source code, binaries, documentation and metadata) to end-users. The key issue in the code distribution process is the ability to transfer a large code base to a large number of people. In the case of Mandrake Linux, for instance, this entails copying a code base of 20 Gigabytes to a community containing up to 4 million users (i.e. the number of installed versions of Mandrake Linux). This community is growing, so the problems have to be addressed.
Currently the process is quite slow, as it takes 48 hours to copy from the master server to all mirror servers. This creates a latency problem that leads to inconsistencies on the user and developer side, which in turn can create awkward dependencies at the module level in future releases. This work package will test and evaluate two alternative architectures for data distribution that address the issues of latency and consistency.

• Quality assurance: The complexity of the quality assurance process increases exponentially with the number of packages and the number of platforms. To keep the workload manageable, Linux distribution developers are forced to reduce system quality, reduce the number of packages, or accept long delays before final releases of a high-quality system. Work package 3 handles these issues. Its goal is to research and experiment with solutions which will ultimately allow the costs and delays of quality assurance to be reduced dramatically in the process of building an industry-grade custom GNU/Linux distribution or custom application comprising several packages. It will design, implement and experiment with an integrated quality assurance framework based on code analysis and runtime tests, which operates at the system level.

• Metrics: Following the “release early, release often” philosophy, Free and Open Source software is always in constant development, and any serious project has many versions floating around: older but stable versions, and newer versions with new features but more bugs. Free software can be of wildly varying quality. Quality metrics are defined, their relevance is assessed and they are implemented. Work package 5 handles these issues. The goal of work package 5 is to develop technology and products that will improve the efficiency of two key processes and one system. The two processes are the generation of a new version of a distribution from the previous version and the production of a customised distribution from an existing one. The system is the current inefficient mechanism of mirroring the Cooker data, which needs to be replaced by a more efficient one. In the end, a demonstration that the processes and the system have indeed been improved will take place. Thus, the goal is to define a set of metrics to measure the efficiency of the processes in question. These metrics will include man power, as measured in man months, and elapsed time.

The EDOS project attempts to solve these problems by using formal methods coming from the academic research groups in the project to address three outstanding problems in a novel way:

• Dependency management among large, heterogeneous collections of software packages.

• Testing and QA for large, complex software systems.

• The efficient distribution of large software systems, using peer-to-peer and distributed database technology.
These problems were studied and various technical reports were produced explaining their importance, giving ways of expressing them mathematically, algorithms for solving the associated problems, and real-world statistics. A certain amount of software was also produced, which is, of course, Free and Open Source:

• Java software for the peer-to-peer distribution of software packages.

• debcheck/rpmcheck, a very efficient piece of OCaml software for verifying that a Debian or RPM collection of packages does not contain non-installable packages.

• The day-to-day evolution of the Debian packages, that is, their detailed history, can be browsed using anla. This also gives, for every day, reports on installable software packages and a global installability index (Debian weather).

• That history can be queried in the EDOS-designed Debian Query Language using the command-line tool history or the AJAX-based EDOS Console.

• Ara, a search engine for Debian packages that allows arbitrary boolean combinations of field-limited regular expressions and ranks results by popularity (again in OCaml).
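To illustrate the kind of reasoning behind tools such as debcheck/rpmcheck, the following toy sketch (an entirely invented miniature repository; the real tools encode this NP-complete problem as satisfiability rather than brute force) checks whether a package is installable given dependency and conflict relations:

```python
from itertools import combinations

# Hypothetical miniature repository: package -> (dependencies, conflicts).
REPO = {
    "app":           ({"libfoo", "libbar"}, set()),
    "libfoo":        ({"libc"}, set()),
    "libbar":        ({"libc"}, {"libfoo-compat"}),
    "libfoo-compat": (set(), {"libbar"}),
    "libc":          (set(), set()),
    "broken":        ({"libbar", "libfoo-compat"}, set()),  # needs two conflicting packages
}

def closed_and_consistent(selection):
    """Every dependency of every selected package is also selected,
    and no selected package conflicts with another selected one."""
    for pkg in selection:
        deps, conflicts = REPO[pkg]
        if not deps <= selection or conflicts & selection:
            return False
    return True

def installable(pkg):
    """Brute-force search for any co-installable subset containing pkg."""
    others = [p for p in REPO if p != pkg]
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            if closed_and_consistent({pkg, *extra}):
                return True
    return False
```

Here `installable("app")` succeeds because {app, libfoo, libbar, libc} is dependency-closed and conflict-free, while `installable("broken")` fails since its two dependencies can never coexist; distribution-side checking applies exactly this kind of test to every package in the repository.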

5.3 FLOSSMETRICS

FLOSSMetrics stands for Free/Libre Open Source Software Metrics. Industry, SMEs, public administrations and individuals are increasingly relying on Libre (Free, Open Source) software as a competitive advantage in the globalising, service-oriented software economy. But they need detailed, reliable and complete information about Libre software, specifically about its development process, its productivity and the quality of its results. They need to know how to benchmark individual projects against the general level. And they need to know how to learn from, and adapt, the methods of collaborative, distributed, agile development found in Libre software to their own development processes, especially within industry. FLOSSMETRICS addresses those needs by analysing a large quantity (thousands) of Libre software projects, using already proven techniques and tools. This analysis will provide detailed quantitative data about the development process, development actors and developed artifacts of those projects, their evolution over time, and benchmarking parameters to compare projects. Several aspects of Libre software development (software evolution, human resources coordination, effort estimation, productivity, quality, etc.) will be studied in detail. The main objective of FLOSSMETRICS is to construct, publish and analyse a large-scale database with information and metrics about Libre software development coming from several thousand software projects, using existing methodologies and tools already developed. The project will also provide a public platform for validation and industrial exploitation of results. The FLOSSMetrics targets are to:

• Identify and evaluate sources of data and develop a comprehensive database structure, built upon the results of CALIBRE (WP1, WP2).

• Integrate already available tools to extract and process such data into a complete platform (WP2).

• Build and maintain an updated empirical database by applying extraction tools to thousands of open source projects (WP3).

• Develop visualisation methods and analytical studies, especially relating to benchmarking, identification of best practices, measuring and predicting success and failure of projects, productivity measurement, simulation and cost/effort estimation (WP4, WP5, WP6, WP11).

• Disseminate the results, including data, methods and software (WP7).

• Provide for exploitation of the results by producing an exploitation plan, validated with the project participants from industry, especially from an SME perspective (WP8, WP9, WP10).

The main results of FLOSSMETRICS will be: a huge database with factual details about all the studied projects; higher-level analyses and studies which will help to understand how Libre software is actually developed; and a sustainable platform for continued, publicly available benchmarking and analysis beyond the lifetime of the project. With these results, European industry, SMEs, as well as public administrations and individuals, will be able to take informed decisions about how to benefit from the competitive advantage of Libre software, either as a development process or in the evaluation and choice of individual software applications. The project methodologies and findings go well beyond Libre software, with implications for evolution, productivity and development processes in software and services in general. FLOSSMETRICS is scheduled in three main phases (running partially in parallel). The first will set up the infrastructure for the project and the first version of the database with factual data. During the second phase most of the studies and analyses will be performed, and the contents of the database will be enlarged and improved. During the third phase the results of the project will be validated and adapted to the needs of the target communities.
The results of the project (datasets and studies) will be targeted at several different kinds of users: SMEs developing or using Libre software (or even just interested in it), industrial players developing Libre software, and the Libre software community at large. Based on the feedback obtained in these contexts, a complete exploitation strategy will also be designed. Dissemination to these communities will be performed using the project website, specific presentations at conferences, and a series of workshops. Wide impact of the results will be supported by using open-access licenses for all output documents. The data is also expected to be useful for the scientific community, which could use it in their research lines, thus helping to improve the general understanding of Libre software development.

The impact of the project is expected to be large in the Libre software development realm (and in the whole software development landscape). FLOSSMETRICS will produce the most complete and detailed view of the current landscape of Libre software, providing not only a static snapshot of how projects are performing now, but also historical information about the last ten years of Libre software development.

5.4 FLOSSWORLD

FLOSSWorld stands for Free, Libre and Open Source Software - Worldwide Impact Study. The FLOSSWorld project aims to strengthen Europe's leadership in research into FLOSS and open standards, building a global constituency with partners from Argentina, Brazil, Bulgaria, China, Croatia, India, Malaysia and South Africa. FLOSSWorld is a European Union funded project involving 17 institutions from 12 countries spanning Europe, Africa, Latin America and Asia, undertaking a worldwide study of the impact of selected issues in the context of Free/Libre Open Source Software (FLOSS).

Context: Free/Libre/Open Source Software (FLOSS) is arguably one of the best examples of open, collaborative, internationally distributed production and development that exists today, resulting in tremendous interest from around the world, from government, policy, business, academic research and developer communities.

The problem: Empirical data on the impact of FLOSS, its use and development is still quite limited. The FP5 FLOSS project and FP6 FLOSSPOLS project have helped fill in the gaps in knowledge about why and how FLOSS is developed and used, but were necessarily focused on Europe. FLOSS is a global phenomenon, particularly relevant in developing countries, and thus more knowledge on FLOSS outside Europe is needed.

Project objectives: FLOSSWorld primarily aims to strengthen Europe's leadership in international research in FLOSS and open standards, and to exploit research and policy complementarities to improve international cooperation, by building a global constituency of policy-makers and researchers. It is expected that FLOSSWorld will enhance Europe's leading role in research in the area of FLOSS and strongly embed Europe in a global network of researchers and policy makers, and the business, higher education and developer communities.
FLOSSWorld will enhance the level of global awareness related to FLOSS development and industry, human capacity building, standards and interoperability, and e-government issues in the geographical regions covered by the consortium. The project will result in a stronger, sustainable research community in these regions. Broad constituency-building exercises risk losing momentum after initial workshops and meetings without specific actions to sustain a focus. FLOSSWorld will perform three global empirical studies of proven relevance to Europe and third countries, which will provide a foundation for FLOSSWorld’s regional and international workshops. The studies will cover topics such as the impact of participation in a FLOSS community on career growth and prospects, motivational factors in the choice of FLOSS, user community perspectives towards FLOSS, inter-regional differences in FLOSS development methodology, etc.

A four track approach

FLOSSWorld is designed around three research tracks, each providing insights and gathering empirical evidence on important aspects of FLOSS usage and development:

1. Human capacity building: investigating FLOSS communities as informal skills development environments, with economic value for employment

2. Software development: spotting the regional and international differences - technical, organisational, business - between FLOSS projects across countries

3. e-Government policy: reporting adopted policies and behaviour of governments around the world towards FLOSS, open standards and interoperability

Following and in parallel with the research tracks will be a fourth track of workshops and working group activities to build an international research and policy development constituency: regional and international workshops and focused working groups from the represented target regions for building further collaboration. The first phase focuses on actual collaboration by implementing tasks 1 to 3, while the second phase focuses on analysis and building concrete future collaborations. Global dissemination is part of the second phase, as is the engagement of organisations outside the FLOSSWorld consortium.

Schedule

FLOSSWorld is funded by the 6th Framework Programme and is a 2-year project. The schedule of the project is given in the following table.
Goals of Workshops

During the workshops all consortium partners (17 in all) are brought together with additional participants from their countries, and observers from the organisations listed as having provided letters of support to the FLOSSWorld project. Workshop participants are experts representing the interests of the Open Source community, government, businesses, researchers and higher education institutes, as appropriate for the workshop questions. Some participants will take a more active role as specific questions are addressed, but in principle all three research tracks will be treated in each workshop.


Date                    | Action                                  | Subject                               | Place
1/05/2005               | Start                                   |                                       |
Nov 05 - Mar 2006       | 1st regional workshops                  | Discuss research questions, interact  | Buenos Aires, Beijing, Mumbai, Sofia (Bulgaria), Nairobi (Kenya)
26/04/2006 - 28/04/2006 | 1st International Workshop              |                                       | Brussels, Belgium
Nov 2005 - Jul 2006     | On-going survey and study               |                                       |
Aug 2006 - Sep 2006     | Analysis                                |                                       |
Oct 2006 - Feb 2007     | 2nd Regional and International Workshop | Discuss survey results, policy issues |
Feb - Apr 2007          | On-going survey                         |                                       |
30/04/2007              | Finalise Recommendations; End           |                                       |

FlossWorld is conducting worldwide surveys among the following target groups:

1. Private sector

2. Government sector

3. Open Source community participants

4. Higher Education Institutes - Administrators

5. Higher Education Institutes - IT Managers

Furthermore, questions are adapted from country to country to ensure international comparability - e.g. using local currencies and localised scales in the questionnaire when asking about income or expenditure levels, and introducing additional questions that are unique to each country’s context. The FLOSSWorld survey is, at the least, to become an indicator of local OSS perception, usage and adoption as compared to other countries in the world.


5.5

PYPY

The PyPy project has been an ongoing Open Source Python language implementation since 2003. In December 2004 PyPy received EU funding within the Framework Programme 6, second call for proposals ("Open development platforms and services", IST). PyPy is an implementation of the Python programming language written in Python itself, flexible and easy to experiment with. The long-term goal of this project is to target a large variety of platforms, small and large, by providing a compiler tool suite that can produce custom Python versions. Platform, memory and threading models are to become aspects of the translation process - as opposed to encoding low-level details into the language implementation itself. Eventually, dynamic optimisation techniques - implemented as another translation aspect - should become robust against language changes. A consortium of 8 (12) partners in Germany, France and Sweden is working to achieve the goal of an open run-time environment for the Open Source programming language Python. The scientific aspect of the project is to investigate novel techniques (based on aspect-oriented programming, code generation and abstract interpretation) for the implementation of practical dynamic languages. A methodological goal of the project is also to showcase a novel software engineering process, Sprint Driven Development. This is an agile methodology, providing a dynamic and adaptive environment, suitable for co-operative and distributed development. The project is divided into three major phases: phase 1 focuses on developing the actual research tool - the self-contained compiler; phase 2 focuses on optimisations (core, translation and dynamic); and phase 3 on the actual integration of efforts and dissemination of the results. The project has an expected deadline in November 2006. PyPy is still, though EU-funded, heavily integrated in the Open Source community of Python.
The methodology of choice is the key strategy to make sure that the community of skilled and enthusiastic developers can contribute in ways that wouldn’t have been possible without EU-funding.
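The translation idea above rests on PyPy being written in Python itself: because the interpreter is ordinary Python code, a tool suite can analyse it and weave in platform, memory or threading choices at translation time instead of hard-coding them. A minimal sketch of a stack-machine interpreter conveys the flavour; the opcodes and the example program are invented for this illustration and are not PyPy internals:

```python
# Illustration only: a toy bytecode interpreter written in plain Python,
# in the spirit of an interpreter that a translation toolchain could
# analyse and specialise. Opcodes and the program are invented here.

PUSH, ADD, MUL = range(3)  # hypothetical opcodes

def interpret(program):
    """Run a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for op, arg in program:
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Evaluate (2 + 3) * 4 on the toy machine.
prog = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None)]
print(interpret(prog))  # 20
```

Because nothing low-level is baked into `interpret`, decisions such as how the stack is laid out in memory can become aspects of the translation step rather than of the interpreter source - the property the project description emphasises.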

5.6

QUALIPSO

Goals

The Integrated Project QualiPSo aims at making a major contribution to the state of the art and practice of Open Source Software. The goal of the QualiPSo integrated project is to define and implement technologies, procedures and policies that raise current Open Source Software development practices to sound, well-recognised and established industrial operations.


The project brings together software companies, application solution developers and research institutions, and is driven by the need to establish for OSS the appropriate level of trust that makes OSS development an industrial and widely accepted practice. To reach this goal the QualiPSo project will define, deploy and launch QualiPSo Competence Centres in Europe (4), Brazil (1) and China (1), all of them making use of the QualiPSo Factory. Exploitation of results will be achieved through different routes, but with the common theme of partners incorporating these results in current or planned products. Through its founding partners, the QualiPSo project will be closely related to important OSS communities such as ObjectWeb and Morfeo. With the economy moving towards new open models, the potential impact of QualiPSo will be across the entire chain of software system development, proposing an integrated approach along many dimensions:

• technically, through a focus on complementary problem areas addressed by strong research teams,

• industrially, through application partners from different sectors who share a common vision for the potential of services,

• managerially, through the creation of a strong management structure based on an entrepreneurial company,

• internationally, with partners from different countries coming from different continents,

• individually, through strong existing working relationships between partners.

The need to sustain and advance the QualiPSo solutions in the future requires an open sustainability approach. QualiPSo is open in the following ways:

• its use of open standards and the Open Source software development approach

• it is based on an open community to enlarge and enforce its resources and input from researchers, scientists, art professionals and users

• it is open to expansion, by inserting new application scenarios and other project results in a "plug and play" manner.
The project will be structured into the following classes of activities:

• Problem activities: These activities provide the foundation and technological content upon which the project is built.

• Legal Issues: This activity addresses the need for a clear legal context in which OSS will be able to evolve within the European Union.


• Business Models: This activity addresses the need to incorporate new software development models that can cope with the OSS peculiarities.

• Interoperability: This activity addresses the needs of the software industry for standards-based interoperable software.

• Trustworthy Results: This activity addresses the need for the definition of clearly identified and tested quality factors in OSS products.

• Trustworthy Processes: This activity addresses the need for the definition of an OSS-aware standard software development methodology.

• Project activities: The project activities are cross-cutting activities that take the results generated by the problem activities, integrate them in a coherent framework, and assess and improve their applicability using the selected application scenarios. Project activities also include all issues related to industrialisation, dissemination, standardisation and exploitation of the resulting framework. These activities are the following:

• QualiPSo Factory: This activity integrates the results achieved in the prototyping phase of the problem activities to create the QualiPSo environment.

• QualiPSo Competence Centre: This activity aims to develop the means for continuous and sustainable (beyond the scope of the project) centralisation of reference information concerning quality OSS development.

• Promotion and support: This activity aims to develop awareness of the QualiPSo results within the global OSS community.

• Demonstration

• Training: This activity will focus on providing training services, both in the classroom and over the internet, in order to evangelise the results of QualiPSo.

Coordination

To achieve its ambitious goal QualiPSo will pursue the following objectives:

• Define methods, development processes and business models for the implementation and deployment of Open Source Software systems, to assure intensive software consumers that Open Source projects conform to the standards required to provide industry-level software.
• Design and implement a specific environment where different tools are integrated to facilitate and support the development of viable industrial OSS systems. This environment will include a secure collaborative platform able to guarantee that there is no malicious intrusion in the development of code. It will also support the audits of software liability necessary for IT players to be able to indemnify their users in case of problems caused by the software.

• Implement specific tools for benchmarking to check the expected quality of OSS, proving non-functional properties such as robustness and scalability for supporting major critical applications. The evaluation of these qualities will be carried out in a rigorous, yet practical way that will encompass both static (i.e. related to the structure of OSS) and dynamic (i.e. related to the execution and use of OSS) aspects.

• Implement and support better practices with respect to the management of information (including source code, documentation and information exchanged between actors involved in a project) in order to improve the productivity of development and evolution of OSS systems.

• Demonstrate the interoperability that is at the centre of the Open Standards commonly implemented in OSS, by providing test suites and qualified integration stacks.

• Understand the legal conditions by which OSS products are protected and recognised, without violating the OSS spirit.

• Develop a long-lasting network of professionals concerned with the quality of Open Source Software for enterprise computing.

5.7

QUALOSS

The strategic objective of this project is to enhance the competitive position of the European software industry by providing methodologies and tools for improving its productivity and the quality of its software products. To achieve this goal, the project aims to build a high-level methodology to benchmark the quality of Open Source software, in order to ease the strategic decision of integrating adequate F/OSS components into software systems. The results of the QUALOSS project directly address the strategic objective 2.5.5 of providing methodologies to use Open Source software in industrial development, to enable its benchmarking, and to support its development and evolution. Two main outcomes of the QUALOSS project achieve the strategic objectives: an assessment methodology for gauging the evolvability and robustness of Open Source software, and a tool that mostly automates the application of the methodology. Unlike current assessment techniques, this one combines data from software products (their source code, documentation, etc.) with data about the developer community supporting the software products, in order to estimate the evolvability and robustness of the evaluated software products. In fact, QUALOSS takes advantage of information widely available in F/OSS repositories, which often contain both kinds of information, that is, software product data and data produced by the developer community while developing and maintaining the software product. Although the tools aim to automate most of the procedure of applying the quality models, it is unlikely that every aspect can be computed automatically, hence input from the user will be needed. This is why the tools will be accompanied by a user manual specifying, first, the manual activities to perform when applying the quality models and, second, how to use the outcomes of the manual activities in combination with the tools to finally estimate the evolvability and robustness of the selected F/OSS component. In the end, the tools and the user manual provide the user with an integrated assessment methodology to gauge the quality of F/OSS components. Ultimately, the tooled methodology achieves the strategic objectives stated above. By integrating more evolvable and robust F/OSS components in their solutions, organisations will spend less time fighting with the F/OSS components and hence will be more productive. This proposition will be studied through case studies. This instrumented method will make it possible to increase productivity and improve software quality by integrating evolvable and robust Open Source software. In more quantifiable terms, the targets of the QUALOSS project are:

• to increase the productivity of software companies by 30%

• to decrease the average number of defects by 10%

• to decrease the effort to modify a software system by 20%

The QUALOSS consortium is composed of leading research organisations in the field of measurement, software quality and Open Source, as well as a panel of industry representatives (including SMEs) involved in Open Source projects.
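The combination of product data and community data described above can be sketched in a few lines. The metric names, weights and scales below are invented for this illustration; they are not the actual QUALOSS quality model:

```python
# Illustration only: estimating an "evolvability" score by combining
# product metrics with developer-community metrics, in the spirit of the
# QUALOSS approach. Metrics, weights and scales here are hypothetical.

def normalise(value, worst, best):
    """Map a raw metric onto [0, 1]; works for inverted scales too."""
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))

def evolvability(defect_density, active_developers, doc_coverage):
    """Weighted average of normalised product and community indicators."""
    scores = {
        # product data: fewer defects per KLOC is better
        "defects": normalise(defect_density, worst=10.0, best=0.0),
        # community data: more active developers is better (capped at 50)
        "community": normalise(active_developers, worst=0.0, best=50.0),
        # product data: fraction of documented public interfaces
        "docs": doc_coverage,
    }
    weights = {"defects": 0.5, "community": 0.3, "docs": 0.2}
    return sum(weights[k] * scores[k] for k in scores)

print(round(evolvability(2.0, 25, 0.6), 2))  # 0.67
```

In the project's terms, the automated tool would fill in such inputs from repository data, while the user manual would cover the aspects that cannot be computed and must be supplied manually.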

5.8

SELF

SELF will be a web-based, multi-language, free-content knowledge base written collaboratively by experts and interested users. The SELF Platform aims to be the central platform with high-quality educational and training materials about Free Software and Open Standards. It is based on world-class Free Software technologies that permit both reading and publishing free materials, and is driven by a worldwide community. The SELF Platform is a repository with free educational and training materials on Free Software and Open Standards, and an environment for the collaborative creation of new materials. Inspired by Wikipedia, the SELF Platform provides the materials in different languages and forms. The SELF Platform is also an instrument for the evaluation, adaptation, creation and translation of these materials. Most importantly, the SELF Platform is a tool to unite community and professional efforts for public benefit.

The general strategic objectives of the SELF project are:

• Bring together universities, training centres, Free Software communities, software companies, publishing houses and government bodies to facilitate mutual support and exchange of educational and training materials on Free Software and Open Standards.

• Centralise, transmit and enlarge the available knowledge on Free Software and Open Standards by creating a platform for the development, distribution and use of information, educational and training programmes about Free Software and its main applications.

• Raise awareness and contribute to the building of critical mass for the use of Free Software and Open Standards.

The concrete project objectives of the SELF project are:

• Research the state of the art of currently available Free Software educational and training programmes and detect the potential gaps.

• Create an open platform for the development, distribution and use of information, educational and training programmes on Free Software and Open Standards.

• Develop educational and training materials concerning Free Software and Open Standards. The project aims to include information on at least 50 software applications in the initial period.

• Make the SELF platform self-sustainable by creating an active community of individuals and institutions (universities, training centres, Free Software communities, software companies, publishing houses and government bodies) around it. The SELF project aims to involve at least 150 members in the SELF community by the end of the project.
While the SELF platform will be started by the members of the consortium, its final goal is to become a community of different interested parties (from governments and educational institutes to companies) that can not only exploit the SELF materials but also participate in their production. The commercial and educational interests in exploiting the SELF materials will assure the self-sustainable character of the SELF Platform beyond the EC funding period. This project starts from three main assumptions:


1. Free Software and Open Standards are crucial to support the competitive position of the European software industry.

2. The real and long-term technological change from proprietary to Free Software can only come by investing in education and training.

3. The production of educational and training materials on Free Software and Open Standards should be done collaboratively by all the parties involved.

That is why the SELF platform will have two main functions: it will be simultaneously a knowledge base and a collaborative production facility. On the one hand, it will provide information, educational and training materials that can be presented in different languages and forms: from course texts, presentations, e-learning programmes and platforms to tutor software, e-books, instructional and educational videos and manuals. On the other hand, it will offer a platform for the evaluation, adaptation, creation and translation of these materials. The production process of such materials will be based on the organisational model of Wikipedia. In short, SELF will be a web-based, multi-language, free-content knowledge base written collaboratively by experts and interested users.

5.9

TOSSAD

Europe, as a whole, has a stake in improving the usage of F/OSS in all branches of IT and in public life in general. F/OSS communities throughout Europe can achieve better results through co-ordination of their research activities and programmes reflecting the current state of the art. The main objective of the tOSSad project is to start integrating and exploiting already formed methodologies, strategies, skills and technologies in the F/OSS domain, in order to help governmental bodies, educational institutions and SMEs to share research results, establish synergies, build partnerships and innovate in an enlarged Europe. More precisely, the tOSSad project aims at improving the outcomes of the F/OSS communities throughout Europe by supporting the coordination and networking of these communities by means of state-of-the-art studies, national programme initiations, usability cases, curriculum development and the implementation of a collaborative information portal and web-based groupware. The main tOSSad coordination activities are:

• F/OSS study (Work package 1)

• F/OSS national programs (Work package 2)

• F/OSS usability study (Work package 3)


• F/OSS curriculum development (Work package 4)

• Dissemination and exploitation (Work package 5)

Work package 1 has the intention of producing a report detailing both the current status of F/OSS adoption in European countries, and the barriers that such future adoption might face. The main goal is to give a clear picture of the current status (usage, implementation, adoption, penetration, government policies, etc.) of F/OSS related to the following topics:

• The technical barriers that hinder F/OSS usage on a larger scale

• Infra-structural weaknesses in some European countries

• Usability and accessibility

• Operating system specific technical problems

• Social barriers that hinder F/OSS usage on a larger scale

• Educational weaknesses

• Cultural readiness

• Political and financial problems

• Market problems (existing monopolies of any sort)

• Current and future trends and opportunities

The main deliverable of WP1 is a report entitled "F/OSS Study".

Work package 2 aims to start up national programmes for improved usage of F/OSS in some (at least one) of the target countries and to develop guidelines that will be used for F/OSS adoption in the public sector. As part of this Work package, an expert group (containing individuals from the partners, as well as policy makers from governmental bodies) will be established at the kickoff meeting, which all participants will attend. This expert group will also help national and regional government institutes understand the benefits of F/OSS and Open Source components where possible. A main goal of Work package 2 is to produce a road-map for F/OSS adoption.
The deliverables of the work package are designed according to this main goal. Work package 2 tasks:


• Organising one workshop aiming to determine the requirements for national programmes, with a special focus on best practices and success stories, F/OSS in the public sector and migration strategies.

• Preparing research documents which can be proposed for addition to the national ICT programmes. These documents should focus on the following items:

  • Usability centres, F/OSS R&D and solution centres

  • Making use of F/OSS for e-learning

  • F/OSS training and certification solutions for IT people, developers and users, making use of existing or new training institutions

  • Catalysing the formation of Open Source communities and participation in the development of Open Source software as part of global projects

  • Collaborative models of joint development between F/OSS target countries and Member States with superior F/OSS adoption

• Building partnerships within the public and private sectors and civil society, as well as regionally within Europe.

• Preparing not only high-level case histories, but also all the details needed to copy and implement F/OSS solutions locally.

• Lobbying the national strategy decision makers in the public sector by putting forward reports on the economic and social benefits of F/OSS usage. These reports can include success stories in Europe and worldwide.

• Developing guidelines towards F/OSS adoption and dissemination in public bodies.

The major objectives of Work package 3 are to tackle the obstacles to usability in F/OSS and to lead to a breakthrough, by assuring that usability will be paid more attention in F/OSS in the future.
To reach these objectives, besides the intensive spreading of awareness, the following three major areas will be addressed within Work package 3:

• State-of-the-art usability, based on both in-depth desk research and an empirical survey in F/OSS. If appropriate, the survey will be integrated into the empirical investigations conducted in Work package 1.


• Usability tests of selected F/OSS components, with a specific focus on desktop applications, personal information management (PIM) and office applications.

• Detection of F/OSS gaps, based on the test results and on research in the area of tomorrow’s usability requirements (thinking of mobile end devices, voice interaction, wearables). From these, recommendations for future research directions will be generated.

• A guideline taking into account both attention to usability aspects during F/OSS development and the conduct of usability testing. The focus will be on recurrent user involvement for usability assurance during shared development, via mockups for inclusion in the F/OSS development environment.

Work package 4 gathers partners with deep and complementary knowledge in software engineering, university curricula development, e-learning and collaborative learning, and the application of Open Source methodology and business models to real-world problems. WP4 partners shall work together in order to define one or more broadly accepted, detailed curricula for F/OSS. There will be a particular focus on items 2 and 3 below (courses and curricula about the F/OSS operating system Linux and related system applications, and courses about F/OSS software development tools), not excluding studying and giving suggestions on items 1, 4 and 5. The Work package 4 curriculum development items are as follows:

• Courses and curricula about using the most popular F/OSS desktop applications - F/OSS office automation software, mail applications, Web browsers, Wikis, etc. - even on proprietary operating systems.

• Courses and curricula about F/OSS server application and management - the Linux operating system, application servers (Tomcat), Web servers (Apache), databases, middleware and related system applications.

• Courses and curricula about F/OSS software development tools - IDEs (Eclipse), versioning systems and related tools.
• Courses and curricula about how to develop and take advantage of F/OSS software, and about the software engineering of F/OSS. These are related to ongoing research on methodologies and tools for F/OSS development, and aim to train software developers able to build, customise and consult on F/OSS applications, being active members of the F/OSS development community.

• Use of F/OSS software in computer science courses and curricula, as a cheap and powerful means to help in understanding computer science concepts.


References
[A.I89] A.I.Wasserman. The architecture of case environments. look,pp 13-22, 1989. CASE Out-

[Alb79]

A. J. Albrecht. Measuring application development. Proceedings of IBM Applications Development Joint SHARE/GUIDE Symposium, Monterey, CA, pp 83-92, 1979. Ioannis P Antoniades, Ioannis Stamelos, Lefteris Angelis, and George . Bleris. A novel simulation model for the development process of open source software projects. Software Process Improvement and Practice, vol.7, pp 173-188, 2002. Ioannis P Antoniades, Ioannis Samoladas, Ioannis Stamelos, Lefteris An. gelis, and George Bleris. Dynamical Simulation Models of the Open Source Development Process, chapter 8, pages 174–202. Idea Group Inc., 2005. Victor Basili, Lionel Briand, and Walcelio Melo. A validation of objectoriented design metrics as quality indicators. IEEE Transactions on Software Engineering, Vol. 22, No. 10, pp 751-761, 1996. V. Basili, G. Caldiera, and D. Rombach. Encyclopedia of Software Engineering, Vol. 1, pages 528-532. John Wiley and Sons, 1994.

[ASAB02]

[ASS+ 05]

[BBM96]

[BCR94]

[BDPW98] L. Briand, J. Daly, V. Porter, and J. Wuest. A comprehensive empirical validation of product measures for object-oriented systems. IEEE METRICS Symposium, Washington D.C, USA, 1998. [BEM95] Lionel Briand, Khaled El Emam, and Sandro Morasca. Theoretical and empirical validation of software product measures, technical report isern-95-03. Technical report, ISERN, 1995. Nikolai Bezroukov. Open source software development as a special type of academic research (critique of vulgar raymondism). Nikolai Bezroukov. A second look at the cathedral and the bazaar.

[Beza]

[Bezb]

[BGD+ 06] C. Bird, A. Gourley, P Devanbu, M. Gertz, and A. Swaminathan. Min. ing email social networks. In Proceedings of International Workshop on Mining Software Repositories (MSR-06)., 2006. [BL96] M. Berry and G. Linoff. Data Mining Techniques For marketing, Sales and Customer Support. John Willey and Sons Inc., 1996.

D2 / IST-2005-33331

SQO-OSS

22nd January 2007

[BR03]

A. Bonaccorsi and C. Rossi. Why open source can succeed. http://opensource.mit.edu/papers/rp-bonaccorsirossi.pdf, 2003.

[Bro75] Frederick P. Brooks. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, 1975.

[BYRN99] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.

[CC05] G. Canfora and L. Cerulo. Impact analysis by mining software and change request repositories. In Proceedings of the 11th IEEE International Software Metrics Symposium (METRICS-05), 2005.

[CH04] M.L. Collard and J.K. Hollingsworth. Meta-differencing: An infrastructure for source code difference analysis. Ph.D. dissertation, Kent State University, Kent, Ohio, USA, 2004.

[Cha02] S. Chakrabarti. Mining the Web: Analysis of Hypertext and Semi Structured Data. Morgan Kaufmann, 2002.

[CHA06] K. Crowston, J. Howison, and H. Annabi. Information systems success in free and open source software development: Theory and measures. Software Process Improvement and Practice, 11, pp. 123-148, 2006.

[CK76] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, Vol. 20, pp. 476-493, 1994.

[CLM04] Andrea Capiluppi, Patricia Lago, and Maurizio Morisio. Software engineering metrics: What do they measure and how do we know? 10th International Software Metrics Symposium (METRICS 2004), 2004.

[CMR04] Andrea Capiluppi, Maurizio Morisio, and Juan F. Ramil. Structural evolution of an open source system: a case study. In Proceedings of the 12th IEEE International Workshop on Program Comprehension (IWPC), Bari, Italy, June 24-26, 2004.

[Con06] S.M. Conlin. Beyond low-hanging fruit: Seeking the next generation in FLOSS data mining. In IFIP International Federation for Information Processing, Vol. 203, Open Source Systems, pp. 261-266, 2006.

[DH73] R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. John Wiley and Sons, 1973.

[DOS99] Chris DiBona, Sam Ockman, and Mark Stone. Open Sources: Voices from the Open Source Revolution. O'Reilly and Associates, 1999.

Revision: final


[DSA+04] I. Deligiannis, I. Stamelos, L. Angelis, M. Roumeliotis, and M. Shepperd. A controlled experiment investigation of an object oriented design heuristic for maintainability. The Journal of Systems and Software, 72, pp. 129-143, 2004.

[DSRS03] I. Deligiannis, M. Shepperd, M. Roumeliotis, and I. Stamelos. An empirical investigation of an object oriented design heuristic for maintainability. The Journal of Systems and Software, 65, pp. 127-139, 2003.

[DTB04] Trung Dinh-Trong and James Bieman. Open source software development: A case study of FreeBSD. In Proceedings of the 10th IEEE International Symposium on Software Metrics, 2004.

[DZ07] C. Ding and H. Zha. Spectral clustering, ordering and ranking: statistical learning. Springer Verlag, Computational Science and Engineering, 2007.

[Fel98] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.

[FI93] U. Fayyad and K. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993.

[FLMP04] P. Francis, D. Leon, M. Minch, and A. Podgurski. Tree-based methods for classifying software failures. In Proceedings of the 15th International Symposium on Software Reliability Engineering, 2004.

[FP97] Norman Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous Approach. International Thomson Publishing, London, 1997.

[FPSSR96] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996.

[Fug93] A. Fuggetta. A classification of CASE technology. Computer, 26(12):25-38, 1993.

[Ger04a] D. M. German. An empirical study of fine-grained software modifications. In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM'04), 2004.

[Ger04b] Daniel German. Software Process Improvement and Practice, 8:201-215, 2004.

[GFS05] Tibor Gyimóthy, Rudolf Ferenc, and István Siket. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10):897-910, 2005.



[GHJ98] H. Gall, K. Hajek, and M. Jazayeri. Detection of logical coupling based on product release history. In Proceedings of the 14th IEEE International Conference on Software Maintenance, 1998.

[Gho04] A.R. Ghosh. Clustering and dependencies in free/open source software development: Methodology and tools. First Monday, 8(4), 2004.

[GM03a] D. German and A. Mockus. Automating the measurement of open source projects. In Proceedings of the 3rd Workshop on Open Source Software Engineering, 25th International Conference on Software Engineering (ICSE-03), 2003.

[GM03b] D. German and A. Mockus. Automating the measurement of open source projects. In Proceedings of the First International Conference on Open Source Systems, Genova, Italy, pp. 100-107, 2003.

[GT00] Michael W. Godfrey and Qiang Tu. Evolution in open source software: A case study. In Proceedings of the 16th IEEE International Conference on Software Maintenance (ICSM'00), 2000.

[Hal77] M. H. Halstead. Elements of Software Science. Elsevier North-Holland, 1977.

[HH04] A. Hassan and R.C. Holt. Predicting change propagation in software systems. In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM'04), 2004.

[HK76] S. M. Henry and D. Kafura. Software structure measurements based on information flow. IEEE Transactions on Software Engineering, Vol. SE-7, pp. 510-518, 1981.

[HM05] M. Hahsler and S. Koch. Discussion of a large-scale open source data collection methodology. In Proceedings of the 38th Hawaii International Conference on System Sciences (HICSS '05), Track 7, January 3-6, Big Island, Hawaii, page 197b, 2005.

[IB06] Clemente Izurieta and James Bieman. The evolution of FreeBSD and Linux. In ACM/IEEE International Symposium on Empirical Software Engineering, Rio de Janeiro, Brazil, September 21-22, 2006.

[IEE98] IEEE. Standard for a software quality metrics methodology, revision. IEEE Standards Department, 1998.

[Irb]

[IV98] N.M. Ide and J. Veronis. Word sense disambiguation: The state of the art. Computational Linguistics, 24:1-40, 1998.


[JD88] A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice-Hall, 1988.

[JL99] T. Jokikyyny and C. Lassenius. Using the Internet to communicate software metrics in a large organization. In Proceedings of GlobeCom'99, 1999.

[Jon95] C. Jones. Backfiring: Converting lines of code to function points. IEEE Computer, Vol. 28, No. 11, pp. 87-88, 1995.

[JS04] C. Jensen and W. Scacchi. Data mining for software process discovery in open source software development communities. In Proceedings of the International Workshop on Mining Software Repositories (MSR-04), 2004.

[Kan03] Stephen H. Kan. Metrics and Models in Software Quality Engineering. Addison-Wesley Professional, 2003.

[KB04a] C. Kaner and W. P. Bond. Software engineering metrics: What do they measure and how do we know? In Proceedings of the 10th International Software Metrics Symposium, 2004.

[KB04b] Cem Kaner and Walter Bond. Software engineering metrics: What do they measure and how do we know? 10th International Software Metrics Symposium (METRICS 2004), 2004.

[KCM05] H. Kagdi, M.L. Collard, and J. Maletic. Towards a taxonomy of approaches for mining of source code repositories. In Proceedings of the International Workshop on Mining Software Repositories (MSR), 2005.

[KDTM06] Y. Kanellopoulos, Y. Dimopoulos, C. Tjortjis, and C. Makris. Mining source code elements for comprehending object-oriented systems and evaluating their maintainability. SIGKDD Explorations, Vol. 8, Issue 1, 2006.

[KPP+02] Barbara Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research in software engineering. IEEE Transactions on Software Engineering, Vol. 28, No. 8, pp. 721-733, 2002.

[KR90] L. Kaufman and P.J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, 1990.

[KRSZ00] R. Kempkens, P. Rösch, L. Scott, and J. Zettel. Instrumenting measurement programs with tools. Technical Report 024.00/E, Fraunhofer IESE, March 2000.


[KSL03] G. von Krogh, S. Spaeth, and K. Lakhani. Community, joining, and specialisation in open source software innovation: a case study. Research Policy, Vol. 32, pp. 1217-1241, 2003.

[KSPR01] S. Komi-Sirviö, P. Parviainen, and J. Ronkainen. Measurement automation: Methodological background and practical solutions - a multiple case study. In Proceedings of the 7th International Software Metrics Symposium (Metrics 2001), London, 2001.

[KT05] G. Koru and J. Tian. Comparing high-change modules and modules with the highest measurement values in two large-scale open-source products. IEEE Transactions on Software Engineering, Vol. 31, No. 6, pp. 625-642, 2005.

[lD05] Online document. Business Readiness Rating for open source. BRR 2005 - RFC 1, http://www.openbrr.org, 2005.

[LFK05] M. Last, M. Friedman, and A. Kandel. The data mining approach to automated software testing. In Proceedings of the SIGKDD Conference, 2005.

[LK94] M. Lorenz and J. Kidd. Object-Oriented Software Metrics: A Practical Guide. Prentice-Hall, Englewood Cliffs, N.J., 1994.

[LK03a] K. Lakhani and E. von Hippel. How open source software works: "free" user-to-user assistance. Research Policy, 32:923-943, 2003.

[LK03b] M. Last and A. Kandel. Automated test reduction using an info-fuzzy network. Annals of Software Engineering, Special Volume on Computational Intelligence in Software Engineering, 2003.

[LRW+97] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. Metrics and laws of software evolution - the nineties view. In Proceedings of the 4th International Software Metrics Symposium (METRICS'97), 1997.

[Mar04] Martin Michlmayr. Managing volunteer activity in free software projects. In Proceedings of the 2004 USENIX Annual Technical Conference, FREENIX Track, pp. 93-102, 2004.

[McC76] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineering, Vol. 2, No. 4, pp. 308-320, December 1976.

[MFH02] Audris Mockus, Roy T. Fielding, and James Herbsleb. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, Vol. 11, No. 3, 2002.


[Mit05] R. Mitkov. The Oxford Handbook of Computational Linguistics. Oxford University Press, 2005.

[MN99] M. Mendonca and N. Sunderhaft. Mining software engineering data: A survey. Report (SPO700-98-D-400), 1999.

[MS99] C.D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

[MTF04] R. Mihalcea, P. Tarau, and E. Figa. PageRank on semantic networks, with application to word sense disambiguation. In Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), 2004.

[MTV+05] D. Mavroeidis, G. Tsatsaronis, M. Vazirgiannis, M. Theobald, and G. Weikum. Word sense disambiguation for exploiting hierarchical thesauri in text classification. In Proceedings of the 9th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD-05), 2005.

[OH94] P. Oman and J. Hagemeister. Constructing and testing of polynomials predicting software maintainability. Journal of Systems and Software, 24(3):251-266, March 1994.

[Par94] David Lorge Parnas. Software aging. In Proceedings of the 16th International Conference on Software Engineering, 1994.

[Pfl01] Shari Lawrence Pfleeger. Software Engineering: Theory and Practice. Prentice-Hall, 2nd edition, 2001.

[PMM+03] A. Podgurski, W. Masri, Y. McCleese, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In Proceedings of the 25th International Conference on Software Engineering, 2003.

[PSE04] James W. Paulson, Giancarlo Succi, and Armin Eberlein. An empirical study of open-source and closed-source software products. IEEE Transactions on Software Engineering, Vol. 30, No. 4, pp. 246-256, 2004.

[Ray99] Eric Steven Raymond. The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O'Reilly and Associates, 1999.

[Ray01] Eric Steven Raymond. How to become a hacker, 2001.


[RGBG04] Gregorio Robles, Jesús M. González-Barahona, and Rishab Aiyer Ghosh. GlueTheos: Automating the retrieval and analysis of data from publicly available repositories. In Proceedings of the Mining Software Repositories Workshop, 26th International Conference on Software Engineering, Edinburgh, Scotland, 2004.

[Rie96] Arthur J. Riel. Object Oriented Design Heuristics. Addison Wesley Professional, 1996.

[RKGB04] Gregorio Robles, Stefan Koch, and Jesús M. González-Barahona. Remote analysis and measurement of libre software systems by means of the CVSAnalY tool. In Proceedings of the 2nd ICSE Workshop on Remote Analysis and Measurement of Software Systems (RAMSS), Edinburgh, Scotland, UK, 2004.

[Rob05] Gregorio Robles. Empirical Software Engineering Research on Libre Software: Data Sources, Methodologies and Results. PhD thesis, Dept. of Informatics, Universidad Rey Juan Carlos, Madrid, Spain, 2005.

[RRP04] S. Raghavan, R. Rohana, and A. Podgurski. Dex: A semantic-graph differencing tool for studying changes in large code bases. In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM'04), 2004.

[Sal] Peter H. Salus. The Daemon, the GNU and the Penguin.

[SAOB02] Ioannis Stamelos, Lefteris Angelis, Apostolos Oikonomou, and Georgios L. Bleris. Code quality analysis in open source software development. Information Systems Journal, 12(1):43-60, 2002.

[Sch92] Norman Schneidewind. Methodology for validating software metrics. IEEE Transactions on Software Engineering, Vol. 18, No. 5, pp. 410-422, 1992.

[SMC74] W. Stevens, G. Myers, and L. Constantine. Structured design. IBM Systems Journal, 13(2), 1974.

[SSA06] S.K. Sowe, I. Stamelos, and L. Angelis. Identifying knowledge brokers that yield software engineering knowledge in OSS projects. Information and Software Technology, 48(11):1025-1033, November 2006.

[SSAO04] Ioannis Samoladas, Ioannis Stamelos, Lefteris Angelis, and Apostolos Oikonomou. Open source software development should strive for even greater code maintainability. Communications of the ACM, 47(10):83-87, 2004.


[Tur96] Wladyslaw M. Turski. Reference model for smooth growth of software systems. IEEE Transactions on Software Engineering, Vol. 22, No. 8, 1996.

[TVA07] G. Tsatsaronis, M. Vazirgiannis, and I. Androutsopoulos. Word sense disambiguation with spreading activation networks generated from thesauri. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), 2007.

[Voo93] E.M. Voorhees. Using WordNet to disambiguate word senses for text retrieval. In Proceedings of the 16th International Conference on Research and Development in Information Retrieval (SIGIR-93), 1993.

[VT06] L. Voinea and A. Telea. Mining software repositories with CVSgrab. In Proceedings of the International Workshop on Mining Software Repositories (MSR-06), 2006.

[Wei71] Gerald M. Weinberg. The Psychology of Computer Programming. Van Nostrand Reinhold, 1971.

[WH05] C.C. Williams and J.K. Hollingsworth. Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering, 31(6):466-480, 2005.

[WK91] S.M. Weiss and C. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems. Morgan Kaufmann, 1991.

[WO95] K. D. Welker and P. W. Oman. Software maintainability metrics models in practice. Crosstalk, Journal of Defense Software Engineering, 8(11):19-23, November/December 1995.

[YC91] E. Yourdon and P. Coad. Object-Oriented Design. Prentice-Hall, Englewood Cliffs, N.J., 1991.

[YSC+06] L. Yu, S. Schach, K. Chen, G. Heller, and J. Offutt. Maintainability of the kernels of open source operating systems: A comparison of Linux with FreeBSD, NetBSD and OpenBSD. The Journal of Systems and Software, 79:807-815, 2006.

[YSCO04] Liguo Yu, S.R. Schach, K. Chen, and J. Offutt. Categorization of the common coupling and its application to the maintainability of the Linux kernel. IEEE Transactions on Software Engineering, Vol. 30, No. 10, pp. 694-706, 2004.


[ZWDZ04] T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In Proceedings of the 26th International Conference on Software Engineering (ICSE'04), 2004.

Abbreviations

XML - Extensible Markup Language
SQL - Structured Query Language
HTML - Hypertext Markup Language
IDE - Integrated Development Environment
UML - Unified Modeling Language
CSV - Comma Separated Values
COTS - Commercial off-the-shelf
