
Quality Metrics for Maintainability of Standard Software

Master Thesis

Dipl.-Ing. Oleksandr Panchenko

Mentors:

Matr.Nr. 724084

Dr.-Ing. Bernhard Gröne, Hasso-Plattner-Institute for IT Systems Engineering

Dr. Albert Becker, SAP AG, Systems Applications Products in Data Processing

23.02.2006, Potsdam, Germany

Abstract

The handover of software from development to the support department is accompanied by many tests and checks, which prove the maturity and readiness for "go to market". However, these quality gates are not able to assess the complexity of the entire product or to predict the maintenance effort. This work aims at researching metric-based quality indicators in order to assess the most important maintainability aspects of standard software. Static source code analysis is selected as the method for mining information about complexity. The research of this thesis is restricted to the ABAP and Java environments. The quality model used is derived from the Goal Question Metric approach and extends it for the purposes of this thesis. After a literature review, the quality model was extended with standard metrics and some specially devised new metrics. The selected metrics were validated theoretically, with respect to their numerical properties, using Zuse's software measurement framework, and practically, with respect to their ability to predict maintainability, using experiments. After experiments with several SAP projects, some metrics were recognized as reliable indicators of maintainability. Some other metrics can be used to find non-maintainable code and to provide additional metric-based audits. For semi-automated analysis, several tools were suggested and an XSLT converter was developed in order to process the measurement data and prepare reports. This thesis should lay the basis for the further implementation and use of the metrics.

Zusammenfassung

Before the software handover from development to maintenance, numerous tests and examinations are carried out to check whether the product is mature enough to go to market. Although these quality controls are very extensive, the overall software complexity and the effort for future maintenance have hardly been taken into account so far. Therefore, this work sets itself the goal of investigating various metric-based quality indicators that assess the most important aspects of the maintainability of standard software. Static source code analysis was chosen as the method of complexity analysis. The investigation is restricted to the ABAP and Java environments. The quality model is derived from the Goal Question Metric approach and adapted to the requirements of this work. After an extensive literature review, the quality model was extended with existing and newly developed metrics. The numerical properties of the selected metrics were validated theoretically with the help of Zuse's measurement framework. Practical studies were carried out in order to assess the predictive power of the metrics. Experiments with selected SAP projects confirmed some metrics as reliable maintainability indicators. Other metrics can be used to find non-maintainable program code and to provide additional metric-based audits. For a semi-automated procedure, several tools were selected and, in addition, an XSLT was developed in order to aggregate measurement data and prepare reports. This work is intended to serve both as a basis for further research and for future implementations.

Abbreviations

A - Abstractness
ABAP - Advanced Business Application Programming (Language)
AMC - Average Method Complexity
AST - Abstract Syntax Tree
Ca - Afferent Coupling
CBO - Coupling between Objects
CDEm - Class Definition Entropy (modified)
Ce - Efferent Coupling
CLON - Clonicity
COBISOME - Complexity Based Independent Software Metrics
CQM - Code Quality Management
CR - Comments Rate
CYC - Cyclic Dependencies
D - Distance from Main Sequence
D2IMS - Development to IMS
DCD - Degree of Cohesion (direct)
DCI - Degree of Cohesion (indirect)
DD - Defect Density
DIT - Depth of Inheritance Tree
DOCU - Documentation Rate
FP - Function Points
FPM - Functions Point Method
GQM - Goal Question Metric
GVAR - Number of Global Variables
H - Entropy
I - Information
IF - Inheritance Factor
IMS - Installed Base Maintenance & Support
In - Instability
ISO - International Standards Organization
KPI - Key Performance Indicators
LC - Lack of Comments
LCC - Loose Class Cohesion
LCOM - Lack of Cohesion of Methods
LOC - Lines Of Code
LOCm - Average LOC in methods
m - Structure Entropy
MCC - McCabe Cyclomatic Complexity
MEDEA - Metric Definition Approach
MI - Maintainability Index
MTTM - Mean Time To Maintain
NAC - Number of Ancestor Classes
NDC - Number of Descendent Classes
NOC - Number Of Children in inheritance tree
NOD - Number Of Developers
NOM - Number of Methods
NOO - Number Of Objects
NOS - Number of Statements
OO-D - OO-Degree
PIL - Product Innovation Lifecycle
RFC - Response For a Class
SAP - Systems, Applications and Products in Data Processing
SMI - Software Maturity Index
TCC - Tight Class Cohesion
U - Reuse Factor
UML - Unified Modeling Language
V - Halstead Volume
WMC - Weighted Methods per Class
XML - eXtensible Markup Language
XSLT - eXtensible Stylesheet Language Transformation
ZCC - ZIP-Coefficient of Compression

Table of Contents

1. Introduction ..... 11
2. Research Problem Description ..... 13
   Different Methods for Source Code Analysis ..... 13
   Metrics vs. Audits ..... 13
   Classification of the Metrics ..... 15
   Types of Maintenance ..... 15
   Goal of the Work ..... 16
3. Related Works and Projects ..... 17
   Maintainability Index (MI) ..... 17
   Functions Point Method (FPM) ..... 17
   Key Performance Indicators (KPI) ..... 18
   Maintainability Assessment ..... 18
   Abstract Syntax Tree (AST) ..... 19
   Complexity Based Independent Software Metrics (COBISOME) ..... 19
   Kaizen ..... 19
   ISO/IEC 9126 Quality Model ..... 19
4. Quality Model – Goals and Questions ..... 20
   Goal Question Metric Approach ..... 20
   Quality Model ..... 21
   Size-dependent and Quality-dependent Metrics ..... 24
5. Software Quality Metrics Overview ..... 25
   Model: Lexical Model ..... 25
      Metric: LOC – Lines Of Code ..... 25
      Metrics: CR – Comments Rate, LC – Lack of Comments ..... 26
      Metric: CLON – Clonicity ..... 26
      Short Introduction into Information Theory and Metric: CDEm – Class Definition Entropy (modified) ..... 27
   Model: Flow-graph ..... 35
      Metric: MCC – McCabe Cyclomatic Complexity ..... 35
   Model: Inheritance Hierarchy ..... 36
      Metric: NAC – Number of Ancestor Classes ..... 37
      Metric: NDC – Number of Descendant Classes ..... 37
      Geometry of Inheritance Tree ..... 37
      Metric: IF – Inheritance Factor ..... 40
   Model: Structure Tree ..... 40
      Metric: CBO – Coupling Between Objects ..... 40
      Metric: RFC – Response For a Class ..... 42
      Metric: m – Structure Entropy ..... 43
      Metric: LCOM – Lack of Cohesion Of Methods ..... 45
      Metric: D – Distance from Main Sequence ..... 50
      Metric: CYC – Cyclic Dependencies ..... 51
      Metric: NOM – Number Of Methods and WMC – Weighted Methods per Class ..... 53
   Model: Structure Chart ..... 53
      Metric: FAN-IN and FAN-OUT ..... 54
      Metric: GVAR – Number of Global Variables ..... 54
   Other Models ..... 55
      Metric: DOCU – Documentation Rate ..... 55
      Metric: OO-D – OO-Degree ..... 55
      Metric: SMI – Software Maturity Index ..... 55
      Metric: NOD – Number Of Developers ..... 56
   Correlation between Metrics ..... 56
   Metrics Selected for Further Investigation ..... 57
   Size-dependent Metrics and Additional Metrics ..... 59
6. Theoretical Validation of the Selected Metrics ..... 60
   Problem of Misinterpretation of Metrics ..... 60
   Types of Scale ..... 61
   Types of Metrics ..... 62
   Conversion of the Metrics ..... 64
   Other Desirable Properties of the Metrics ..... 67
   Visualization ..... 67
7. Tools ..... 70
   ABAP-tools ..... 70
      Transaction SE28 ..... 70
      Z_ASSESSMENT ..... 70
      CheckMan, CodeInspector ..... 70
      AUDITOR ..... 71
   Java-tools ..... 71
      Borland Together Developer 2006 for Eclipse ..... 71
      Code Quality Management (CQM) ..... 72
      CloneAnalyzer ..... 72
      Tools for Dependencies Analysis ..... 72
      JLin ..... 73
      Free Tools: Metrics and JMetrics ..... 73
   Framework for GQM-approach ..... 74
8. Results ..... 75
   Overview of the Code Examples to Be Analyzed ..... 75
   Experiments ..... 76
   Admissible Values for the Metrics ..... 84
   Interpretation of the Results ..... 85
   Measurement Procedure ..... 85
9. Conclusion ..... 88
10. Outlook ..... 90
References ..... 92
Appendix ..... 97

1. Introduction

The Product Innovation Lifecycle (PIL) of SAP is divided into five phases with a set of milestones. A brief overview of the PIL is given in figure 1.1. Consider the milestone (so-called Quality-Gate) "Development to IMS" (D2IMS) in more detail. The Installed Base Maintenance and Support department (IMS) receives the next release through such a Quality-Gate with the start of the Main Stream phase and will support it for the rest of its lifecycle. The Quality-Gate is a formal decision to hand over the release responsibility to IMS and is based on a readiness check, which proves the quality of the release by a set of checks. However, this check aims to establish the overall quality and absence of errors and is not intended to determine how easy it is to maintain the release. For correct planning of the maintenance resources, IMS needs additional information about those attributes of the software that impact maintainability. This information will not influence the decision of the Quality-Gate, but it will help IMS developers in planning resources and analyzing the release. Such information can also support code reviews and allows earlier feedback to the development. This thesis aims at filling this gap by providing a set of indicators which describe the release from the viewpoint of its maintainability, together with instructions on how they should be interpreted. The second goal is a set of indicators which can find badly maintainable code. Detailed descriptions of the PIL concept can be found at SAPNet Quicklink /pil or in

[SAP04].


Figure 1.1: Goals of Quality-Gate “Development to IMS”. [SAP05 p.43]

Maintainability here means the attribute describing how easily and rapidly the software can be maintained. High maintainability means smooth and well-structured software which can be maintained easily. Other definitions of maintainability, such as "likelihood of errors", are out of the scope of this work.

At the time of the Quality-Gate D2IMS, the product has already been completely developed and tested; thus the complete source code is accessible. However, the product is only about to go to market and no data about customer messages or errors is available. Consequently, only internal static properties of the software can be analyzed at this point in time. One way of approaching this problem is to investigate the dependency between the maintainability of the software and its design, with the goal of finding the design properties that can be used as maintainability indicators. Since standard software is usually very large and no human analysis is possible, such findings must be produced by an automated device and must be objective. Thus only objective measures can be used. The subject of this thesis is the complexity of the software, which often leads to badly maintainable code. Metrics provide a mathematical means of purposefully describing certain properties of the object of study. After comprehending the basis of maintainability and finding the design peculiarities which impact maintainability, this thesis proposes a way to describe these properties of the design using metrics. Consequently, several selected metrics should be able to indicate the most important aspects of maintainability and the overall quality of the software. Moreover, it is commonly accepted that bad code or a lack of design is much easier to discover than good code. Therefore it should not be a big challenge to find code that could cause problems for maintenance. All in all, the solution of this task allows: a deep understanding of the essence of maintainability and its factors, estimating the quality of the product from the viewpoint of maintainability, appropriate planning of the resources for maintenance, and providing earlier feedback to the development. A more detailed problem description and the goals of this thesis are presented in chapter 2.

This thesis is composed in the following way: chapter 3 gives an overview of related work. The quality model, which is used to determine the essence of maintainability, is discussed in chapter 4. Chapter 5 provides short descriptions of the metrics that are candidates for extending the quality model. Chapter 6 supplements the metric descriptions with a theoretical validation. Tools which can be used for the software measurement are discussed in chapter 7. Experiments and results are discussed in chapter 8. Conclusions are given in chapter 9, and a short outlook in chapter 10 finishes this thesis.

2. Research Problem Description

Different Methods for Source Code Analysis

All methods for the analysis of source code can be divided into two groups: static and dynamic methods. Static methods work with the source code directly and do not require running the product. This property allows static methods to be used in earlier phases of the lifecycle and is one of the important requirements for this thesis. Metrics and audits belong to the static methods; metrics, tests and Ramp-Up belong to the dynamic methods. Dynamic methods can also consider the dynamic complexity of the product, for example not only how the connections between modules are organized, but also how often they are actually used. Here and in the remainder of this thesis, a module means any compilable unit: a class in Java, and a program, a class, a function module etc. in ABAP. Noteworthy, dynamic methods can show different results in different environments. Above all, this is important for applications which provide only a generic framework, where the customer composes his own product using the provided functions (for example solutions for business warehousing). Metrics for dynamic complexity are usually based on a UML specification of the project, for instance state diagrams, and are analyzed using colored Petri nets. Another possibility is collecting statistical information about the running application. Several experts argue that improving only a few modules (those which are used most often) can significantly improve the quality of the entire system. Noteworthy, modules which are often used by other modules are more sensitive to changes and should have better quality. All methods except metrics are relatively well investigated at SAP and are also supported with tools. The author believes that the main determinants of a program's maintainability reside directly in the code, and that indicators can be extracted from the static code without supplementary dynamic analysis. The three main activities of maintenance are: analyzing the source code, changing the program and testing the changes. Therefore, the maintainer also works with the static code most of the time.

Metrics vs. Audits

Two main types of static code analysis are distinguished: metrics and audits. Metrics provide a description of the measured code, which means a homomorphic mapping from empirical objects to real numbers in order to describe certain properties of these objects. The homomorphism in this case means "a mapping from the empirical relational system to the formal relational system which preserves all relations and structures between the considered objects" [ZUSE98 p.641]. Empirical objects can have different relations between them, for example: one program is larger than another, or one program is more understandable than another. Of course the researcher wants the metric to preserve such relations. Such a mapping also means that metrics should be considered only in the context of some relation between empirical objects. A common example of a homomorphic mapping is presented in figure 2.1. More on the theoretical framework of software measurement can be found in [ZUSE98, in particular p.p. 103-130]. An example of a metric is LOC (Lines Of Code); this metric preserves the relation "Analyzability", since smaller programs are in general easier to understand than larger programs. Another metric, NOC (Number Of Children in the inheritance tree), preserves the relation "Changeability", since a class with many sub-classes is more difficult to change than a class with only few or no sub-classes.


Figure 2.1: Metric - Mapping between Empirical and Numerical objects which preserves all relations

According to Zuse's measurement framework [ZUSE98], specifying a metric includes the following steps:

Identify attributes for real world entities

Identify empirical relations for such attributes

Identify numerical relations corresponding to each empirical relation

Define mapping from real world entities to numbers

Check that numerical relations preserve and are preserved by empirical relations

In contrast to metrics, audits are simply a verification of adherence to certain rules or development standards. Usually an audit is a simple count of violations of these rules or patterns. SAP uses a wide range of audit-based tools for ABAP (CHECKMAN, Code Inspector, Advanced Syntax Check, SLIN) and for Java (JLin, Borland Together). Audits help developers to find and fix code that potentially contains errors and increase the awareness of quality in development. However, audits are bad predictors of maintainability, because even though an application conforms to the development standards, it can still be poorly maintainable. Moreover, audits give concrete recommendations to developers, but are not able to characterize the quality of the product in general. The second reason for rejecting audits is the absence of complexity analysis, the main part of the maintainability analysis. In the remainder of this work only metrics will be considered. Approaches for finding appropriate metrics are discussed in chapter 4. The numerical properties of metrics are discussed in chapter 6.

Based on the metric definition, the following usage scenarios are conceivable:

Compare certain attributes of two or several software systems
Formally describe certain attributes of the software
Prediction: if a strong correlation between metrics is found, the value of one metric can be predicted based on the values of another metric. For example, if a relation between some complexity metric (product metric) and the fault probability (process metric) is found, one can predict the probability of a fault in a certain module based on its complexity
Keep track of the evolution of the software: comparing different versions of the same software allows drawing conclusions about the evolution and trend of the product

Classification of the Metrics

In [DUMK96 p.p. 4, 8] Dumke considers three improvement (measurement) areas of software development: software products, software processes and resources, and gives a metrics classification for each area. Product metrics describe properties of the product (system) itself and thus depend only on the internal qualities of the product. Examples of product metrics are the number of lines of code or the comments rate. Process metrics describe an interaction process between the product and its environment; the environment can also be the people who develop or maintain the product. Examples of process metrics are the number of problems closed during a month or the mean time to maintain (MTTM). Obviously, maintainability can be measured directly in the process of maintenance using process metrics like MTTM. However, the maintainability assessment here should be made before the maintenance begins and before these metrics become available. Thus this thesis tries to predict the maintainability in earlier phases of the lifecycle using product metrics only. Resource metrics describe properties of the environment. Examples of resource metrics are the number of developers in a team or the amount of available memory on a server. This work concentrates purely on software product measurement. However, process metrics can also be used for the empirical validation of the product metrics, because they measure maintainability directly and can prove the prediction which has been made by the product metrics. Once an appropriate correlation between the product metrics and the process metrics is established, one can talk about empirically validated product metrics.

Types of Maintenance

There are three main types of maintenance (based on [RUTH]):

Corrective – making it right (also called repairs)
  o To correct residual faults: specification, design, implementation, documentation, or any other types of faults
  o Time consuming, because each repair must go through the full development lifecycle
  o On average, ~20% of the overall maintenance time (however, at IMS it reaches 60%, and with the beginning of the Extended Maintenance even up to 100%)

Adaptive – making it different (functional changes)
  o Responses to changes in the environment in which the product operates
  o Changed hardware
  o Changed connecting software, e.g. new database system
  o Changed data, e.g. new phone dialing codes
  o On average, ~20% of the maintenance time (at IMS 30%, decreasing to 10% over time)

Perfective – making it better
  o Change software to improve it, usually requested by the client
  o Add functionality
  o Improve efficiency – for example performance (also called polishing)
  o Improve maintainability (also called preventative maintenance)
  o On average, ~60% of the maintenance time (at IMS only 10-20%)

For IMS the most important and time-consuming type is corrective maintenance. However, this thesis does not distinguish between the specific types of maintenance, because the general process is the same for all of them. Nevertheless, the results of this analysis can especially be used for planning preventative maintenance.

Goal of the Work

This thesis is going to answer the question: What are metrics able to do? In order to wrap this question into a more practical task, the following formulation will be used: a set of metric-based indicators for the maintainability of standard software should be found in order to be able to assess or predict the maintainability based on the internal qualities of the software in earlier phases of the lifecycle. No single metric can adequately and exhaustively evaluate the software, and too many metrics may lead to information overload. Thus a well-chosen subset of about 12 measures should be selected and analyzed. For each metric, admissible boundaries and recommended values should be defined. Moreover, possible interpretations of the results and their meaning for the maintainability should be prepared. Since the measurement is going to be made in an automated manner, an overview of the most suitable tools should be provided and the measurement process should be described. A detailed description, implementation hints and examples should be prepared for each metric. In order to fulfill all these requirements, a theoretical and empirical validation of the selected metrics should also be done. The approach must not use additional information sources (except the source code) such as requirements, business scenarios, documentation etc. In the current work only metric-based static code analysis is considered.

3. Related Works and Projects

In this chapter several relevant projects are introduced. This description should give an idea of what has been done in this field so far. Besides the selected projects, a wide range of articles and books has been written on single metrics and measurement frameworks. These are not included in this chapter, but are mentioned or referenced further in this thesis.

Maintainability Index (MI)

Hybrid measures are not measured directly from the software, but are a combination of other metrics. The most popular form of combination is a polynomial; nevertheless, there are also other forms. Such a combination is used to obtain one resulting number for the whole evaluation. However, this desire confronts researchers with the problem of hidden information: hybrid measures show the attributes of an empirical object incompletely and imperfectly. One attempt to present maintainability as a single hybrid measure is the Maintainability Index of Welker and Oman [WELK97], which includes several models with various member metrics and coefficients. One of them is the improved four-metric MI model:

MI = 171 - 5.2 ln(Ave-V) - 0.23 Ave-MCC - 16.2 ln(Ave-LOC) + 50 sin(sqrt(2.4 Ave-CR))

where:

Ave-V is the average Halstead volume per module,

Ave-MCC is the average McCabe Cyclomatic Complexity per module,

Ave-LOC is the average number of lines of code per module,

Ave-CR is the average per cent of comments per module.

Many of the metrics used here are discussed in chapter 5. The research in [WELK97] gives the following indications on the meaning of the MI values:

MI < 65: poor maintainability
65 < MI < 85: fair maintainability
MI > 85: excellent maintainability
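The following sketch (not part of the thesis) shows how the four-metric MI model and the thresholds above could be evaluated in Java; the four averages and their units are assumed to be supplied by a measurement tool according to the definitions above, and the sample values in main are purely hypothetical.

```java
// Minimal sketch (illustrative, not the thesis tooling): evaluating the
// four-metric MI model for a system, given the four module averages.
public final class MaintainabilityIndex {

    /** MI = 171 - 5.2 ln(Ave-V) - 0.23 Ave-MCC - 16.2 ln(Ave-LOC) + 50 sin(sqrt(2.4 Ave-CR)). */
    public static double compute(double aveV, double aveMcc, double aveLoc, double aveCr) {
        return 171.0
                - 5.2 * Math.log(aveV)
                - 0.23 * aveMcc
                - 16.2 * Math.log(aveLoc)
                + 50.0 * Math.sin(Math.sqrt(2.4 * aveCr));
    }

    /** Maps an MI value to the maintainability classes listed above. */
    public static String classify(double mi) {
        if (mi < 65) return "poor maintainability";
        if (mi <= 85) return "fair maintainability";
        return "excellent maintainability";
    }

    public static void main(String[] args) {
        double mi = compute(250.0, 6.0, 40.0, 0.25);  // hypothetical averages
        System.out.println(mi + " -> " + classify(mi));
    }
}
```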

Nevertheless, all the metrics used here are intra-modular and do not take inter-modular dependencies into account, which highly impact maintainability; thus the Maintainability Index was rejected from further investigation. However, using this approach led to an interesting observation: MI was measured at two different points in time for the modules of the same system, and it was shown that less maintainable modules became more difficult to maintain over time, while well-maintainable modules kept their good quality.

Functions Point Method (FPM)

The Functions Point Method suggests assigning to each module, class, input form etc. a certain number of function points (FP) depending on its complexity. The sum of all points predicts the development or maintenance effort for the whole application. Assuming that a developer can implement a certain number of FPs per day on average, a manager can predict the number of developers and the time needed. FPM is perfectly applicable in early project phases and allows predicting the development and maintenance effort when the source code is not yet available. It also suits the strongly data-oriented concept of SAP applications. Nevertheless, in the case of this work the source code is already available, and it would be difficult to calculate backwards the number of FPs which were implemented. It would be especially difficult in the case of a product which has been bought from outside and for which no project or design documentation is available. To make matters worse, FPs are subjective and do not suit the required objective model. Also, these measures were designed rather for cost estimations (before the source code is available) than for measurement. Thus it is best to collect information from the source code directly, without using FPM as an additional layer of abstraction. For readers who are interested in FPM, the following sources are recommended: [AHN03], [ABRA04b].

Key Performance Indicators (KPIs)

The goal of this project is the definition, measurement and interpretation of basic data and KPIs for the quality of the final product. For the assessment of product quality, several (ca. 30) direct indicators were selected, which means that the data is collected directly during the maintenance process. Examples of the selected indicators are:

Number of messages - Number of incoming customer messages per quarter
Changed coding - Sum of inserted and deleted lines in all notes with category "program error" divided by the total number of lines of coding (counted per quarter)
Callrate - Number of weekly incoming messages per number of active installations
Defect Density (DD) - Defined as the number of defects (weighted by severity and latency) identified in a product divided by the size and complexity of the product

Nevertheless, the earliest possible availability of such indicators is at the end of the Ramp-Up phase. Thus, in the context of the current thesis it is only possible to use these KPIs for validating the developed metrics. For more details about KPIs see [SAP05c].

Maintainability Assessment

This project aims at assessing the maintainability of SAP products shortly before the handover to IMS. Thus the goal is nearly the same as that of the current thesis. However, the assessment chosen in this project is audit-based. Several aspects of maintainability are inspected and a list of questions is prepared. An expert has to analyze the product manually, answer the questions and fill out a special form. After that, the final conclusion about the maintainability can be reported automatically. The main drawbacks of the suggested method are the manual character of the assessment and the single resulting value, which is difficult to interpret. In this project some primitive metrics like lines of code and comments rate are also suggested, and a tool supporting these metrics is provided.

Abstract Syntax Tree (AST)

In this project ABAP code is analyzed and a method for building an abstract syntax tree is suggested. A plug-in for Eclipse is also developed in order to automate this method. The plug-in allows saving the AST into an XML-document and analyzing it. Based on this technique, some metrics for ABAP can be implemented. Another way to use the AST is to find clones – copied fragments of coding.

Complexity Based Independent Software Metrics

This is a master thesis about Complexity Based Independent Software Metrics (short: COBISOME). The main point of this work is to find an algorithm for converting a set of correlated metrics into a set of independent metrics. Such a conversion creates a list of virtual independent (orthogonal) metrics, which allows examining different aspects of the software independently and thus more effectively. Nevertheless, the complicated transformations and the aggregation of several metrics into one make the analysis more difficult at the same time. For more details see [SAP05b].

Kaizen

The objective of the project Kaizen is to analyze selected SAP code in order to understand it better and to look for ways to continually improve it. Three possible objectives of the code improvement are:

Improve readability and general maintainability

Reduce cost of service enabling

Enable future enhancements in functionality (when well understood)

Kaizen will focus on objectives #1 and #2 as applicable to most SAP code. One of the first steps of the project is the analysis of the maintainability metrics.

ISO 9126 – Standard Quality Model

The ISO 9126 quality model was proposed as an international standard for software quality measurement in 1992. It is a derivation of the McCall model (see appendix A). This model associates attributes and sub-characteristics of the software with one of the areas (so-called characteristics) in a hierarchical manner. For the area "Maintainability" the following attributes are arranged: analyzability, changeability, stability, testability and compliance. Although these attributes are defined, measuring the quality is still not easy. The model is customizable, but not very flexible and in many cases not applicable. Hence this model is not commonly accepted and only a few tools are based on the ISO model.

4. Quality Model – Goals and Questions

Goal Question Metric Approach

A quality model is a model explaining quality from a certain point of view. The object of a quality model can be products, processes or projects. Most models suggest a decomposition principle, where a more general characteristic is decomposed into several sub-characteristics and further into metrics. Various metric definition approaches (MEDEAs) have been developed. Most effective are hierarchical quality models organized in a top-down fashion: the model must be focused, based on goals and models, and at the same time provide an appropriate level of detail. "A bottom-up approach will not work because there are many observable characteristics in software, but which metrics one uses and how one interprets them it is not clear without the appropriate models and goals to define the context" [BASI94 p.2]. Nevertheless, a bottom-up approach is useful for metrics validation, when the metrics are already selected. The most flexible and very intuitive approach is the Goal Question Metric MEDEA (GQM), which suggests a hierarchical top-down model for selecting appropriate metrics. This model has at least three levels:

Conceptual level (goals): This level presents a measurement goal, which is derived from business goals. In the case of this thesis the measurement goal would be "well-maintainable software". However, in order to facilitate formalizing the top goal, the GQM specifies a template, which includes a purpose, an object, a quality issue, a viewpoint and a context. The formalized goal is given in the next section (see p. 21). Since the top goal is usually very complex, it can be broken down into several sub-goals in order to ease the interfacing with the underlying levels.

Operational level (questions): As the goals are presented on the very abstract conceptual level, each goal should be refined into several quantifiable questions, which introduce a more operational level and hence are more suitable for interpretation. Answers to these questions have to determine whether the corresponding goal is being met. "Questions try to characterize the object of measurement with respect to a selected quality issue and to determine its quality from the selected viewpoint" [BASI94 p. 3]. Hence, questions help to understand the essence of the measurement objective and to find the most appropriate indicators for it. These indicators could be explicitly formalized within an optional Formal level.

Quantitative level (metrics): Metrics placed on this level should provide all quantitative information needed to adequately answer the questions. Hence, metrics are a refinement of the questions into quantitative product measures. The metrics should provide sufficient information to answer the questions. The same metric can be used to answer multiple questions.

An optional Tools level can be included in the model in order to show the tool assignment for the metrics. An abstract example of a GQM model is illustrated in figure 4.1. A more detailed description of GQM and a step-by-step procedure for using it are given in [SOLI99].

"GQM is useful because it facilitates identifying not only the precise measures required, but also the reasons why the data are being collected" [PARK96, p. 53]. It is possible to rank the impact of metrics on questions using weight coefficients, to make clear which metric is more important. However, the model used in this thesis does not aim to describe which weights the metrics have. The author believes that the best way is to give the analyst full freedom in this decision. The analyst can decide, depending on the situation, which indicator is more important.


Figure 4.1: The Goal Question Metric approach (abstract example)

The measurement process using the GQM-approach includes four main steps:

Definition of the top goal and the goal hierarchy

Definition of the list of the questions, which explain the goals

Selection of the appropriate metric set; theoretical and empirical analysis of each metric; selection of the measurement tools
Collecting measurement data and interpreting the results

The first three steps are intended for the definition of the GQM quality model; the last step is the actual measurement and interpretation and can be repeated many times.
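As an illustration of the hierarchy described above, the following sketch (not from the thesis) represents a GQM model as a simple data structure in Java; the goal, question and metric names are invented placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a GQM model as a data structure: goals are refined into
// questions, and questions are answered by metrics. Names are illustrative.
class Metric {
    final String name;   // e.g. "Ave-LOC"
    Metric(String name) { this.name = name; }
}

class Question {
    final String text;
    final List<Metric> metrics = new ArrayList<>();    // quantitative level
    Question(String text) { this.text = text; }
}

class Goal {
    final String description;                           // conceptual level
    final List<Goal> subGoals = new ArrayList<>();      // goal hierarchy
    final List<Question> questions = new ArrayList<>(); // operational level
    Goal(String description) { this.description = description; }
}

public class GqmExample {
    public static void main(String[] args) {
        Goal top = new Goal("Assess maintainability of standard software from IMS's viewpoint");
        Goal modularity = new Goal("Modularity");                  // example sub-goal
        Question q = new Question("Are the modules kept small?");  // example question
        q.metrics.add(new Metric("Ave-LOC"));
        q.metrics.add(new Metric("NOM"));
        modularity.questions.add(q);
        top.subGoals.add(modularity);
        System.out.println(top.description + " -> " + q.text + " -> " + q.metrics.size() + " metrics");
    }
}
```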

Quality Model

According to the GQM goal specification, the major goal for the maintainability purpose is: to assess (purpose) maintainability (quality issue) of standard software (object) from IMS's viewpoint (viewpoint) in order to manage it and find possible ways to improve it (purpose) in the ABAP and Java environment (context). The question for the major goal could be "How easy is the location and fixing of an error in the software?", but this question is very vague and can only be answered with process metrics like MTTM. As mentioned before, measuring such process metrics is only possible during the maintenance and is thus inappropriate for the purposes of this work. Let's call such goals external goals, because the degree of goal achievement also depends on some external factor. The degree of achievement of internal goals depends only on internal properties of the software and hence can be determined relatively early in the lifecycle. The major goal is highly complex and it is difficult to create appropriate questions for it; thus a complex hierarchy of goals should be used, including the top goal, goals and sub-goals. Moreover, only internal goals should be placed at the bottom of the hierarchy, so that questions are addressed only to internal goals. The goal hierarchy is depicted in figure 4.2, where blue boxes represent external goals. Such a decomposition allows a sensible selection of questions and the necessary granularity. The full model is presented in appendix B.


Figure 4.2: Mapping of external and internal goals

The quality model used here is based on several validated and acknowledged quality models, such as the ISO 9126 standard quality model, the McCall quality model, Boehm's software quality characteristics tree and Fenton's decomposition of maintainability. The corresponding parts of these models can be found in appendix A or in [KHAS04]. Several sub-goals and metrics were also taken from [MISR03]. After examination of these quality models, theoretical reasoning and a review of the literature in this field, the following areas (goals) were recognized as important for the maintainability of the software:

Maturity

Clonicity

Analyzability

Changeability

Testability

The goals Maturity and Clonicity are described together with the corresponding metrics in chapter 5 (see p.55 and p.26, respectively). Next, the aspects Analyzability, Changeability and Testability are discussed. Analyzability is probably the most important factor of maintainability; nearly all metrics used in the model are also present in the Analyzability area. At first, the author also wanted to include the goal Localizing in the model, which would characterize how easy it is to localize (find) a fault in the software. But it was later found that most metrics for this goal are already included in Analyzability, and Localizing was rejected from the model. The following sub-goals should be fulfilled in order to create easily comprehensible software:

Algorithm Complexity - Keeping the internal (algorithmic) complexity low
Selfdescriptiveness - Providing appropriate internal documentation (naming conventions and comments)
Modularity - Keeping the modules small and appropriately encapsulating the functionality into the modules (cohesiveness)
Structuredness - Proper organization of the modules in the entire structure
Consistency - Keeping the development process simple and well organized. There is a lot of research trying to determine whether a well-organized development process leads to good product quality; however, no evident relation was found. Nevertheless, the maintainer is sometimes confused if he sees that a module was changed many times by different developers. Consistency in this context means a clear distribution of tasks between developers.

For changeability (or the easiness of making changes in the software) it is important to have a proper design of the software, which allows maintenance without side effects. The quality model includes the goals Structuredness, Modularity and Packaging in this area, whereas structuredness has several different aspects:

Coupling describes the connectivity between classes

Cohesiveness describes the functional unity of a class

Inheritance describes the properties of inheritance trees

Testability means the easiness of testing and of maintaining test cases. Bruntink [BRUN04] investigates testability from the perspective of unit testing and distinguishes between two categories of source code factors: factors that influence the number of test cases required to test the system (let's call the goal for these factors "Value"), and factors that influence the effort required to develop each individual test case (let's call the goal for these factors "Simplicity"). Noteworthy, "Value" means the number of necessary test cases and not the number of actually available test cases. Consequently, for high maintainability it is important to keep the "Value" small. Nevertheless, most efforts in the field of test coverage are concentrated on the low procedure level, for example the percentage of tested statements within a class. The quality model includes several metrics for testability validated in [BRUN04]. In the SAP system an important part of the complexity resides in the parameters for customization; however, the experts argue that most of the customization complexity is already included in the source code, where the parameters are read and processed. The impact of individual metrics on maintainability is discussed in greater detail in chapter 5.

Size-dependent and Quality-dependent Metrics

Before the individual metrics can be discussed, one important property of a metric, namely size-dependency, should be introduced. Some metrics are highly sensitive to the project size, which means that such metrics show higher values whenever the software grows. Such metrics are size-dependent. Other metrics are quality-dependent and measure quality independently of size, which means that larger software can have smaller values of such metrics. A good example of a size-dependent metric is LOC (Lines Of Code), because it grows continuously with each new statement. The metric Ave-LOC (average number of LOC per module or method) is, on the contrary, independent of the size and conveys an important characteristic of quality, namely the modularity. In order to be able to compare software of very different sizes, the use of quality-dependent metrics is preferable. Many size-dependent metrics convey qualitative attributes of the software as well, but they are too sensitive to the size and need to be converted before usage in order to reinforce the quality constituent of the metric. Moreover, a few size-dependent metrics should be included in the quality model in order to gain some insight into the size of the considered system. For this purpose the metrics Total-LOC (total lines of code) and Total-NOO (number of all objects/modules) are suggested. "Although size and complexity are truly two different aspects of software, traditionally various size metrics have been used to help indicate complexity" [ETZK02, p. 1]. Consequently, a metric that assesses the code complexity of a software component by the use of a size analysis alone will never provide a complete view of complexity.
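A small sketch (not from the thesis) of the normalization idea discussed above: a size-dependent total is turned into a quality-dependent average by dividing it by the number of modules (Total-NOO); the per-module LOC values are assumed to come from a measurement tool.

```java
// Minimal sketch: deriving a quality-dependent average from size-dependent totals.
public final class SizeNormalization {

    /** Total-LOC: size-dependent, grows with every new statement. */
    public static long totalLoc(int[] locPerModule) {
        long total = 0;
        for (int loc : locPerModule) total += loc;
        return total;
    }

    /** Ave-LOC: quality-dependent, characterizes modularity independently of size. */
    public static double aveLoc(int[] locPerModule) {
        int totalNoo = locPerModule.length;   // Total-NOO: number of modules
        return totalNoo == 0 ? 0.0 : (double) totalLoc(locPerModule) / totalNoo;
    }
}
```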

5. Software Quality Metrics Overview

In this chapter all metrics that are supposed to be used in the quality model are discussed in more detail. Many metrics are complex and difficult to measure directly; thus it is usual to build some abstraction of the system, called a model, and measure the attributes of this model. There are five main models that are suitable for software product measurement in the context of this thesis. Since the properties of a metric depend on the model in which the metric is measured, all metrics are grouped into sets according to the model they belong to. In the literature two major classes of software measures can be found. They are based on modules and on entire software systems and are called intra-modular and inter-modular measures, respectively. Metrics based on the lexical model and the flow-graph are intra-modular; metrics based on the inheritance hierarchy model, the structure tree and the structure chart are usually inter-modular.

Model: Lexical Model

This model is intended for intra-modular measures and consists of the plain text in the programming language. It is also possible to partition the text into simple tokens and analyze the frequencies of usage of these tokens.

Metric: LOC – Lines Of Code

The metric LOC counts the number of lines of code, excluding white space and comments which take up a whole line. The total LOC in the system has a quantitative meaning and shows first of all the size of the system. It has no qualitative meaning, because both small and huge systems can be maintainable or not. In a qualitative sense, the metric Ave-LOC (average amount of LOC per module or class) can be used. This metric shows how well the system is split into parts. It is widely accepted that small modules are in general easier to understand, change and test than bigger ones. However, a system with a large number of small modules has a large number of relations between them and is complex as well. See the section "Correlation between Metrics" for more details. To compare code written by different people, one might want to adjust for different styles. One simple way of doing this is to count only open braces and semicolons (or full stops for ABAP); this works fairly well for both ABAP and Java. From this point of view the metric NOS (Number of Statements) is more universal. However, in large systems both metrics are strongly correlated because of the mixture of different programming styles, and they have the same empirical and numerical properties. Noteworthy, Java has a more compact syntax: in [WOLL03, p.5] it is shown that a program written in Java has about 1.4 times more functionality than an ABAP program of equal length. This should be considered when estimating admissible values for LOC. LOC is probably the most important metric, because many other metrics correlate with it. Therefore the approximate values of other, more complicated metrics can easily be estimated from LOC. For example, figure 5.14 depicts the correlation between LOC and WMC (Weighted Methods per Class).
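The counting rule mentioned above (counting only open braces and semicolons rather than raw lines) can be sketched as follows; this is an illustrative approximation of NOS for Java-like code, not the thesis tooling, and a real tool would use a proper parser.

```java
// Minimal sketch: approximating NOS by counting semicolons and open braces,
// ignoring comments. String literals containing "//" would confuse this sketch.
public final class StatementCounter {

    public static int countStatements(String source) {
        // Strip block comments and line comments with simple regular expressions.
        String noComments = source
                .replaceAll("(?s)/\\*.*?\\*/", " ")
                .replaceAll("//.*", " ");
        int count = 0;
        for (char c : noComments.toCharArray()) {
            if (c == ';' || c == '{') count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String snippet = "int a = 1; // init\nif (a > 0) { a++; }";
        System.out.println(countStatements(snippet)); // 3: two ';' and one '{'
    }
}
```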

Metrics: CR – Comments Rate, LC – Lack of Comments

It is obvious that comments in code help to understand it. Hence the metric CR (Comments Rate) is a good candidate for analyzability. CR is the ratio of the sum of all kinds of comments (full-line comments, part-line comments, JavaDoc, etc.) to LOC (Lines Of Code). CR is easy to calculate and interpret. However, many comments are created automatically and do not provide any additional information about the functionality. Such comments help to lay out the source code and make it more readable, but do not help the maintainer in understanding the code. Additionally, a piece of code can be commented out and will then be counted as a comment. Even with modern versioning systems, many developers leave such fragments in the code. In this case CR can reach 70-80%, which is much overstated. A metric which takes such "comments" and automatically generated comments into account is no longer trivial. Therefore CR should be considered critically, and the maintainer should understand that the CR can be overstated. During the experiments it was detected that interfaces and abstract classes have a very high amount of comments and only few LOC; hence many interfaces and abstract classes increase the overall percentage of comments. Noteworthy, CR is the only metric in the quality model whose values become better as they increase; all other metrics should be minimized. Thus one new metric is suggested: LC (Lack of Comments) indicates the deficiency of CR with respect to the optimal value and is calculated as LC = 100 - Median-CR. Since CR is a percentage measure, the arithmetic mean must not be used. The difference should be calculated for the aggregated value for the entire system. This substitution will not worsen the numerical properties of the metric, since CR already has relatively bad numerical properties (see chapter 6). Now all metrics in the quality model should be minimized.
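The aggregation rule for LC described above could look as follows in a sketch (not from the thesis): the median of the per-module CR values is taken, and LC is its deficiency with respect to 100%.

```java
import java.util.Arrays;

// Minimal sketch: LC = 100 - Median-CR, computed over per-module CR values (in percent).
public final class LackOfComments {

    public static double medianCr(double[] crPerModule) {
        double[] sorted = crPerModule.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static double lackOfComments(double[] crPerModule) {
        return 100.0 - medianCr(crPerModule);
    }

    public static void main(String[] args) {
        double[] cr = {10.0, 25.0, 40.0};         // CR per module, in percent
        System.out.println(lackOfComments(cr));   // 75.0
    }
}
```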

Metric: CLON – Clonicity

Code cloning, the act of copying code fragments, is a widespread implementation technique, and its contribution to the acceleration of development should not be underestimated. But cloning is also a well-known problem for maintenance. Clones increase the work and cognitive load for maintainers for many reasons [RIEG05]:

The amount of code that has to be maintained is increased
When maintaining or enhancing a piece of code, duplication multiplies the work to be done
Since usually no record of the duplications exists, one cannot be sure that a defect has been eliminated from the entire system without performing a clone analysis
If large pieces of software are copied, parts of the code may be unnecessary in the new context. Lacking a thorough analysis of the code, they may however not be identified as such. It may also be the case that they are not removable without a major refactoring of the code. This may, firstly, result in dead code, which is never executed, and, secondly, such code increases the cognitive load of future maintainers
Larger sequences repeated multiple times within a single function make the code unreadable, hiding what is actually different in the mass of code. The code is then also likely to be on different levels of detail, slowing down the process of understanding
If all copies are to be enhanced collectively at one point, the necessary enhancements may require varying measures in cases where the copies have evolved differently. As an extreme case, one can imagine that a fix introduced in the original code actually breaks the copy

Exact and parameterized clones are distinguished. Finding exact clones is easier and is language independent. Parameterized clones are more difficult to find, but considerably more helpful, because clones are often changed slightly already when copied. In [RYSS] various techniques for clone finding are classified. These techniques can be roughly divided into three categories:

String-based: the program is divided into a number of strings (typically lines) and these strings are compared against each other to find sequences of duplicated strings
Token-based: a lexer tool divides the program into a stream of tokens and then searches for series of similar tokens
Parse-tree based: after building a complete parse tree, one performs pattern matching on the tree to search for similar sub-trees

The parse-tree based technique was also considered during the ASP project. The choice of technique should be made according to the goal of the measurement. Finding all possible clones for subsequent audits would preferably use the token-based or parse-tree based technique. In the context of this thesis it is more interesting to know only the approximate number of clones, and thus the simple and quick string-based technique can be used. Another important property of the string-based technique is language independence, since both the ABAP and the Java environments are considered. As the most important indicator, the metric CLON (Clonicity) is suggested, which is the ratio of the LOC in all detected clones to the Total-LOC. This metric should give an idea about the usage of copy-paste in the development process and consequently about the redundancy of the final product.
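A sketch (not the thesis tooling) of the string-based technique described above: lines are normalized, runs of at least a minimal number of identical consecutive lines are marked as clones, and CLON is derived as the ratio of cloned LOC to total LOC.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of string-based clone detection: normalized lines are compared
// against each other and runs of at least MIN_LENGTH identical lines are treated
// as clones. CLON is the ratio of cloned LOC to total LOC.
public final class StringBasedCloneDetector {

    private static final int MIN_LENGTH = 3;   // minimal clone length in lines

    public static double clonicity(List<String> rawLines) {
        List<String> lines = new ArrayList<>();
        for (String l : rawLines) lines.add(l.trim());      // normalization: trim whitespace

        boolean[] cloned = new boolean[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            for (int j = i + 1; j < lines.size(); j++) {
                int len = 0;
                while (i + len < j && j + len < lines.size()
                        && !lines.get(i + len).isEmpty()
                        && lines.get(i + len).equals(lines.get(j + len))) {
                    len++;
                }
                if (len >= MIN_LENGTH) {
                    for (int k = 0; k < len; k++) { cloned[i + k] = true; cloned[j + k] = true; }
                }
            }
        }
        int clonedLoc = 0, totalLoc = 0;
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).isEmpty()) continue;            // skip blank lines
            totalLoc++;
            if (cloned[i]) clonedLoc++;
        }
        return totalLoc == 0 ? 0.0 : (double) clonedLoc / totalLoc;
    }
}
```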

Short Introduction into Information Theory and Metric: CDEm – Class Definition Entropy (modified)

Methods for describing complexity

There are many methods which allow describing the complexity of a system. Only a few of them are listed below (partially taken from [CART03]):

Human observation and (subjective) rating. The weakness of such an evaluation is its subjective manner and the required human involvement.

Number of parts or distinct elements. Nevertheless, size and complexity are truly two different aspects of software. Despite this fact, various size metrics have traditionally been used to help indicate complexity. However, many such metrics are size-dependent and do not allow comparing systems of different size. It is also not always clear what should be counted as a distinct part.
Number of parameters controlling the system. Here the same comments as for the number of parts apply.
Minimal description in some model/language. The minimal description presents some kind of abstraction. Obviously, a system which has a smaller minimal description is simpler than a system with a larger minimal description. In this method a model (a description) includes only the relevant information; thus redundant information, which increases size without increasing complexity, is avoided.
Information content (how is information defined/measured?)
Minimal generator/constructor (what machines/methods can be used?)

Minimum energy/time to construct. Several experts argue that a system which needs more time to be designed (implemented) is more complex.

Obviously, the study of complex systems is going to demand that the analyst uses some kind of statistical method. Next, after a short introduction into information theory, entropy-based metrics supporting some of the above-mentioned methods are discussed.

Information

Remark: all of the following statements are considered in terms of probability. Consider the process of reading a random text, where it is assumed that the alphabet is initially known to the reader. The reading of each next symbol can be seen as an event. The probability of this event depends on the symbol and its place within the text. Examine a measure related to how surprising or unexpected an observation or event is, and let's call this measure information. Thus the information which is obtained from each new symbol is, in this context, the amount of new knowledge which the reader gets from this symbol. It is obvious that information is inversely related to the probability of the event: if the probability of the occurrence of "i" is small, the reader would be quite surprised if the outcome actually was "i". Conversely, if the probability of a certain symbol is high (for example the probability of the occurrence of the symbol "i" after "t" in the word "information" tends to 1), the reader will not get much information from this symbol. Let's describe the information measure more formally. For that, Shannon proposed four axioms:

Information is a non-negative quantity: I(p) >= 0
If two independent events occur (whose joint probability is the product of their individual probabilities), then the information the reader gets from observing the events is the sum of the two informations: I(p1*p2) = I(p1) + I(p2)
I(p) is a continuous and monotonic function of the probability (slight changes in probability should result in slight changes in information)
If an event has probability 1, the reader gets no information from the occurrence of the event: I(1) = 0

Deriving from these axioms, one can obtain the definition of information in terms of probability: I(p) = -log2(p). A more detailed description of this derivation can be found in [CART03] or [FELD02]. The index 2 reflects the binary character of the events; in this case the units of information are bits. However, other bases are also possible.

Entropy

Each symbol in the text carries a different amount of information. Of interest is the average amount of information within the text; for this purpose the term entropy is introduced. After a simple transformation the following expression for the entropy of a probability distribution P = (p1, …, pn) can be derived:

H(P) = − Σi pi · log₂(pi)

Note that H(P) is not a function of X itself; it is a function of the probability distribution of the random variable X. Entropy has the following important property:

0 <= H(P) <= log₂(n). H(P) = 0 when exactly one of the probabilities is one and all the rest are zero (only one symbol is possible). H(P) = log₂(n) only when all of the events have the same probability 1/n: a uniform distribution maximizes H, since when everything is equally likely to occur, uncertainty is at its greatest. Since the maximal possible entropy is known, the normalized entropy can be introduced:

Hnorm(P) = H(P) / log₂(n)

This is important, since entropy is project-size dependent. Remarkably, entropy depends logarithmically on size: doubling the size increases the maximal entropy by one point. Next, some possible interpretations of entropy are listed:

The entropy of a probability distribution is just the expected value of the information of the distribution.

Entropy is also related to how difficult it is to guess the value of a random variable X [FELD02, p.p. 5-7]. One can show that H(X) <= (average number of yes/no questions needed to determine X) <= H(X) + 1.

Entropy indicates the best possible compression for the distribution, i.e. the average number of bits needed to store the value of the random variable X. Note that entropy provides only the theoretical bound; a practical algorithm (for example Huffman coding) has to be used for the actual encoding.

Next, some applications of entropy to software measurement are discussed.

Average Information Content Classification

In [ETZK02, p. 295] the work of Harrison is mentioned. Harrison and other researchers proposed to extend Halstead's count of operators and to measure the distribution of the different operators and operands within a program. This should allow assessing the analyzability of a single chunk of code. However, such a method is not very useful, since most of the complexity is contained within the user-defined strings. Remarkably, the syntactical rules of a language decrease entropy: for example, it is not possible to have two operands without an operator in between, and the compiler enforces this. Hence the probabilities of the occurrence of operands in a syntactically correct program text depend on the syntactical rules. Consequently, the entropy of a syntactically correct program will never reach its maximum, and normalizing with respect to the syntactical rules becomes much more difficult.

Metric: CDEm - Class Definition Entropy (modified)

This metric reduces the alphabet of the text to the user-defined strings used in a class, because these contain most of the complexity. Examples of user-defined strings are:

Names of classes, attributes and methods

Package and class names within an import section

Types of public attributes and types of return values for methods

Types of parameters

Method calls, etc

By such a restriction one obtains another level of granularity. Let us illustrate this metric by an example. Consider a maintainer looking through the source code. How surprised would the maintainer be upon seeing a reference to another class? Suppose that:

Maintainers work easily if they are confronted with the same objects again and again

Maintainers work with difficulty if they frequently have to analyze new, unknown objects

Consider the two programs presented in figure 5.1. Assume that the maintainer has to fix two faults in the modules B and C. Both modules use functionality provided by the modules A and E. In the first program the module A plays the role of an interface, and the maintainer can work easily because he has to keep only one collaboration pattern in mind. In the second program the modules B and C have references to different modules; such a model is more multifarious and more difficult to comprehend. Figure 5.2 shows different patterns for the frequency of occurrence of module names in an abstract program. The frequently used modules play the role of an interface for their containing package. The entropy of the frequency distribution is an indicator of how pronounced the interfaces are: P1 has a lower entropy value than P2. Note that other metrics will show that P2 is much easier to comprehend (less coupling between modules means less complexity). Consequently, a high entropy of the distribution of the user-defined strings indicates text that is difficult to comprehend.

Different variants of this metric have been proposed. A very simple and intuitive variant analyzes the import section only and calculates the distribution of occurrences of class names in the import sections. The classes which occur most often in the import sections are also used often from outside the package where they are defined, and thus form the interface of this package. Clear (small) package interfaces are an indicator of good design. This metric is called CDEm – Class Definition Entropy (modified). Readers interested in other implementations of this kind of metric are referred to [ETZK99], [ETZK02] and [Yi04]. Incidentally, some entropy-based metrics also use semantic analysis to improve their expressiveness.

[Figure: two communication patterns between the modules A–F; left: P1 with H(P1) = 2.37, right: P2 with H(P2) = 2.5]

Figure 5.1: The uniform (left) and the multifarious pattern for communication between modules

[Figure: frequency diagrams of user-defined strings (frequency/probability of occurrence per module name A…Z); left: a system with pronounced interfaces, right: a system with ulterior interfaces]

Figure 5.2: The evidence of classes, which play the role of interface for the packages

For the calculation of CDEm two programs were developed. Class Entropy.java prepares a list of all classes in the project and then scans the source code in order to find references to classes from this list; the list is then filled with frequency data and the entropy is calculated. Class EntropyImport.java does not have a predefined list of classes to search for; this tool scans the source files and calculates the entropy of the import clauses, so the list of user-defined strings (import clauses) is built dynamically. It is argued that both tools measure the same aspect, since their results are correlated. Since the entropy based on the analysis of the import section is easier to compute, it is used for the further research.

As a very first indicator of entropy the compression ratio (for example of a ZIP archive) can be taken. The author believes that ZIP practically implements an algorithm whose compression ratio (ZCC, ZIP coefficient of compression) tends towards the best possible compression defined by means of entropy. However, ZIP works with symbols, while CDEm works with tokens. ZCC = size of the ZIP archive / size of the project before compression. Thus a high ZCC indicates a high complexity of the project, while a low ZCC indicates a simple project with high redundancy. A high CDEm indicates a complex design, a low CDEm indicates a simple design. The next simple experiment tries to find out whether these two metrics are correlated, i.e. whether there is a correlation between CDEm and the ZIP coefficient. Figure 5.3 shows the dependency between the ZIP compression coefficient and the import-based CDEm. The input for this experiment were the code examples described in chapter 8.
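The original tool EntropyImport.java is not reproduced in this thesis; the following minimal sketch only illustrates the idea of an import-based entropy. It assumes that plain (non-wildcard) import statements are counted per source file and that CDEm (%) corresponds to the entropy normalized by its maximum:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;
import java.util.stream.Stream;

/** Rough sketch of an import-based entropy in the spirit of CDEm:
 *  count how often each imported class occurs across the source files,
 *  then compute the (normalized) entropy of that distribution. */
public class ImportEntropySketch {
    private static final Pattern IMPORT = Pattern.compile("^\\s*import\\s+([\\w.]+)\\s*;");

    public static void main(String[] args) throws IOException {
        Map<String, Integer> freq = new HashMap<>();
        try (Stream<Path> files = Files.walk(Paths.get(args[0]))) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    for (String line : Files.readAllLines(p)) {
                        Matcher m = IMPORT.matcher(line);
                        if (m.find()) freq.merge(m.group(1), 1, Integer::sum);
                    }
                } catch (IOException ignored) { }
            });
        }
        double total = freq.values().stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int c : freq.values()) {                       // H = -sum p * log2(p)
            double p = c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        double hMax = Math.log(freq.size()) / Math.log(2);  // maximal entropy for this alphabet
        System.out.printf("entropy = %.2f bits, normalized = %.1f%%%n", h, 100 * h / hMax);
    }
}

Wildcard imports ("*") are simply skipped by this sketch, which mirrors the imprecision discussed below.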

[Figure: scatter plot "Correlation between ZCC and CDEm"; horizontal axis: ZCC (0.17–0.32), vertical axis: CDEm (%) (82.0–89.0); the measured value pairs are listed in table 5.1]
Figure 5.3: ZCC and CDEm do not have evident dependence

During this experiment 4 pairs of projects were analyzed, where each pair consists of two versions of the same project, an old and a new one. Each newer version is expected to have better values than the older one. In figure 5.3, arrows connecting two measurement points indicate the trend of the values within one project. Since the directions of the arrows differ considerably, one can speak of an absence of any connection between these metrics. An overview of all considered projects is also given in table 5.1; the trend of each metric (improvement or degradation) can be read off by comparing the old and new versions of each project. According to the experts' opinion all newer versions should show an improvement, yet the metrics often show the opposite. Note, however, that ZCC measures not only the pure entropy of the code but also the entropy of comments, most of which are generated or consist of the same predicates. Consequently, ZCC shows lower values than the entropy actually is; a more accurate experiment should exclude comments before compression. Besides, a high CLON (cloning) value can cause low ZIP-coefficient values as well.

Table 5.1: Dependence between ZCC and CDEm

Metric              | Mobile Client old | Mobile Client new | ObjMgr old | ObjMgr new | SLDClient old | SLDClient new | JLin 630 (7.0) | JLin dev (7.1)
Size on disk        | 788202            | 833213            | 1916940    | 1454570    | 1896690       | 1822725       | 722159         | 1725178
Size of ZIP archive | 144018            | 209549            | 409760     | 300455     | 522499        | 590507        | 216222         | 503508
ZCC                 | 0.18              | 0.25              | 0.21       | 0.21       | 0.28          | 0.32          | 0.30           | 0.29
CDEm (%)            | 82.4              | 83.3              | 82.5       | 83.0       | 85.9          | 84.2          | 87.1           | 88.6

The analysis of the examples also shows that many developers use "*" in import sections. Such imprecise declarations lead to an inexact CDEm calculation. Hence the contradiction between ZCC and CDEm is most probably caused by improper computation of the metrics. The author argues that more accurate experiments should be made in order to ascertain the ability of these metrics to predict maintainability. Note also that some peculiar properties of the software design can influence CDEm: for example, the project Ant (www.apache.org) has a very low value for CDEm, because almost every class uses the classes BuildException, Project, Task and some others. Such a distribution of the user-defined strings leads to an underestimation of the entropy.

Complexity of the development process

An interesting approach is suggested by Hassan in [HASS03]. He argues that if developers have to modify many files intensively at the same time, cognitive load increases and problems for managers can arise. Such a strategy can also lead to bad product quality. As a measurement for the chaos of software development an entropy-based process metric was suggested. The input data for the measurement is the history of code development: time is divided into periods, and for each period the frequency of changes of each source file is calculated (see the illustration in figure 5.4). By the main property of entropy, the value is maximal for a uniform distribution. Thus a high entropy of the distribution of source code changes indicates a situation in which many files are changed very actively; a low entropy indicates a normal development process, in which only a few files are edited actively and the rest are left untouched or changed only insignificantly. The evolution of entropy during the development process is illustrated in figure 5.5. Hence a high entropy can warn the project manager about an insufficiently organized development process. The next entropy-based metric is discussed in the sub-chapter "Metric: m – Structure Entropy" after the introduction of an appropriate model. As a short conclusion on the usage of information theory in software measurement one can say: it is a powerful non-counting method for describing semantic properties of the software, but before it can be used, more experiments with exact and perceptive tools should be made.
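A minimal sketch of Hassan's idea (an illustration only, not his tool; it assumes the change history has already been aggregated into per-file change counts for one period):

import java.util.Map;

/** The "chaos" of one development period is the entropy of the distribution
 *  of changes over the files touched in that period. */
public class ChangeEntropySketch {

    /** changeCounts: number of modifications per file within one period. */
    static double periodEntropy(Map<String, Integer> changeCounts) {
        double total = changeCounts.values().stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int c : changeCounts.values()) {
            if (c == 0) continue;
            double p = c / total;
            h -= p * Math.log(p) / Math.log(2);      // H = -sum p * log2(p)
        }
        return h;
    }

    public static void main(String[] args) {
        // Few hot files, rest untouched: low entropy (calm period)
        System.out.println(periodEntropy(Map.of("A.java", 9, "B.java", 1)));
        // Many files changed equally often: high entropy (chaotic period)
        System.out.println(periodEntropy(Map.of("A.java", 2, "B.java", 2, "C.java", 2, "D.java", 2, "E.java", 2)));
    }
}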

Figure 5.4: The Entropy of a Period of Development [HASS03, p. 3]

Figure 5.5: The Evolution of the Entropy of Development [HASS03, p.4]

Model: Flow-graph

The flow-graph model represents intra-modular control complexity in form of a graph.

The flow-graph consists of edges and nodes, whereas nodes represent operators and

edges represent possible control steps between the operators. A flow-chart is a kind of

flow-graph, where decision nodes are marked out with a different symbol. Figure 5.6

provides an example of both notations. A region is an area within a graph that is

completely bounded by nodes and edges.

[Figure: a flow chart with the nodes 1–11 (left) and the corresponding flow graph with its nodes, edges and regions R1–R4 (right)]

Figure 5.6: Example of Flow-graph and corresponding flow-chart [ALTU06, p. 15]

Metric: MCC – McCabe Cyclomatic Complexity

The cyclomatic number from graph theory gives the number of regions in the graph, or the number of linearly independent paths through the graph. Initially McCabe suggested using the cyclomatic number to assess the number of test cases needed for sufficient testing of a module and called this metric MCC (McCabe Cyclomatic Complexity). Since all independent paths through the module should be tested independently, it is advisable to have at least one test case for each path within the module. Thus MCC represents the minimal number of test cases for sufficient test coverage.

However, this metric was later suggested for the assessment of comprehension complexity, and nowadays MCC is also used as a recommendation for modularity in the development process. Empirical research has shown that the probability of a fault increases in modules with MCC > 10; thus it is advisable to split modules with MCC > 10 into several modules. Many experts argue that a large number of decision operators (IF, FOR, CASE, etc.) increases the algorithmic complexity of a module: obviously, a program in which all operations are executed sequentially is easy to understand, independent of its size. Consequently, MCC is included in the quality model in the areas Analyzability and Testability.

One possible way to calculate MCC is: MCC = E − N + 2, where E is the number of edges and N the number of nodes. It has also been shown that for a program with binary decisions only (all nodes have out-degree <= 2), MCC = P + 1, where P is the number of predicate (decision) nodes (operators: if, case, while, for, do, etc.).
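As a crude illustration of the second formula (this is only a sketch; a real measurement tool works on the syntax tree rather than on raw text), MCC of a single method body can be approximated by counting decision keywords:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Approximates MCC = P + 1 by counting decision keywords in a method body. */
public class CyclomaticSketch {
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    static int approximateMcc(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int predicates = 0;
        while (m.find()) predicates++;
        return predicates + 1;                      // MCC = P + 1
    }

    public static void main(String[] args) {
        String body = "if (a > 0 && b > 0) { for (int i = 0; i < a; i++) sum += i; } else { sum = 0; }";
        System.out.println(approximateMcc(body));   // 1 if + 1 && + 1 for -> MCC = 4
    }
}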

Usage of MCC in the object-oriented environment

This intra-modular metric can be used both in procedural and in object-oriented

context. However, the usage of this popular metric in the object-oriented context has

some peculiarities. Typical object-oriented programs show understated values of MCC, because up to 90% of the methods may have MCC = 1. In [SER05] it is hypothesized that part of the complexity is hidden behind object-oriented mechanisms such as inheritance, polymorphism or overloading; these mechanisms are in fact hidden decision nodes. A good illustration of this phenomenon, applied to overloading, is the following example:

Listing 5.1: Illustration of the hidden decision node in case of the overloading

class A {
    void method1(int arg) { }
    void method1(String arg) { }
}
// ...
public A a;
a.method1(b);   // which overload is invoked depends on the type of b

The decision node hidden in the last statement could be represented in a procedural way by checking the type of the argument and calling the corresponding method; additional decision nodes for polymorphism and inheritance could be represented similarly. The hypothesis is: the fewer OO mechanisms are used, the more complex the methods should be.
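The procedural representation mentioned above might look roughly as follows (a self-contained sketch for illustration; the overloads mirror listing 5.1):

public class OverloadDispatchSketch {
    // The two overloads from listing 5.1, with trivial bodies
    static void method1(int arg)    { System.out.println("int version"); }
    static void method1(String arg) { System.out.println("String version"); }

    // Procedural representation of the hidden decision node: the dispatch on the
    // argument type becomes an explicit IF, which MCC would count.
    static void dispatchMethod1(Object b) {
        if (b instanceof Integer) {
            method1((int) (Integer) b);
        } else if (b instanceof String) {
            method1((String) b);
        }
    }

    public static void main(String[] args) {
        dispatchMethod1(42);        // prints "int version"
        dispatchMethod1("text");    // prints "String version"
    }
}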

The experiment described in [SER05] tried to find an inverse correlation between an inheritance factor (Depth of Inheritance Tree) and MCC, but did not show significant results. Nevertheless, polymorphism or overloading could be a better factor to correlate with; additional experiments are needed.

Since MCC can be calculated only for the single chunk of code, in OO-environment one

further metric is introduced in order to aggregate values and present metric for entire

class, see metric WMC (Weighted Methods per Class) for more details.

Model: Inheritance Hierarchy

For the next group of metrics different models of an inheritance hierarchy can be used.

Empirical objects for all these models are classes and interfaces, which are connected

into hierarchies using “extends” or “implements” relation:

In the Simple Inheritance Hierarchy nodes represent classes and edges represent inheritance connections between classes, and only single inheritance is possible

The Extended Inheritance Hierarchy can have more than one root and allows multiple inheritance. Additional edges represent "implements" connections between an interface and the class which implements this interface. The Extended Inheritance Hierarchy is a directed acyclic graph

The Advanced Inheritance Hierarchy supplements the Extended Inheritance Hierarchy by adding the attributes and methods of each class

Because interfaces are widely used in ABAP and Java and have a great impact on analyzability and changeability, the Simple Inheritance Hierarchy was rejected. On the other hand, the very detailed level of granularity provided by the Advanced Inheritance Hierarchy is not very useful in the context of maintainability. Consequently, the Extended Inheritance Hierarchy is chosen as the most appropriate basis for the maintainability metrics. For this model the following metrics were proposed:

Chidamber and Kemerer proposed the Depth of Inheritance Tree (DIT) metric, which is

the length of the longest path from a class to the root in the inheritance hierarchy

[CHID93 p.p. 14-18] and the Number of Children (NOC) metric, which is the number of

classes, that directly inherit from a given class [CHID93 p.p. 18-20].

Later, Li suggested two substitution metrics: the Number of Ancestor Classes (NAC)

metric to measure how many classes may potentially affect the design of the class

because of inheritance and Number of Descendent Classes (NDC) metric to measure

how many descendent classes the class may affect because of inheritance. These two

metrics are good candidates for the quality model in the areas Analyzability and

Changeability respectively and will be discussed in more details.

Metric: NAC – Number of Ancestor Classes

This metric indicates the analyzability from the viewpoint of inheritance. In general the following holds: the deeper a class is placed in the hierarchy, the more ancestors the class has and the more additional classes have to be analyzed and understood by the developer in order to understand the given class. It can also be shown that a class with a high NAC implements more complicated behavior. Several guidelines recommend avoiding classes with a DIT of more than 3 or a NAC of more than 6.

Metric: NDC - Number of Descendent Classes

This metric indicates the changeability of the class: it shows how many classes could be affected when the developer changes the given class.

It is noteworthy that experiments with the Chidamber and Kemerer metrics suite [BASI95] showed that larger NOC values correlated with a smaller defect probability. This can be explained by the fact that classes with many subclasses are the subject of much testing, and most of the errors are found during the implementation of the subclasses.

"Inheritance introduces significant tight coupling between super classes and their subclasses" [ROSE, p.5]. Thus the importance of NAC and NDC is high.
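Both metrics can be computed by a simple traversal of the extended inheritance hierarchy. The following sketch (an illustration only; the input representation "class -> direct parents" is assumed, not prescribed by the thesis) counts all direct and indirect ancestors for NAC and all descendants for NDC:

import java.util.*;

/** NAC = number of all ancestors, NDC = number of all descendants,
 *  computed on an extended inheritance hierarchy (extends/implements edges). */
public class InheritanceMetricsSketch {
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final Map<String, Set<String>> children = new HashMap<>();

    void addEdge(String subtype, String supertype) {
        parents.computeIfAbsent(subtype, k -> new HashSet<>()).add(supertype);
        children.computeIfAbsent(supertype, k -> new HashSet<>()).add(subtype);
    }

    int nac(String cls) { return reachable(cls, parents).size(); }   // ancestors
    int ndc(String cls) { return reachable(cls, children).size(); }  // descendants

    private Set<String> reachable(String start, Map<String, Set<String>> edges) {
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(edges.getOrDefault(start, Set.of()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (visited.add(next)) queue.addAll(edges.getOrDefault(next, Set.of()));
        }
        return visited;
    }

    public static void main(String[] args) {
        InheritanceMetricsSketch h = new InheritanceMetricsSketch();
        h.addEdge("ArrayList", "AbstractList");
        h.addEdge("ArrayList", "List");          // "implements" edge of the extended hierarchy
        h.addEdge("AbstractList", "List");
        System.out.println(h.nac("ArrayList"));  // 2: AbstractList and List
        System.out.println(h.ndc("List"));       // 2: AbstractList and ArrayList
    }
}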

Geometry of Inheritance Hierarchy

Above the metrics NAC and NDC were introduced and it was shown that they are

good descriptors of a single class. In this subsection author tries to use these metrics to

describe the entire inheritance hierarchy.

Let’s try to classify inheritance hierarchies into subtypes based on geometrical

properties. The most important geometric characteristics are the width and weight

distribution. The width is a ratio of super-classes to total number of classes. An indicator

of width is U metric (Reuse factor), where U = super-classes / classes = (CLS - LEAFS) /

CLS. A super-class is a class that is not a leaf class. U measures reuse via inheritance.

The high U indicates a deep class hierarchy with high reuse. The reuse ratio varies in

the range 0 <= U < 1.

The weight distribution describes where in the hierarchy the main functionality tends to be implemented. However, there is no appropriate single metric for the weight distribution; the best way to indicate it is a histogram in which the vertical axis represents the DIT level and the horizontal axis the number of classes, the number of methods or the sum of WMC. Figure 5.8 depicts an example of a top-heavy hierarchy; WMC is selected as the functionality indicator.

Different designs of inheritance hierarchy are presented in figure 5.7. The next

experiment tries to estimate best geometry for inheritance hierarchy from viewpoint of

the maintainability using metrics NAC and NDC.


Figure 5.7: Types of Inheritance Hierarchies.

First of all, values of metrics are calculated for each class and then aggregated using

arithmetic mean. Let’s try to estimate the analyzability and changeability for each type

of hierarchy based on the average values. The comments can be also seen in the figure

5.7.

[Figure: histogram "Distribution of Average WMC in Levels of Inheritance Tree"; vertical axis: DIT levels 1–6, horizontal axis: Ave-WMC (0–70), values between 15.12 and 75.43]

Figure 5.8: Weight distribution

Top-heavy hierarchies may not take advantage of the reuse potential; ultimately, the design is discussed here from the viewpoint of maintainability. Top-heavy means that the classes with the main functionality are placed near the root; hence such a hierarchy should be easy to understand, because the classes have a small number of ancestors. However, if classes have a large number of descendants, they are difficult to change.

A bottom-heavy hierarchy is easy to change, because many classes have no children. Narrow bottom-heavy designs are difficult to understand because of their many unnecessary levels of abstraction.

Nevertheless, this consideration has several problems:

Though the metrics NAC and NDC seem to be comprehensive, the mean values of these metrics are interchangeable and yield the same numerical values. Ave-NAC can be calculated as the number of descendant-ancestor relations divided by the number of classes; Ave-NDC can be calculated as the number of ancestor-descendant relations divided by the number of classes. Because each descendant-ancestor relation is the reversed ancestor-descendant relation, Ave-NAC = Ave-NDC. The numbers in figure 5.7 confirm this. Note that the metrics DIT and NOC have the same property when applied to a simple inheritance hierarchy: Ave-DIT = Ave-NOC. Therefore the aggregated values of these metrics are redundant

In some cases the metrics NAC and NDC cannot distinguish between different types of hierarchies; in the given example a top-heavy narrow hierarchy has approximately the same values as a bottom-heavy wide hierarchy with the same number of classes. To distinguish different designs an additional metric is needed

In general it is not possible to assess the maintainability based on the geometrical properties of the hierarchy, because it is more important to know how the inheritance is actually used

Some experts suggest that the inheritance hierarchy should be optimized, for example by using balanced trees. However, the theory of balanced trees is intended for search and update operations; such a tree will always be wide and bottom-heavy, because about 50% of the nodes are leaves. Thus such an optimization is misleading for the goals of maintainability.

Many experts agree that inheritance is a very important attribute of the software, which also has an impact on maintainability. However, there are different points of view: some experts recommend using a deep hierarchy, others prefer a wide one. Nevertheless, the author does not see any possibility of assessing the entire inheritance hierarchy from the maintainability point of view using these metrics.

Consequently, the suggestion is to use the metrics NAC and NDC for finding classes which could be difficult to maintain because of an erroneous usage of inheritance. A simple example of such an audit is a report which includes all classes with more than 3 super-classes or more than 10 sub-classes.

Metric: IF – Inheritance Factor

The metric IF (Inheritance Factor) shows the percentage of classes that belong to any inheritance hierarchy. A stand-alone class does not belong to any inheritance hierarchy and thus has no ancestor or descendant classes. Localizing faults in large stand-alone classes is difficult, because such a class implements complete functionality and is large. Classes which belong to an inheritance hierarchy provide only fractional functionality, and thus it is relatively easy to find which part has caused a fault, irrespective of the size. Additionally, classes within an inheritance tree can be maintained using the inheritance concept, so new functionality can be added while preserving the old functionality.

Model: Structure Tree

This model is presented by a directed graph composed of two different types of nodes:

leaf nodes and interior nodes; and two different types of edges: structural edges and

dependency edges.

A leaf node corresponds to a function module, global variables, program or form for

ABAP; method or public attributes for Java.

An interior node corresponds to either:

an aggregate of the leaf nodes (function pool, program or class)

an aggregate of other interior nodes (directory or package)

Structural edges, attached to interior and leaf nodes, create a tree that corresponds to

the package and file structure of the source. Note that a structural edge may connect

two interior nodes, or an interior node with a leaf node, but may never connect two leaf

nodes. Figure 5.9 (see p. 44) shows an example of the structure tree. In this example the

system has a package A, which has two classes (B and C). Points marked with small

letters (leaves) are methods or attributes. Dotted edges between leaves are dependency edges and represent calls.

Metric: CBO - Coupling Between Objects

Coupling is a quality which characterizes the number and strength of connections between modules. In the scope of maintainability, the software elements A and B are interdependent if:

some change to A requires a change to B to maintain correctness, or

some change elsewhere requires both A and B to be changed

Obviously, the first case is much easier to find.

In general, objects can be coupled in many different ways. The following list presents several important types of coupling, resulting from theoretical considerations of the author and partially taken from [ZUSE98]:

By content coupling one module directly references the code of another module. This type of coupling is very strong, because almost any change in the referenced module will affect the referring module. In Java this type of coupling is impossible; in ABAP it is implemented through the INCLUDE directive

By common coupling two modules share a global data structure. In ABAP this is most commonly realized in the DATA DICTIONARY. Such coupling is not very dangerous, because data structures are changed very seldom

By external coupling two modules share a global variable. This coupling deserves attention, because excessive usage of global variables can lead to maintenance problems. To handle external coupling the metric GVAR (number of global variables) is suggested. However, this metric is subsumed by the metrics FAN-IN and FAN-OUT and thus rejected from further investigation (see p. 54)

Data coupling is the most common and unavoidable type. In his work Yourdon stated that any program can be written using only data coupling [ZUSE98, p. 524]. Two modules are data coupled if one calls the other. In an object-oriented environment there are even more possibilities for data coupling:

o Class A has a method with a local variable of type B

o Class A has a method with return type B

o Class A has a method with an argument of type B

o Class A has an attribute of type B

There are several metrics for data coupling: FAN-IN and FAN-OUT for the procedural environment, and RFC (Response For a Class) and CBO for the object-oriented environment. For the metrics FAN-IN and FAN-OUT see the section "Structure Chart" (p. 54)

Inheritance coupling appears in an inheritance hierarchy between classes or interfaces. The metrics for this type of coupling have been discussed in the previous section

Structural coupling appears between all units which are combined together in a container. For example, all methods within a class are structurally coupled into the class; all classes within a package are coupled into the package. In order to quantify such coupling the term cohesion is introduced in one of the next sections (p. 44)

Logical coupling is an unusual kind of coupling, because the modules are not coupled physically, yet changing one will cause a change in the other. Since there is no representation of such coupling in the source code, logical coupling is very difficult to detect. Readers interested in this type of coupling can find a reference to research on logical coupling at the end of chapter 10 (p. 91)

Indirect coupling. If class A has direct references to A1, A2, …, An, then class A has indirect references to those classes directly and indirectly referenced by A1, A2, …, An. In this thesis (except for inheritance) only direct coupling is considered

Content, common, logical and indirect coupling are not considered in this thesis; structural coupling in the form of cohesion of methods is discussed in one of the next sections. The metrics for inheritance coupling have been discussed in the previous section.

Coupling Between Objects (CBO) is the most important metric for data coupling in the object-oriented paradigm. "CBO for a class is a count of the number of other classes to which it is coupled" [CHID93, p. 20]. However, it would be more precise to call this metric Coupling Between Classes, because at the time of the measurement no objects have been created yet.

"In order to improve modularity and promote encapsulation, inter-object class couples should be kept to a minimum. The larger the number of couples, the higher the sensitivity to changes in other parts of the design, and therefore maintenance is more difficult" [CHID93, p. 20]. A class with many relations to other classes is also difficult to test. Hence CBO impacts the Changeability and the Testability.

Nevertheless, CBO can indicate the Analyzability as well, but the RFC metric indicates it more precisely.

Metric: RFC - Response For a Class

The response of a class is the number of methods that can potentially be executed in response to a message received by an object of that class. It can be expressed as the number of public methods of the class plus the number of methods called by the methods of the given class. An example of the calculation of RFC is shown in listing 5.3; for more details see [CHID93, p.p. 22-24]. Note that some implementations of RFC (for example Borland Together) count private methods as well.

A class can implicitly call the methods of its ancestors, for example in the constructor, but in this case the constructor will not be called from outside; thus only explicit calls of foreign methods are counted in this metric.

This metric indicates the analyzability of the class:

A class with more methods is more difficult to understand than a class with fewer methods

A method which calls many other methods is more difficult to understand than a method calling fewer foreign methods

In Java it is possible to use nested method calls; an example is shown in the following listing:

Listing 5.2: Example of nested method calls

a = b.getFactory().getBrige().getWidth(c.getXYZ, 15);

Such calls repeatedly hamper the understanding of the program and should be counted as separate method calls. Note that in ABAP this is not possible.

Listing 5.3: Example of calculation of the metric RFC

public class RFCexample {

    public ClassC c = new ClassC();      // constructor is not counted: RFC = 0

    public int meth1() {                 // RFC = 1
        int temp = 0;
        temp += c.D();                   // RFC = 2
        temp += c.D();                   // duplicate call: RFC = 2
        return c.getClassB()             // RFC = 3
                .D()                     // RFC = 4
               + meth2();                // RFC = 5
    }

    private int meth2() {                // private methods are counted: RFC = 6
        return c.D();                    // duplicate calls, which appear
                                         // in different methods, are counted: RFC = 7
    }
}

“If a large number of methods can be invoked in response to a message, the testing and

debugging of the class becomes more complicated since it requires a greater level of

understanding required on the part of the tester” [CHID93 p.22].

From the definition of RFC it is clear that it consists of two parts: the number of methods within a class and the number of calls to other methods. Hence RFC correlates with NOM and FAN-OUT, as has been shown in [BRUN04, p.9].

RFC is an OO metric and corresponds to FAN-OUT in the procedural context.
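The counting rules illustrated in listing 5.3 can be summarized in a few lines. The following sketch (an illustration only; the simplified input model "method -> list of its calls" is assumed) counts each call once per method, but counts the same call again if it appears in another method:

import java.util.*;

/** RFC for a simplified class model, following the counting rules of listing 5.3. */
public class RfcSketch {

    /** methodCalls: for each method of the class, the list of foreign calls it makes. */
    static int rfc(Map<String, List<String>> methodCalls) {
        int rfc = methodCalls.size();                       // each own method counts once
        for (List<String> calls : methodCalls.values()) {
            rfc += new HashSet<>(calls).size();             // distinct calls per method
        }
        return rfc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> example = new LinkedHashMap<>();
        example.put("meth1", List.of("c.D", "c.D", "c.getClassB", "ClassB.D", "meth2"));
        example.put("meth2", List.of("c.D"));
        System.out.println(rfc(example));                   // 2 methods + 4 + 1 calls = 7
    }
}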

Metric: m – Structure Entropy

An interesting metric was proposed by Hewlett-Packard Laboratories in [SNID01]. A

simplified version follows. The main question discussed in here is “how can you

measure the degree of conformance of a large software system to the principles of

maximal cohesion and minimal coupling?”

The input to the model is the source code of the system to be measured. The output is

the numeric measure of the degree of conformance

Before the model is created the following assumptions are supposed:

Since engineers work with source code when modifying a system, it is interesting

to analyze the structure of the application at the lexical level

It is more interesting to analyze the global relationships than local ones

The more dependencies a module has to other parts of the system, the harder it is

to modify

“Remote” dependencies are more expensive (in terms of comprehensibility) than

“local” dependencies (restatement of cohesion and coupling principle)

An example of the model used is depicted in figure 5.9. Some calls are short (within one class), others are medium (between classes) or long (between packages). In agreement with the assumptions, a system with many short calls and only a few long ones has a good design.

Let us find an optimal method for describing the character of the calls. Initially, each dotted edge can be described by a pair of numbers: start leaf and end leaf. Each leaf needs log₂(F) bits, where F is the number of leaves, so the description of each call needs 2·log₂(F) bits. However, it is possible to reduce the number of bits by indicating a relative path for the end leaf: short calls then need a shorter description and long calls a longer one. If one describes all calls of the system in such a way and calculates the average number of bits needed per call, one can draw conclusions about the design of the system: a higher number of bits needed to describe an average relation indicates a poorer design.

[Figure: System I and System II, each consisting of a package A with the classes B and C, the methods and attributes a–e, and dependency edges between the leaves]

Figure 5.9: Example of structure tree

Nevertheless, the analyst does not need to actually encode all these calls. Information theory says that one can easily estimate the average number of needed bits based on the entropy; as the probability basis for the entropy one can use the frequencies of the call lengths. In addition, long calls can be penalized by extra coefficients. For the entropy background see the section "Introduction into Information Theory"; a more detailed description of this metric can be found in [SNID01, p.p. 7-9]. Here just one simple example is given in order to illustrate the ability of this metric.

Consider the two small systems depicted in figure 5.9. Both systems have an equal number of classes, methods (F = 5) and calls (E = 4). However, most of the calls in the first system are long; this disadvantage was fixed in the second system by better encapsulation: method c provides an interface for the attributes d and e in its class. Thus it is supposed that the second system is more maintainable because of its simpler and better structured design. According to the formulas given in [SNID01, p.p. 7-9], the structure entropies of the given systems are:

m(I) = −(3/5·log₂(3/5) + 1/5·log₂(1/5) + 0 + 1/5·log₂(1/5)) + 4/5·(1/4·log₂(5·8/20) + 3/4·log₂(5·12/20)) ≈ 2.52

m(II) = −(2/5·log₂(2/5) + 2/5·log₂(2/5) + 1/5·log₂(1/5)) + 4/5·(3/4·log₂(5·8/20) + 1/4·log₂(5·12/20)) ≈ 2.44

Hence, the second system needs fewer bits for its description and has fewer long calls.

Consequently metric m (Structural Entropy) can indicate the tendency of the system to

have short or long calls.

Metric: LCOM - Lack of Cohesion Of Methods

Cohesion is one of the structural properties of a class: it is the degree to which the elements in a class are logically related. Most often it is estimated by the degree of similarity of the functions provided by the methods. With respect to object-oriented design, a class should consist only of methods and attributes which have common functionality. If the class can be split into parts without breaking any intra-modular calls, as shown in figure 5.10, the class is considered not cohesive.


Figure 5.10: A non-cohesive class can be divided into parts

However, "coupling and cohesion are also interesting because they have been applied to procedural programming languages as well as OO languages" [DARC05, p.28]. In the procedural paradigm, the procedures and functions of a module should implement a single logical function. Consider the example of a function pool in ABAP: it is an analogue of a class, having internal global data (attributes) and functions (methods). Note that with a call of one function from the pool the entire function group is loaded into memory. Consequently, if you create new function modules, you should consider how they will be organized into function groups; in one function group you should combine only function modules which use common components of this function group, so that the loading into memory is not wasted (translation from [KELL01, p. 256]). Thus low cohesion can also indicate potential performance problems. For maintenance, low cohesion means that the maintainer has to understand additional code that is not related to the main part and may be badly structured; this has an impact on the analyzability. Additionally, a low-cohesive component which implements several different functionalities will be affected more by maintenance, because changing one logical part of the component can destroy other parts. "Components with low cohesion are modified more

often since they implement multiple functions. Such components are also more difficult

to modify, because a modification of one functionality may affect other functionalities.

Thus, low cohesion implies lower maintainability. In contrast, components with high

cohesion are modified less often and are also easier to modify. Thus, high cohesion

implies higher maintainability” [NAND99]. This fact has impact on the changeability.

High cohesion indicates good class subdivision. The cohesion degree of a component is

high, if it implements a single logical function. Objects with high cohesiveness cannot

be split apart.

Lack of cohesion or low cohesion increases complexity, thereby increasing effort to

comprehend unnecessary parts of component. Classes with low cohesion could

probably be subdivided into two or more subclasses with increased cohesion. It is

widely recognized that highly cohesive components tend to have high maintainability

and reusability. The cohesion of a component allows the measurement of its structure

quality.

“There are at least two different ways of measuring cohesion:

1. Calculate for each attribute in a class what percentage of the methods use that

attribute. Average the percentages then subtract from 100%. Lower percentages mean

greater cohesion of data and methods in the class.

2. Methods are more similar if they operate on the same attributes. Count the number of

disjoint sets produced from the intersection of the sets of attributes used by the

methods” [ROSE, p.4].

In [BADR03] most used metrics for cohesion are shortly described, see brief definitions

in table 5.2.

Metrics for cohesion are not applicable for classes and interfaces with:

no attributes

one or no methods

only attributes with get and set methods for these (data-container classes)

abstract classes

numerous attributes for describing internal states, together with an equally large

number of methods for individually manipulating these attributes

multiple methods that share no variables but perform related functionality. Such

situation can appear because of usage of several patterns

Classes, where calculation of cohesion is not possible, are accepted as cohesive.

To overcome these limitations the following various implementations of the LCOM

metric are possible:

Regarding inherited attributes and/or methods in the calculation or not

Regarding constructor in the calculation or not

Regarding only public method or all methods in the calculation

Regarding get and set methods or not

These implementations are independent of which definition is used. According to the

recommendation from [ETZK97], [LAKS99] and [KABA] and theoretical speculation the

following options were selected:

Inherited attributes and methods are excluded from calculation

Constructors are excluded from calculation

Get and set methods are excluded from calculation

Methods with all types of visibility are included into calculation

It is also possible to find and remove all data-container classes from research of

cohesion. It can be easily made by an additional metric NOM (Number Of Methods). In

case of the data-container class NOM=WMC.

Table 5.2: The major existing cohesion metrics [BADR, p. 2]

LCOM1: The number of pairs of methods in a class using no attribute in common.

LCOM2: Let P be the pairs of methods without shared instance variables, and Q be the pairs of methods with shared instance variables. Then LCOM2 = |P| − |Q|, if |P| > |Q|. If this difference is negative, LCOM2 is set to zero.

LCOM3: The Li and Henry definition of LCOM. Consider an undirected graph G, where the vertices are the methods of a class, and there is an edge between two vertices if the corresponding methods share at least one instance variable. Then LCOM3 = |connected components of G|.

LCOM4: Like LCOM3, where graph G additionally has an edge between vertices representing methods Mi and Mj, if Mi invokes Mj or vice versa.

Co: Connectivity. Let V be the vertices of graph G from LCOM4, and E its edges; Co expresses, as a ratio derived from |V| and |E|, how densely G is connected.

LCOM5: Consider a set of methods {Mi} (i = 1, …, m) accessing a set of instance variables {Aj} (j = 1, …, a). Let µ(Aj) be the number of methods that reference Aj. Then LCOM5 = ((1/a) Σ µ(Aj) − m) / (1 − m).

Coh: Cohesiveness is a variation on LCOM5: Coh = Σ µ(Aj) / (m·a).

TCC: Tight Class Cohesion. Consider a class with N public methods. Let NP be the maximum number of public method pairs: NP = [N·(N − 1)] / 2. Let NDC be the number of direct connections between public methods. Then TCC is defined as the relative number of directly connected public methods: TCC = NDC / NP.

LCC: Loose Class Cohesion. Let NIC be the number of direct or indirect connections between public methods. Then LCC is defined as the relative number of directly or indirectly connected public methods: LCC = NIC / NP.

DCD: Degree of Cohesion (direct) is like TCC, but taking the Methods Invocation Criterion into account as well. DCD gives the percentage of method pairs which are directly related.

DCI: Degree of Cohesion (indirect) is like LCC, but taking the Methods Invocation Criterion into account as well.

In [LAKS99] and [ETZK97] various implementations of the cohesion metrics (LCOM2

and LCOM3) are compared on C++ code example classes. Best results show the

following metrics:

LCOM3, which did not include inherited variables, and that did include the

constructor function in the calculations [ETZK97], [LAKS99]

LCOM3 with consideration of inheritance and constructor [LAKS99]

The metrics LCOM5 and Coh are not robust and are rejected from the further

investigation. The next simple example presented in table 5.3 shows this.

Table 5.3: Example for LCOM5 and Coh

        A1   A2   A3   A4   A5
M1      +                   +
M2      +    +
M3           +    +
M4                +    +
M5                     +    +
µ(Aj)   2    2    2    2    2

Obviously, the class is relatively cohesive: every method shares an attribute with two other methods, so all methods are connected through shared attributes. The metrics, however, indicate the opposite:

LCOM5 = ((1/a) Σ µ(Aj) − m) / (1 − m) = (10/5 − 5) / (1 − 5) = 0.75

Coh = Σ µ(Aj) / (m·a) = 10 / (5·5) = 0.4

In [BADR03] experts argue that methods can be connected in many ways:

Attributes Usage Criterion – two methods are connected, if they use at least one

attribute in common.

Methods Invocation Criterion – two methods are connected, if one calls other

Only three metrics (LCOM4, DCD, and DCI) consider both types of connections, all

other metrics consider only attribute connection.

The metrics have different empirical meaning:

Number of pairs of methods (LCOM1, LCOM2)

Number of connected components (LCOM3, LCOM4, Co)

Relative number of connections (TCC, LCC, DCD, DCI)

The most logical and interesting choice for the goals of this thesis is the number of connected components, which can be interpreted as the number of parts into which the class could be split.

Note that the values of the normalized metrics (TCC, LCC, DCD, DCI, Co) are difficult to aggregate into a result for the entire system, because averaging percentages leads to values with bad numerical and empirical properties; for more precise results a weighted mean should be used. In the case of the size-dependent metrics (LCOM1, LCOM2, LCOM3, LCOM4) the simple average can be used.

Hence LCOM4 is the most appropriate metric. Basically, it is the well-established metric LCOM3 extended by the methods invocation criterion.
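LCOM4 amounts to counting connected components of the method graph. The following sketch (an illustration only; the simplified input model of attribute-usage and invocation sets is assumed) applies both connection criteria and returns the number of components:

import java.util.*;

/** LCOM4 as the number of connected components of the method graph: two methods
 *  are connected if they share an attribute or one invokes the other. */
public class Lcom4Sketch {

    static int lcom4(Map<String, Set<String>> attributesUsed,
                     Map<String, Set<String>> invocations) {
        List<String> methods = new ArrayList<>(attributesUsed.keySet());
        Map<String, String> parent = new HashMap<>();
        methods.forEach(m -> parent.put(m, m));

        for (int i = 0; i < methods.size(); i++) {
            for (int j = i + 1; j < methods.size(); j++) {
                String a = methods.get(i), b = methods.get(j);
                boolean shareAttribute = !Collections.disjoint(attributesUsed.get(a), attributesUsed.get(b));
                boolean invoke = invocations.getOrDefault(a, Set.of()).contains(b)
                              || invocations.getOrDefault(b, Set.of()).contains(a);
                if (shareAttribute || invoke) union(parent, a, b);
            }
        }
        Set<String> roots = new HashSet<>();
        methods.forEach(m -> roots.add(find(parent, m)));
        return roots.size();                                 // number of connected components
    }

    static String find(Map<String, String> p, String x) {
        while (!p.get(x).equals(x)) x = p.get(x);
        return x;
    }
    static void union(Map<String, String> p, String a, String b) { p.put(find(p, a), find(p, b)); }

    public static void main(String[] args) {
        // The class from table 5.3: the chain of shared attributes gives one component -> LCOM4 = 1
        Map<String, Set<String>> attrs = Map.of(
                "M1", Set.of("A1", "A5"), "M2", Set.of("A1", "A2"), "M3", Set.of("A2", "A3"),
                "M4", Set.of("A3", "A4"), "M5", Set.of("A4", "A5"));
        System.out.println(lcom4(attrs, Map.of()));          // 1
    }
}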

“A non-cohesive class means that its components tend to support different tasks.

According to common wisdom, this kind of class has more interactions with the rest of

the system than classes encapsulating one single functionality. Thus, the coupling of

this class with the rest of the system will be higher than the average coupling of the

classes of the system. This relationship between cohesion and coupling means that a

non-cohesive class should have a high coupling value” [KABA, p.2].

However in [KABA, p.6] by means of an experiment is shown that “in general, there is

no relationship between these (LCC, LCOM) cohesion metrics and coupling metrics

(CBO, RFC)”. Also one cannot say that less cohesive classes are more coupled to other

classes.

In [DARC05] Darcy believes that metrics for coupling and cohesion should be used only

together and expects, that “for more highly coupled programs, higher levels of cohesion

increase comprehension performance”. He motivated his conception by the following

thought experiment (figure 5.11).


Figure 5.11: Interaction of coupling and cohesion (according to [DARC05, p. 17])

“If a programmer needs to comprehend program unit 1, then the programmer must

also have some understanding of the program units to which program unit 1 is coupled.

In the simplest case, program unit 1 would not be coupled to any of the other program

units. In that case, the programmer need only comprehend a single chunk (given that

program unit 1 is highly cohesive). In the second case, if program unit 1 is coupled to

program unit 2, then just 1 more chunk needs to be comprehended (given that program

unit 2 also shows high cohesion). If program unit 1 is also coupled to program unit 3,

then it can be expected that Short-Term Memory (STM) may fill up much more quickly

because program unit 3 shows low cohesion and thus represents several chunks. But,

the primary driver of what needs to be comprehended is the extent to which program

unit 1 is coupled to other units. If coupling is evident, it is only then that the extent of

cohesion becomes a comprehension issue." Darcy then confirmed his hypotheses with an experiment on the maintenance of a test application. However, the very artificial nature of the experiment means that this hypothesis should not be adopted without further experiments.

Other types of cohesion – Functional Cohesion.

Zuse [ZUSE98, p.525] distinguishes seven types of cohesion:

Functional Cohesion: A functionally cohesive module contains elements that all

contribute to the execution of one and only one problem related task

Sequential Cohesion: A sequentially cohesive module is one whose elements are

involved in activities such that output data from one activity serves as input data

to the next

Communicational Cohesion: A communicational cohesive module is one whose

elements contribute to activities that use the same input or output data

Procedural Cohesion: As we reach procedural cohesion, we cross the boundary

from the easily maintainable modules to the higher levels of cohesion to the less

easily maintainable modules of the middle levels of cohesion. A procedurally

cohesive module is one whose elements are involved in different and possibly

unrelated activities in which control flows from each activity to the next

Temporal Cohesion: A temporally cohesive module is one whose elements are

involved in activities that are related in time.

Logical Cohesion: A logically cohesive module is one whose elements contribute

to activities of the same general category in which the activity or activities to be

executed are selected from outside the module.

Coincidental Cohesion: A coincidentally cohesive module is one whose elements

contribute to activities with no meaningful relationship to one another”.

Since the functional cohesion is most desirable, some researchers ([BIEM94]) tried to

develop a metric to measure it.

Nevertheless, "Functional Cohesion is actually an attribute of individual procedures or functions, rather than an attribute of a separately compilable program unit or module" [BIEM94, p.1] and is out of the scope of this work. Inter-modular metrics are more

important, since these are better indicators of the maintainability. Intra-modular

cohesion seems to be too complicated in the calculation and weak in prediction of the

maintainability of the entire system.

The next type of cohesion is package cohesion or partition of classes into packages. Such

kind of cohesion is also important, however difficult to analyze. Hence the package

cohesion is the topic of separate research.

LCOM Essential:

It is the degree of relatedness of the methods within a class

Cohesion can be used in the procedural as well as in the object-oriented model

It has an impact on the analyzability and the changeability

Cohesion should be considered together with coupling

LCOM4 seems to be the most appropriate metric from the theoretical point of view; additional experiments are needed

Metric: D – Distance from Main Sequence

This set of metrics was suggested by Martin in [MART95] and measures the responsibility, independence and stability of packages. Martin proposes to consider the ratio between the number of abstract classes within a package and its stability.

A package is responsible if a large number of classes outside the package depend upon classes within it; this number is called Afferent Coupling (Ca). A package is independent if only a small number of classes outside the package are depended upon by classes inside it; this number is called Efferent Coupling (Ce). A responsible and independent package is stable: such a package has no reason to change, and many reasons not to change.

For measuring the stability of a package Martin suggests the Instability metric: In = Ce / (Ca + Ce). This metric has the range [0, 1]: In = 0 indicates a maximally stable package, In = 1 a maximally instable package. The special case of a package coupled to no external classes (not mentioned by Martin) is considered to have an instability of 0 [REIS].

If all the packages in a system were maximally stable, the system would be unchangeable. In fact, the designer wants portions of the design to be flexible enough to withstand a significant amount of change. A package should therefore contain a sufficient number of classes that are flexible enough to be extended without requiring modification, i.e. abstract classes.

To measure this, Martin suggests the Abstractness metric: A = number of abstract classes in the package / total number of classes in the package. This metric also has the range [0, 1]: 0 means a completely concrete and 1 a completely abstract package.

The more stable a package is, the more abstract classes it should have in order to remain extensible. These metrics are presented graphically in figure 5.12.

Each dot in the coordinate frame presents one package with two characteristics: stability

and abstractness. Packages placed in area A are highly stable and concrete. Such

packages are not desirable because they are rigid. These cannot be extended because

they are not abstract. And they are very difficult to change because of high stability.

Packages from area C are also undesirable, because they are maximally abstract and yet

have no dependencies.

Packages from area B are partially extensible, because they are partially abstract.

Moreover, these are partially stable so that the extensions are not subject to maximal

instability. Such a category seems to be "balanced". Its stability is in balance with its

abstractness. The size of the dot in figure 5.12 indicates the size of the corresponding

package.

As the final metric the distance from the dot representing the package to the line A + In = 1 was suggested. Because of its similarity to the graph used in astronomy, Martin calls this line the Main Sequence.

The perpendicular distance of a package from the main sequence is D = |A + In − 1| / √2; this metric ranges over [0, ~0.707]. One can normalize the metric to the range [0, 1] by using the simpler form D = |A + In − 1|.
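A minimal sketch of these package metrics (an illustration only; it assumes that the coupling and class counts have already been collected elsewhere):

/** Martin's package metrics: instability, abstractness and the normalized
 *  distance from the main sequence. */
public class MainSequenceSketch {

    /** ca: afferent coupling, ce: efferent coupling. In = Ce / (Ca + Ce); 0 if uncoupled. */
    static double instability(int ca, int ce) {
        return (ca + ce) == 0 ? 0.0 : (double) ce / (ca + ce);
    }

    /** A = abstract classes / total classes. */
    static double abstractness(int abstractClasses, int totalClasses) {
        return totalClasses == 0 ? 0.0 : (double) abstractClasses / totalClasses;
    }

    /** Normalized distance from the main sequence: D = |A + In - 1|. */
    static double distance(double a, double in) {
        return Math.abs(a + in - 1.0);
    }

    public static void main(String[] args) {
        // A concrete, heavily depended-upon package (area A): rigid, far from the main sequence
        double in = instability(20, 2);      // mostly afferent coupling -> stable
        double a = abstractness(0, 15);      // no abstract classes
        System.out.printf("In=%.2f A=%.2f D=%.2f%n", in, a, distance(a, in));
    }
}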

A large distance from the main sequence does not necessarily mean that the package is badly designed; it also depends on the place of the package in the architecture. Packages working with the database or offering tools usually have high afferent coupling and low efferent coupling and are therefore highly stable and difficult to change. Thus it is useful to have more abstract classes here in order to be able to extend them and in this way maintain the packages.

Packages for the user interface depend on many other packages, so they have low afferent coupling and high efferent coupling and are mostly instable. Hence designers do not need many abstract classes here, because these packages can easily be changed. This statement should be proved empirically.

In figure 5.12 the analysis of the project Mobile Client 7.1 (described in detail in chapter 8) is presented. As can be seen, the packages are evenly distributed over the whole square, and it is impossible to conclude whether the entire system has a good or bad design. The same situation can be seen in all other analyzed projects. Hence the D metric is a bad indicator of the maintainability of an entire project. One can notice, however, that single packages from the areas A and C could possibly be difficult to maintain; thus the D metric could be used for metric-based audits. However, experiments and discussions with the designers show that audits based on the D metric find only obvious design errors (for example unused abstract classes). Consequently, the D metric is rejected from the quality model.


Figure 5.12: Martin’s analysis demonstrated on the project Mobile Client 7.1

Metric: CYC - Cyclic Dependencies

The metric CYC determines the number of mutual coupling dependencies between packages, that is, the number of other packages a package depends upon which in turn depend on that package. Cyclic dependencies are difficult to maintain and indicate candidate code for refactoring, since cyclically dependent packages are not only harder to comprehend and compile individually, but also cannot be packaged, versioned, and distributed independently. Thus, they violate the idea that a package is the unit of release. Unfortunately, this metric depends on the project size, so two projects cannot be compared based on it. Nevertheless, audits based on this metric can be useful to catch cyclic package dependencies before they make it into a software baseline.
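A minimal sketch of how CYC could be computed from a package dependency graph is given below; the Map-based graph representation and the class name CycMetric are assumptions made for illustration only.

import java.util.Map;
import java.util.Set;

// Illustrative sketch: CYC for one package, given a dependency graph that maps
// each package name to the set of packages it depends upon.
public final class CycMetric {

    public static int cyc(String pkg, Map<String, Set<String>> dependencies) {
        int mutual = 0;
        for (String other : dependencies.getOrDefault(pkg, Set.of())) {
            // A mutual coupling dependency exists if the other package depends back on pkg.
            if (dependencies.getOrDefault(other, Set.of()).contains(pkg)) {
                mutual++;
            }
        }
        return mutual;
    }
}

Note that this sketch counts only direct mutual dependencies between two packages; cycles running through intermediate packages would require a full graph search (for example, for strongly connected components).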

Metric: NOM - Number of Methods and WMC - Weighted Methods per Class

Consider a class with n methods and let c1, …, cn be the complexities of the methods. Then:

WMC = c1 + c2 + … + cn


If all method complexities are considered to be unity (equal to 1), then WMC = NOM = n, the number of methods. In most cases, however, the complexity of the methods is estimated by MCC.
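Assuming the per-method MCC values have already been extracted by a parser, WMC and NOM could be computed as in the following sketch; the class name ClassComplexity is hypothetical.

import java.util.List;

// Illustrative sketch: WMC as the sum of per-method complexities (MCC values),
// NOM as the plain method count.
public final class ClassComplexity {

    public static int wmc(List<Integer> methodComplexities) {
        return methodComplexities.stream().mapToInt(Integer::intValue).sum();
    }

    public static int nom(List<Integer> methodComplexities) {
        return methodComplexities.size();
    }
}

With all complexities set to 1, wmc(...) and nom(...) coincide, as stated above.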

The metric WMC was introduced by Chidamber and Kemerer [CHID93, p. 12] and criticized by Churcher, Shepperd and Etzkorn. In particular, Etzkorn has suggested a new metric for the complexity of the methods [ETZK99] – Average Method Complexity (AMC).


He argued that WMC yields overstated values for classes with many simple methods. For example, a class with 10 attributes has 10 get-methods, 10 set-methods and the constructor, so WMC = 21, which is a very high value for such a primitive class. AMC, in contrast, yields understated values for classes with a few really complex methods (MCC > 100) and many simple methods. “Thus, AMC is not intended primarily as a replacement for the WMC metric, but rather as an additional way to examine particular classes for complexity” [ETZK99, p. 12].

In this thesis WMC is preferred over AMC, because WMC depends on the class size but is independent of the project size. Additionally, it has a very clear meaning: the number of all decision statements in the class plus the number of methods. Consequently, WMC is a good metric for estimating the overall algorithmic complexity of a class.

For data-container classes NOM = WMC, because such classes contain only get- and set-methods, which have MCC = 1. Thus, NOM can be used as an additional metric for finding data-container classes, which is important for excluding them from the cohesion analysis.
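Building on the values from the previous sketch, a hypothetical filter for probable data-container classes could look as follows.

// Illustrative sketch: a class whose WMC equals its NOM contains only methods
// with MCC = 1, which is the pattern of pure get/set data containers.
public final class DataContainerFilter {

    public static boolean isLikelyDataContainer(int wmc, int nom) {
        return nom > 0 && wmc == nom;
    }
}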

Structure Chart

This model describes the communication between modules in the procedural environment and is suitable for illustrating processes in non-OO ABAP programs. An example of a structure chart is depicted in figure 5.13. Boxes represent modules (function modules, programs, includes, etc.), circles represent global variables and arcs represent calls; the parameters of a call can also be depicted. The direction of the arrows distinguishes between importing and exporting parameters.


Figure 5.13: Example of structure chart

Metric: FAN-IN and FAN-OUT

These metrics describe the data and external coupling in the procedural environment. FAN-IN and FAN-OUT capture opposite directions of coupling (a classification sketch follows after this list):

• Parameters passed by value count toward FAN-IN.
• External variables used before being modified count toward FAN-IN.
• External variables modified in the block count toward FAN-OUT.
• Return values count toward FAN-OUT.
• Parameters passed by reference count toward FAN-IN or FAN-OUT depending on their use.
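The following sketch illustrates how these counting rules could be applied once the flows of a module have been classified; the Flow type, the Kind enumeration and the class name FanMetrics are assumptions, since the actual classification has to be delivered by a static analysis of the source code.

import java.util.List;

// Illustrative sketch: FAN-IN and FAN-OUT of one module, derived from flows that
// a static analysis has already classified. Flow and Kind are assumed types.
public final class FanMetrics {

    public enum Kind {
        VALUE_PARAMETER, RETURN_VALUE,
        EXTERNAL_READ, EXTERNAL_WRITE,
        REFERENCE_PARAMETER_READ, REFERENCE_PARAMETER_WRITE
    }

    public record Flow(Kind kind) { }

    public static int fanIn(List<Flow> flows) {
        return (int) flows.stream()
                .filter(f -> f.kind() == Kind.VALUE_PARAMETER
                          || f.kind() == Kind.EXTERNAL_READ
                          || f.kind() == Kind.REFERENCE_PARAMETER_READ)
                .count();
    }

    public static int fanOut(List<Flow> flows) {
        return (int) flows.stream()
                .filter(f -> f.kind() == Kind.RETURN_VALUE
                          || f.kind() == Kind.EXTERNAL_WRITE
                          || f.kind() == Kind.REFERENCE_PARAMETER_WRITE)
                .count();
    }
}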

A drawback of these metrics is the assumption that all pieces of information have the same size; distinguishing the complexity of procedure calls would require a much more detailed analysis. All in all, these metrics still convey a reasonably thorough idea of the coupling.

In the ABAP environment, the “where-used” and “usage” functions can be used to report FAN-IN and FAN-OUT respectively.

Based on these metrics, a wide range of hybrid metrics has been suggested in order to aggregate the measurements into one single value for the entire system. One example is D-INFO = (SUM(FAN-IN*FAN-OUT))², where SUM denotes the sum over all modules (see [ZUSE98] for more details). However, these derived metrics are project size dependent and carry less meaning.

Metric: GVAR - Number of Global Variables

This metric presents the number of global variables used in a system. Usually, to

overcome the size-dependency of this metric by normalizing, Number of Global

Variables is divided by number of modules. Nevertheless, this metric is indirectly

included in FAN-IN and FAN-OUT, hence it is senseless to include this excessive metric

into the quality model even in spite of its ease.

Other Models

This section discusses some simple metrics that do not fit any of the previously introduced models.

Metric: DOCU – Documentation Rate

This quantitative metric indicates the percentage of modules that have external documentation: DOCU for ABAP or JavaDoc for Java. However, the quality of the documentation itself is not considered and is very difficult to assess automatically at all. Moreover, this metric is already part of the Maintainability Assessment. It is therefore excluded from the model.

Metric: OO-D – OO-Degree

This is an additional metric for ABAP (for Java applications it is always 100 %). It shows the percentage of compilable units created using the object-oriented paradigm (classes or interfaces) relative to the total number of compilable units. Like the other additional metrics, it has no qualitative meaning of its own, but it indicates the importance of the OO-metrics: if only a small part of a system is created using the OO-paradigm, the analyst will pay less attention to OO-metrics.

Metric: SMI – Software Maturity Index

It is possible that the customer changes some modules in order to customize his system. Before the maintainers can start analyzing and updating the customer’s system, they should make sure that the modules to be maintained are not affected by the customization. It is important to know how far the system actually differs from the standard release. For this reason, a list of newly created, changed or deleted objects should be recorded. As a metric for the degree of modification, the metric SMI is suggested. It relates the number of newly created, changed or deleted objects to the total number of objects, regardless of who made the changes: IMS or the customer.

SMI = (M − (Ma + Mc + Md)) / M

where M is the total number of objects in the current release, Ma the number of newly created, Mc the number of changed and Md the number of deleted objects.

If SMI is less than 1, it is very likely that the maintainers should compare the current customer version with the standard release before the update. SMI approaches 1 as the product begins to stabilize.

The empirical meaning of SMI is the percentage of modules that have not been changed with respect to the last standard release.
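A small sketch of the computation, using the symbols of the formula above (the class and parameter names are illustrative):

// Illustrative sketch: SMI from the counts used in the formula above.
public final class SoftwareMaturityIndex {

    // total = M, added = Ma, changed = Mc, deleted = Md
    public static double smi(int total, int added, int changed, int deleted) {
        if (total == 0) {
            throw new IllegalArgumentException("release contains no objects");
        }
        return (double) (total - (added + changed + deleted)) / total;
    }
}

For example, smi(200, 5, 10, 5) yields 0.9, meaning that 90 % of the objects are unchanged with respect to the standard release.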

This metric is of type Percentage and thus has poor numerical properties; see the next chapter for more details.

In the ABAP environment, the number of changed LOC can be calculated by the Note Assistant.

It is noteworthy that in the ABAP environment only a small part of the system cannot be changed by the customer, whereas in the Java environment the unchangeable part is much bigger.

Metric: NOD – Number Of Developers

This metric shows the average number of developers who have ever touched an object. The author believes that modules which were changed many times by different developers have complicated behavior and are hard to modify. Moreover, such modules are likely to mix very different styles and naming conventions. All these factors decrease the maintainability. The interpretation of the values depends on the development process methodology used: eXtreme Programming, for example, does not recognize individual code ownership, in which case the metric NOD is meaningless.

Correlation between Metrics

Various methods can be used to examine whether one metric depends on another. Depending on the numerical properties (see next chapter), several methods are possible: Pearson’s and Kendall’s correlation coefficients are applicable to measures with a ratio scale, whereas Spearman’s coefficient is used when the measure has an ordinal scale.

For small amounts of data, the sample covariance can be used: Cov(x, y) = SUM((xi − ave(x)) · (yi − ave(y))) / (n − 1).
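For illustration, the sample covariance and Pearson’s coefficient could be computed as in the following sketch; the class name MetricCorrelation is hypothetical and the arrays are assumed to hold two metric series of equal length (e.g. LOC and WMC per class).

// Illustrative sketch: sample covariance and Pearson's correlation coefficient
// for two metric series of equal length (e.g. LOC and WMC per class).
public final class MetricCorrelation {

    public static double covariance(double[] x, double[] y) {
        double meanX = mean(x);
        double meanY = mean(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (x.length - 1);
    }

    // Pearson's r = Cov(x, y) / (standard deviation of x * standard deviation of y);
    // the (n - 1) factors cancel, so the covariances can be reused directly.
    public static double pearson(double[] x, double[] y) {
        return covariance(x, y) / Math.sqrt(covariance(x, x) * covariance(y, y));
    }

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}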

Several experts argue that a positive correlation result does not necessarily imply the presence of a causal relationship between the correlated metrics. In this thesis, the relation between the metrics is therefore established using both correlation and deduction.

This procedure is used to find correlated metrics in the quality model and to reject the less important ones, which do not provide additional information. A second scenario is the empirical validation of metrics, in which it is checked whether the selected product metrics are correlated with process metrics. Unfortunately, such a study is impossible in this thesis because of the lack of data for process metrics.

For illustration and to emphasize the important properties of a correlation, a diagram can be used. The example depicted in figure 5.14 presents the correlation between LOC and WMC. The area marked with “1” contains several generated message classes, which have a lot of LOC but no methods and thus no WMC. The area marked with “2” contains interfaces and abstract classes, which have few methods and correspondingly few LOC.


Figure 5.14: Correlation between LOC and WMC

The next possible relation between metrics is less obvious. Each product has a minimal inherent complexity, which depends only on the problem statement. If the complexity of one perspective is reduced, the complexity of another perspective will increase. For example, reducing high intra-modular complexity by increasing the total number of classes will increase the inter-modular complexity. An example of such a relation is depicted in figure 5.15, where the numbers “1” and “2” mark two releases with the same functionality. It can be seen that decreasing the average MCC leads to an increased total number of classes in the second release.


Figure 5.15: Example of dependency between MCC and NOO

Metrics Selected for Further Investigation

Some parts of the quality model were already rejected during the creation of the model and its expansion with metrics:

• The questions “Does the system have data flow anomalies?” and “Is code that is unreachable or that does not affect the program avoided?” were removed from the quality model, because they have no great impact on the maintainability and are difficult to calculate.

• For the question “Are the naming conventions followed?” has not been found any approp