
IJSAA

International Journal of Systems , Algorithms & Applications

Implementation of Conceptual Cohesion of Classes to Predict Faults in Object Oriented Systems


Rakshith V (1), Ramesh Sagar V N (2), Sandeep A (3), Varun B (4), Suma V (5)
Department of Information Science and Engineering, Dayananda Sagar College of Engineering, VTU, Bangalore, India
e-mail: rakshithv22@gmail.com (1), rameshsagar09@gmail.com (2), anke.sandeep@gmail.com (3), varunverchus@gmail.com (4), sumavdsce@gmail.com (5)

Abstract- Developing quality software is one of the key concerns of the software industry. High cohesion of the classes in a software system is one of the significant factors in producing quality software. Software with high cohesion is easier to understand, maintain and reuse, and it improves productivity. Current practice in software development measures cohesion largely from structural information in the source code, such as attribute references in methods. This paper presents the implementation of a new measure for the cohesion of classes in Object Oriented software systems, based on the analysis of the unstructured information embedded in the source code, such as comments and identifiers. The measure, the Conceptual Cohesion of Classes (C3), draws on the mechanisms used to measure textual coherence in cognitive psychology and computational linguistics. The paper thus presents the principles and the technology that support the C3 measurement.

Keywords: Software quality metrics, Software quality, Latent semantic indexing (LSI), Textual coherence, Software cohesion.

I. INTRODUCTION
Software quality may be defined as conformance to explicitly stated functional and performance requirements, explicitly documented development standards, and implicit characteristics that are expected of all professionally developed software. Some of the issues that affect code quality include readability, low complexity, and ease of maintenance, testing, debugging, modification and portability. Software cohesion is a measure of the degree to which the elements of a module in a software product belong together. Cohesion is considered to be imperative from a conceptual point of view.

Most of the approaches to measuring cohesion are automated, as it is impractical to manually measure the cohesion of classes in large systems. Such measures deal with information that can be automatically extracted from software and analyzed by automated tools, and they therefore tend to ignore less structured information in the software (for example, textual information). Cohesion is usually measured on structural information extracted solely from the source code (for example, attribute references in methods and method calls), which captures the degree to which the elements of a class belong together from a structural point of view. The measure for class cohesion named the Conceptual Cohesion of Classes (C3) captures the conceptual aspects of class cohesion: it measures how strongly the methods of a class relate to each other conceptually.

The conceptual relation between methods is based on the principle of textual coherence. This paper interprets the implementations of methods as elements of discourse. Many aspects of a discourse contribute to coherence, including co-reference, causal relationships, connectives, and signals. Source code is far from a natural language, and many aspects of natural language discourse do not exist in source code or need to be redefined. The rules of discourse are also different from those of natural language.

Some of the existing metrics to measure cohesion are the LCOM metrics and the TCC and LCC metrics. The LCOM (Lack of Cohesion of Methods) group of metrics aims to detect problem classes; a high LCOM value means low cohesion. The TCC and LCC (Tight and Loose Class Cohesion) group of metrics aims to distinguish good cohesion from bad; with these metrics, large values are good and low values are bad. This investigation takes up the theme above by measuring the cohesion of classes on the basis of unstructured information in the code.

II. LITERATURE SURVEY
Jehad Al Dallal et al. (2010) suggest the use of a discrimination metric. The discrimination metric measures the probability that a cohesion metric will produce distinct cohesion values for classes that have the same number of attributes and methods but different Connectivity Patterns of Cohesive Interactions (CPCIs). A highly discriminating cohesion metric is more desirable because it exhibits a lower chance of inappropriately considering classes to be cohesively equal when they have different CPCIs [1].

Bela Ujhazi et al. (2010) present two novel conceptual metrics for measuring coupling and cohesion in software systems. The first, Conceptual Coupling between Object classes (CCBO), is based on the well-known CBO coupling metric, while the other, Conceptual Lack of Cohesion on Methods (CLCOM5), is based on the

Volume 2, Issue ICTM 2011, February 2012, ISSN Online: 2277-2677 ICTM 2011|June 8-9,2011|Hyderabad|India


LCOM5 cohesion metric. One of the advantages of the proposed conceptual metrics is that they can be computed in a simpler (and in many cases, programming language independent) way as compared to some of the structural metrics [2].

Lalji Prasad (2009) defines a new set of operational measures for the conceptual coupling of classes that have been empirically studied and are theoretically valid. He shows that these metrics capture new dimensions in coupling measurement compared to existing structural metrics [3].

Sukainah Husein et al. (2009) introduce their view of coupling and cohesion metrics and an approach to implementing them. Coupling and cohesion metrics are calculated by considering a number of relationships that were introduced by several researchers. Based on these relationships, some sets of metrics were chosen and implemented [4].

Andrian Marcus et al. (2008) propose a new measure for the cohesion of classes in OO software systems based on the analysis of the unstructured information embedded in the source code, such as comments and identifiers. The measure, named the Conceptual Cohesion of Classes (C3), is inspired by the mechanisms used to measure textual coherence in cognitive psychology and computational linguistics. Their paper presents the principles and the technology that stand behind the C3 measure [5].

Richard Barker et al. (2007) present the first large-scale empirical study of object oriented cohesion metrics. Their results show that, by and large, applications have similar distributions of measurements according to any given metric, but that the distributions can be quite different across metrics. This provides useful information for the ongoing empirical validation efforts for cohesion metrics [6].
Andrian Marcus and Denys Poshyvanyk (2005) propose a new set of measures for the cohesion of individual classes within an OO software system, based on the analysis of the semantic information embedded in the source code, such as comments and identifiers. They present a case study on open source software which compares the new measures with an extensive set of existing metrics. They further discuss and analyze the differences and similarities among the approaches and results [7].

III. RESEARCH DECISIONS
The class of structural metrics is the most investigated category of cohesion metrics. It includes lack of cohesion in methods (LCOM) and its variants LCOM1, LCOM2, LCOM3, LCOM4 and LCOM5, as well as Coh, TCC (tight class cohesion) and LCC (loose class cohesion).
| Class cohesion metric | Definition/Formula |
| Lack of Cohesion of Methods (LCOM1) (Chidamber and Kemerer 1991) | LCOM1 = number of pairs of methods that do not share attributes. |
| LCOM2 (Chidamber and Kemerer 1994) | P = number of pairs of methods that do not share attributes. Q = number of pairs of methods that share attributes. LCOM2 = P - Q if P - Q >= 0, and 0 otherwise. |
| LCOM3 (Li and Henry 1993) | LCOM3 = number of connected components in the graph that represents each method as a node and the sharing of at least one attribute as an edge. |
| LCOM4 (Hitz and Montazeri 1995) | Similar to LCOM3, with additional edges used to represent method invocations. |
| LCOM5 (Henderson-Sellers 1996) | LCOM5 = (a - kl)/(l - kl), where l is the number of attributes, k is the number of methods, and a is the summation of the number of distinct attributes accessed by each method in a class. |
| Coh (Briand et al. 1998) | Coh = a/(kl), where a, k, and l have the same definitions as above. |
| Tight Class Cohesion (TCC) (Bieman and Kang 1995) | TCC = relative number of directly connected pairs of methods, where two methods are directly connected if they are directly connected to a common attribute. A method m is directly connected to an attribute when the attribute appears within the method's body or within the body of a method invoked by method m directly or transitively. |
| Loose Class Cohesion (LCC) (Bieman and Kang 1995) | LCC = relative number of directly or transitively connected pairs of methods, where two methods are transitively connected if they are directly or indirectly connected to an attribute. A method m, directly connected to an attribute j, is indirectly connected to an attribute i when there is a method directly or transitively connected to both attributes i and j. |

Table 1: Definitions of a few class cohesion metrics
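As an illustration of the definitions in Table 1, the following is a minimal Python sketch of LCOM1, LCOM2, LCOM4, LCOM5 and Coh. It is not the implementation used in this work. The input representation (a dictionary `uses` mapping each method to the set of attributes it accesses, and a dictionary `calls` mapping a method to the methods it invokes) and the small Account-like example class are assumptions made purely for illustration.

```python
from itertools import combinations


def method_pairs(uses):
    """Return (P, Q): P = method pairs sharing no attribute, Q = pairs sharing one or more."""
    p = q = 0
    for m1, m2 in combinations(uses, 2):
        if uses[m1] & uses[m2]:
            q += 1
        else:
            p += 1
    return p, q


def lcom1(uses):
    return method_pairs(uses)[0]


def lcom2(uses):
    p, q = method_pairs(uses)
    return max(p - q, 0)


def lcom4(uses, calls):
    """Count connected components; an edge links two methods that share an
    attribute or where one invokes the other. With calls={} this is LCOM3."""
    parent = {m: m for m in uses}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for m1, m2 in combinations(uses, 2):
        shares = bool(uses[m1] & uses[m2])
        invokes = m2 in calls.get(m1, set()) or m1 in calls.get(m2, set())
        if shares or invokes:
            parent[find(m1)] = find(m2)
    return len({find(m) for m in uses})


def lcom5(uses, attributes):
    """(a - kl) / (l - kl); undefined for a single method or no attributes."""
    k, l = len(uses), len(attributes)
    a = sum(len(attrs) for attrs in uses.values())
    return (a - k * l) / (l - k * l)


def coh(uses, attributes):
    """a / (kl); undefined when the class has no methods or no attributes."""
    k, l = len(uses), len(attributes)
    a = sum(len(attrs) for attrs in uses.values())
    return a / (k * l)


# Hypothetical class with four methods and three attributes.
attributes = {"balance", "owner", "log"}
uses = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "rename": {"owner"},
    "audit": {"log"},
}
calls = {"deposit": {"audit"}}  # deposit invokes audit

print(lcom1(uses))         # 5
print(lcom2(uses))         # 4
print(lcom4(uses, calls))  # 2
print(round(lcom5(uses, attributes), 3))  # 0.889
print(round(coh(uses, attributes), 3))    # 0.333
```

In this example only `deposit` and `withdraw` share an attribute, so five of the six method pairs are counted in P. The call from `deposit` to `audit` merges those two methods into one component, which leaves `rename` on its own and gives LCOM4 = 2.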

The dominating philosophy behind this category of metrics considers class variable referencing and sharing between methods as contributing to the degree to which the methods of a class belong together. Most structural metrics define and measure relationships among the methods of a class based on this principle. Cohesion is seen as dependent on the number of pairs of methods that

share instance or class variables, one way or another. The differences among the structural metrics lie in the definition of the relationships among methods, the system representation, and the counting mechanism. Somewhat different in this class of metrics are LCOM5 and Coh, which consider that cohesion is directly proportional to the number of instance variables in a class that are referenced by the methods in that class.

LCOM4 measures the number of "connected components" in a class. A connected component is a set of related methods (and class-level variables). There should be only one such component in each class; if there are two or more, the class should be split into that many smaller classes. Two methods a and b are related if they both access the same class-level variable or if one of them calls the other. Having determined the related methods, we draw a graph linking the related methods to each other. LCOM4 equals the number of connected groups of methods in this graph. LCOM4 = 1 indicates a cohesive class, while LCOM4 >= 2 indicates a problem, and the corresponding class should be split into smaller classes.

Outline of Latent Semantic Indexing
LSI is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflective of their usage in large bodies of text. LSI is based on a vector space model (VSM), as it generates a real-valued vector description for documents of text. Results have shown that LSI captures significant portions of the meaning not only of individual words but also of whole passages, such as sentences, paragraphs, and short essays. The central concept of LSI is that the information about the contexts in which a particular word appears or does not appear provides a set of mutual constraints that determines the similarity of meaning of sets of words to each other.
LSI was originally developed in the context of information retrieval (IR) as a way of overcoming problems with polysemy and synonymy that occurred with VSM approaches. Some words appear in the same contexts, and an important part of word usage patterns is blurred by accidental and inessential information. The method used by LSI to capture the essential semantic information is dimension reduction, selecting the most important dimensions from a co-occurrence matrix (words by context) decomposed using singular value decomposition (SVD). As a result, LSI offers a way of assessing semantic similarity between any two samples of text in an automatic, unsupervised way.

LSI relies on an SVD of a matrix (word x context) derived from a corpus of natural text that pertains to knowledge in the particular domain of interest. According to the mathematical formulation of LSI, the term combinations that occur less frequently in the given document collection tend to be precluded from the LSI subspace. LSI thus performs noise reduction, as less frequently co-occurring terms are less mutually related and, therefore, less sensible. The formalism behind SVD is rather complex and too lengthy to be presented here. Once the documents are represented in the LSI subspace, the user can compute similarity measures between documents by the cosine between their corresponding vectors or by their length. These measures can be used for clustering similar documents together to identify concepts and topics in the corpus. This type of usage is typical for text analysis tasks. Uses of LSI in software engineering are presented and discussed in our previous work.

The designers and the programmers of a software system often think about a class as a set of responsibilities that approximate the concept from the problem domain implemented by the class, as opposed to a set of method-attribute interactions. Information that gives clues about domain concepts is encoded in the source code as comments and identifiers. Among the existing cohesion metrics for OO software, the Logical Relatedness of Methods (LORM) and the Lack of Conceptual Similarity between Methods (LCSM) are the only ones that use this type of information to measure the conceptual similarity of the methods in a class. The philosophy behind this class of metrics, into which our work falls, is that a cohesive class is a crisp implementation of a problem or solution domain concept. Hence, if the methods of a class are conceptually related to each other, the class is cohesive. The difficult problem here is how conceptual relationships can be defined and measured. LORM uses natural language processing techniques for the analysis needed to measure the conceptual similarity of methods and represents a class as a semantic network.
LCSM uses the same information, indexed with LSI, and represents classes as graphs that have methods as nodes. It uses a counting mechanism similar to LCOM.

IV. DATA FLOW DIAGRAM
A Java program is given as input to the module. From this input the module extracts two kinds of data: variables and methods, and valuable comments. The variables and methods are considered to be structured information, while the comments are considered to be unstructured information. The structured information is processed using the LCOM5 formula. The unstructured information is processed using the LSI technique and by applying the vector calculation metric. The results thus obtained are analyzed and interpreted to provide the output.
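The extraction step described above can be sketched roughly as follows. This is a simplified, regex-based illustration in Python and not the module used in this work. The function names `extract` and `split_identifier`, the partial keyword list, and the sample Java class are assumptions introduced for the example. A real tool would use a proper Java parser, since this sketch can be confused by comment markers inside string literals.

```python
import re

# Partial list of Java keywords, enough for the example below (an assumption).
JAVA_KEYWORDS = {
    "abstract", "boolean", "break", "byte", "case", "catch", "char", "class",
    "continue", "default", "do", "double", "else", "extends", "final",
    "finally", "float", "for", "if", "implements", "import", "int",
    "interface", "long", "new", "package", "private", "protected", "public",
    "return", "short", "static", "super", "switch", "this", "throw",
    "throws", "try", "void", "while",
}

COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"')
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def extract(source):
    """Return (identifiers, comment_words) found in a Java source string."""
    comments = COMMENT_RE.findall(source)
    code = COMMENT_RE.sub(" ", source)   # remove comments from the code
    code = STRING_RE.sub(" ", code)      # remove string literals
    identifiers = [t for t in IDENT_RE.findall(code) if t not in JAVA_KEYWORDS]
    comment_words = [
        w.lower() for c in comments for w in re.findall(r"[A-Za-z]+", c)
    ]
    return identifiers, comment_words


# Hypothetical input class.
source = """
class Account {
    private int balance; // current balance in cents
    /* Adds money to the account. */
    void deposit(int amount) { balance += amount; }
}
"""

identifiers, comment_words = extract(source)
print(identifiers)
print(comment_words)
print(split_identifier("getAccountBalance"))
```

The identifiers feed the structured side of the data flow (variables and methods), while the comment words, optionally together with the split identifier words, form the text that is later indexed with LSI.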


Assessment of the new cohesion measure
In order to evaluate our measure, we conducted two case studies. The goal of the first case study was to determine whether the C3 measure captures additional dimensions of cohesion measurement when compared to existing structural cohesion measures. Our hypothesis is that, given the nature of the information and the counting mechanism employed by C3, it should capture different aspects of class cohesion than existing structural measures. Existing research has shown that cohesion measures can be used as good indicators of the fault proneness of classes in OO systems. In the second case study, C3 is compared with existing metrics, and combinations of C3 with existing cohesion metrics are compared with combinations of structural metrics with each other, to assess whether they provide better results in predicting faults in classes. Our assumption is that combining C3 with structural cohesion metrics should give a more complete indicator of cohesion (given that they capture different aspects of it) and hence a better indicator of fault proneness than combinations of structural metrics alone.

Investigation Analysis Work
The conceptual similarity between documents (i.e., methods) is measured via the cosine or inner product between the corresponding vectors, which increases as more words are shared. This underlying mechanism supports the idea of measuring conceptual coupling and cohesion in software based on word matching from identifiers and comments. The source code of the software system is parsed and transformed into a corpus of textual documents, where each document corresponds to the implementation of a method. The LSI technique described earlier takes the corpus as input and creates a term-by-document matrix, which captures the dispersion and co-occurrence of terms in class methods. SVD is used next to construct a subspace, referred to as the LSI subspace.
All methods from this matrix are represented as vectors in the LSI subspace. The cosine similarity between two vectors is used as a measure of conceptual similarity between two methods and is taken to indicate the conceptual information shared by the two methods in the context of the entire software system.

MODULES
I. Retrieving the structured information.
II. Checking the availability of structured information in the source code.
III. Applying the LCOM5 formula to the structured information.
IV. Analyzing the comments, i.e., the unstructured information.
V. Index searching.
VI. Applying the conceptual similarity formula.
VII. Comparison.
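The conceptual similarity step can be sketched as below, under stated assumptions. The sketch builds a raw term-by-document matrix and computes cosine similarities directly on it; the SVD-based projection into the LSI subspace is deliberately omitted, so this is a plain vector space approximation and not full LSI. The C3 value is taken, following our reading of the definition in [5], as the average cosine similarity over all distinct pairs of methods, set to 0 when the average is negative. The function names, the example methods, and the choice to return 0.0 for a class with fewer than two methods are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def term_document_matrix(documents):
    """documents maps a method name to its list of words.
    Returns (sorted terms, {method name: term-frequency vector})."""
    terms = sorted({w for words in documents.values() for w in words})
    vectors = {}
    for name, words in documents.items():
        counts = Counter(words)
        vectors[name] = [counts[t] for t in terms]
    return terms, vectors


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def c3(vectors):
    """Average pairwise cosine similarity of a class's methods, floored at 0."""
    pairs = list(combinations(vectors.values(), 2))
    if not pairs:
        return 0.0  # assumption: fewer than two methods yields 0.0
    average = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return max(average, 0.0)


# Hypothetical class: words taken from identifiers and comments of each method.
documents = {
    "deposit": ["deposit", "amount", "balance"],
    "withdraw": ["withdraw", "amount", "balance"],
    "printLogo": ["print", "logo"],
}

terms, vectors = term_document_matrix(documents)
print(terms)
print(round(c3(vectors), 3))  # 0.222
```

Here `deposit` and `withdraw` share two of their three words, giving a cosine of 2/3, while `printLogo` shares nothing with either, so the average over the three pairs is 2/9. In a full pipeline the vectors passed to `c3` would be the reduced vectors from the LSI subspace and not raw term counts.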

V. RESULT
Software with high cohesion improves the understanding, productivity, maintenance and reuse of the product. The Conceptual Cohesion of Classes (C3) is the measure implemented here; it draws on the mechanisms used to measure textual coherence. C3 captures the conceptual aspects of class cohesion: it measures how strongly the methods of a class relate to each other conceptually. Many aspects of a discourse contribute to coherence, including co-reference, causal relationships, connectives, and signals. The method used by LSI to capture the essential semantic information is dimension reduction, selecting the most important dimensions from a co-occurrence matrix (words by context) decomposed using singular value decomposition (SVD). As a result, LSI offers a way of assessing semantic similarity between any two samples of text in an automatic, unsupervised way.

VI. CONCLUSION
Classes in object-oriented systems, written in different programming languages, contain identifiers and comments which reflect the concepts from the domain of the software system. This information can be used to measure the cohesion of software. To extract this information for cohesion measurement, Latent Semantic Indexing can be used in a manner similar to measuring the coherence of natural language texts. This paper defines the conceptual cohesion of classes, which captures new and complementary dimensions of cohesion compared to a host of existing structural metrics. Principal component analysis of measurement results obtained on three open source software systems statistically supports this. Faults in classes can be predicted better using a combination of structural and conceptual cohesion metrics than using combinations of structural metrics alone. Highly cohesive classes need to have a design that ensures a strong coupling among their methods and a coherent internal description. Overall, the results indicate that C3 (Conceptual Cohesion of Classes) is a useful indicator of an external property of classes in OO systems, namely the fault proneness of classes.

REFERENCES
[1] Jehad Al Dallal, Measuring the Discriminative Power of Object-Oriented Class Cohesion Metrics, IEEE, 2010.
[2] Béla Újházi, Rudolf Ferenc, Denys Poshyvanyk and Tibor Gyimóthy, New Conceptual Coupling and Cohesion Metrics for Object-Oriented Systems, Working Conference on Source Code Analysis and Manipulation, 2010.
[3] Lalji Prasad, Aditi Nagar, Experimental Analysis of Different Metrics (Object-Oriented and Structural) of Software, First International Conference on Computational Intelligence, Communication Systems and Networks, 2009.
[4] Sukainah Husein, Alan Oxley, A Coupling and Cohesion Metrics Suite for Object-Oriented Software, International Conference on Computer Technology and Development, 2009.
[5] Andrian Marcus, Denys Poshyvanyk, Rudolf Ferenc, Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented Systems, IEEE Transactions on Software Engineering, 2008.
[6] Richard Barker, Ewan Tempero, A Large-Scale Empirical Comparison of Object-Oriented Cohesion Metrics, 14th Asia-Pacific Software Engineering Conference, 2007.
[7] Andrian Marcus, Denys Poshyvanyk, The Conceptual Cohesion of Classes, 21st IEEE International Conference on Software Maintenance (ICSM05), 2005.
