
A new Component Selection Algorithm based on Metrics and Fuzzy Clustering Analysis


Camelia Serban, Andreea Vescan, and Horia F. Pop
Babes-Bolyai University, Department of Computer Science,
1 Kogalniceanu St., 400084, Cluj-Napoca, Romania
{camelia,avescan,hfpop}@cs.ubbcluj.ro
http://www.cs.ubbcluj.ro/~{camelia,avescan,hfpop}

Abstract. Component-Based Software Engineering is concerned with the assembly of preexisting software components into software systems that respond to client-specific requirements. This paper presents a new algorithm for constructing a software system by assembling components. The process of selecting a component from a given set takes into account a number of quality attributes, and metrics are defined in order to quantify them. Using these metric values, a fuzzy clustering approach groups similar components in order to select the best candidate. We evaluate our results comparatively on a case study.
Key words: Component Selection Problem, Metrics, Fuzzy Analysis.

1 Introduction

The main objective of Component-Based Software Engineering [1] is to obtain a better and more efficient system in a shorter development time by using existing components rather than developing new ones. In this paper we address the problem of component selection. Informally, the problem is to select a subset of components that satisfies the system requirements. The difficulty is that each component has a related set of components sharing similar functionality, so an algorithm is needed for the decision process. Fuzzy clustering analysis is used to classify the components based on the values of metrics that measure different attributes of the components. The choice of the best component is based on the obtained classification.
The paper is organized as follows. Section 2 presents the theoretical background of the problem that we address. Section 3 presents our proposed algorithm for selecting a set of components that satisfies all the requirements. The approach uses fuzzy clustering analysis to help decide which component should be selected. Section 4 presents a case study and compares the solution obtained by our algorithm with solutions obtained by other approaches. Finally, Section 5 summarizes the contributions of this work and outlines directions for further research.


2 Theoretical background

Problem statement. The Component Selection Problem (CSP) consists of choosing a number of components from a set of components such that their composition satisfies a set of objectives. The notation used for formally defining our problem is described in what follows.
Denote by $SR = \{r_1, r_2, \ldots, r_n\}$ the set of final system requirements, and by $SC = \{c_1, c_2, \ldots, c_m\}$ the set of components available for selection. Each component $c_i$ may satisfy a subset of requirements from $SR$, $SR_{c_i} = \{r_{i_1}, r_{i_2}, \ldots, r_{i_k}\}$, with the additional condition that $SR \cap SR_{c_i}$ is not empty. We also denote by $CR = (SR_{c_1}, SR_{c_2}, \ldots, SR_{c_m})$ the vector containing the requirements of all components.
In order to specify the component dependencies we use a dependency matrix $D$ [5]. The dependency specification table records the dependencies between the requirements in the set of all component requirements.
The goal is to find a set of components $Sol$ such that every requirement $r_j$ from $SR$ can be assigned a component $c_i$ from $Sol$ with $r_j$ in $SR_{c_i}$.
Several components may satisfy the same requirement, and our aim is to select the best available one.
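Stated compactly, and using only the notation introduced above, the selection goal can be read as a covering condition. The formula below is a paraphrase of the prose definition and is not taken from the original text:

```latex
\[
Sol \subseteq SC
\quad\text{such that}\quad
\forall r_j \in SR \;\; \exists c_i \in Sol : r_j \in SR_{c_i},
\qquad\text{i.e.}\qquad
SR \subseteq \bigcup_{c_i \in Sol} SR_{c_i}.
\]
```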
Component classification based on metrics and fuzzy analysis. As mentioned above, we need to evaluate the available set of components in order to select the best candidate. This evaluation is based on criteria (quality attributes) that are important for the final system.
The main objective of CBSE is to obtain a more efficient system, with shorter development time and better quality products. These attributes are quantified using the metrics stated in the following. The cost metric (C) of a component is defined as the overall cost of acquiring and adapting that component. For the reusability criterion, the selected metrics are Provided Services Utilization (PSU) and Required Services Utilization (RSU) [2]. The last metric considered in this study is the Functionality metric (F), defined as the ratio between the number of required services of the system that are provided by the component and the number of required services of the system. The influence of these metric values on the quality attributes mentioned above has been discussed in [8].
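Written with the notation of the problem statement, the Functionality metric can be expressed as follows. The symbol $F(c_i)$ is introduced here only for illustration:

```latex
\[
F(c_i) = \frac{\lvert SR_{c_i} \cap SR \rvert}{\lvert SR \rvert}
\]
```

For instance, with the case-study data of Section 4, component c0 provides r0, r1 and r7, of which r0 and r7 belong to the six system requirements, so F(c0) = 2/6, i.e. about 0.33, which is the value listed for c0 in Table 3.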
In the following, each component $c_i$ from $SC$ is described by a 4-dimensional vector, $c_i = (C, PSU, RSU, F)$.
Next, our goal is to group components that are similar with respect to the defined attributes. To obtain this, we use a clustering approach [4]. The objects to be clustered are the components from our repository, and the characteristics of these objects are the corresponding metric values. The next problem is the selection of one component out of a set of possibilities.


3 Proposed algorithm description

In this section we propose a new algorithm for the component selection problem defined in Section 2. The algorithm is based on the selected metrics and the fuzzy clustering approach described before. The objects to be clustered are components, each identified by a vector of metric values. Our focus is to group similar objects in order to select the best candidate. The fuzzy clustering algorithm (Fuzzy n-means algorithm) used to determine the fuzzy partition of the set of components is described in [3]. Taking these into account, we next give the proposed algorithm.
Two alternative approaches are possible: one that uses a single initial partition, and one that recomputes the metrics after each update of SR and reclassifies the candidate components at each component selection step. The two situations are distinguished in the algorithm by the changePartition input variable.
The pseudocode of the approach (algorithms Metrics and Fuzzy-based Component Selection with Same Partition, MFbCSwSPA, and Metrics and Fuzzy-based Component Selection with Changed Partition, MFbCSwCPA) is given in Algorithm 1.
Algorithm 1 Metrics and Fuzzy-based Component Selection Algorithm (MFbCSwSPA/MFbCSwCPA)
Require: SR, n; {SR - set of requirements, n - no. of requirements}
         SC, m; {SC - set of components and their metrics values, m - no. of components}
         CR, D; {CR - components requirements vector, D - dependency matrix}
         changePartition. {boolean value deciding whether the algorithm recomputes the metrics and then recomputes the fuzzy partition using the remaining components}
Ensure: Sol. {obtained solution}
FuzzyPartitionDet(SC, m, A, B); {A, B - fuzzy sets represented as vectors containing the membership degrees of the components in the two clusters}
startCompSet = StartComponentsSet(CR, m, D);
selectedComp = SelectComp(A, B, m, startCompSet);
AddToSol(Sol, selectedComp); {adds the component selectedComp to Sol}
UpdateReqSet(SR, n, selectedComp, CR, m);
while (SR is not empty) do
    if (changePartition) then
        CurrentCompSet(SC, m, SR, n); {provides the components that may offer functionalities from SR}
        ReComputeMetrics(SC, m);
        FuzzyPartitionDet(SC, m, A, B);
    end if
    posibleCompSet = PosibleComponentsSet(SR, n, CR, m);
    selectedComp = SelectComp(A, B, m, posibleCompSet);
    AddToSol(Sol, selectedComp); {adds the component to the solution}
    UpdateReqSet(SR, n, selectedComp, CR, m);
end while


The FuzzyPartitionDet(SC, m, A, B) subalgorithm computes a fuzzy partition of the set SC. The fuzzy sets obtained after the first splitting are A and B. The subalgorithm StartComponentsSet determines the subset of start components from the given set of components, i.e. those components that have no dependencies. The components with no dependencies are the source components, i.e. the components that read the input data of the algorithm.
The SelectComp subalgorithm selects the new component to be added to Sol from a set of possible components. Initially, the set of possible components is the start components set. At the following steps, the possible components are obtained by calling the subalgorithm PosibleComponentsSet.
The SelectComp subalgorithm is described in Algorithm 2. One of the possible candidates has to be selected, and two cases may appear: all the candidates belong to the same cluster, or some of them are in the first cluster and the others in the second one. In the first case, we select the component with the maximum membership degree for that cluster. In the second case, the best candidate from each cluster is identified, and then some criteria are considered to choose between them. For this reason, the component set is split into two clusters. In future research we will apply a divisive hierarchical algorithm, and the initial partition will be further split.

Algorithm 2 SelectComp Subalgorithm
Require: A, B, m; {A, B - fuzzy partition (vectors containing the membership degree of the components); m - no. of components}
         compSet; {the set of possible components candidates}
Ensure: comp. {the best candidate component}
if (BelongsToTheSameCluster(compSet, A, B, m, firstCluster)) then
    if (firstCluster) then
        comp = MaxMembershipDegree(compSet, A, m);
    else
        comp = MaxMembershipDegree(compSet, B, m);
    end if
else
    compA = MaxMembershipDegree(compSet, A, m);
    compB = MaxMembershipDegree(compSet, B, m);
    comp = CriteriaBasedBestClusterCandidateSelection(compA, compB);
end if

The subalgorithm CriteriaBasedBestClusterCandidateSelection decides, based on some criteria, which of the two components is the best candidate. The criteria help us decide which of the two clusters contains good components with respect to the metric values. In this way the selected component is the best representative object from the good cluster.
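The cluster logic of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: memberships are stored in dictionaries filled with the values of Table 4, a component is assumed to belong to the cluster in which its membership degree is larger, and `keep_first` is a hypothetical placeholder standing in for CriteriaBasedBestClusterCandidateSelection.

```python
from typing import Callable, Dict, List

# Membership degrees taken from Table 4 (cluster A and cluster B).
A: Dict[str, float] = {"c0": 0.82, "c1": 0.86, "c2": 0.59, "c3": 0.72, "c4": 0.77,
                       "c5": 0.11, "c6": 0.27, "c7": 0.26, "c8": 0.91, "c9": 0.15}
B: Dict[str, float] = {"c0": 0.18, "c1": 0.14, "c2": 0.41, "c3": 0.28, "c4": 0.23,
                       "c5": 0.89, "c6": 0.73, "c7": 0.74, "c8": 0.09, "c9": 0.85}


def select_comp(A: Dict[str, float], B: Dict[str, float],
                comp_set: List[str],
                choose: Callable[[str, str], str]) -> str:
    """Return the candidate picked from comp_set, mirroring Algorithm 2."""
    # Assumption: a component belongs to the cluster with the larger membership.
    all_in_a = all(A[c] >= B[c] for c in comp_set)
    all_in_b = all(B[c] > A[c] for c in comp_set)
    if all_in_a:
        return max(comp_set, key=lambda c: A[c])
    if all_in_b:
        return max(comp_set, key=lambda c: B[c])
    # Mixed case: best representative of each cluster, then the criteria decide.
    comp_a = max(comp_set, key=lambda c: A[c])
    comp_b = max(comp_set, key=lambda c: B[c])
    return choose(comp_a, comp_b)


def keep_first(comp_a: str, comp_b: str) -> str:
    """Hypothetical placeholder for the criteria-based choice."""
    return comp_a


print(select_comp(A, B, ["c4", "c8"], keep_first))              # same cluster (A)
print(select_comp(A, B, ["c5", "c6", "c7", "c9"], keep_first))  # same cluster (B)
```

Passing the criteria as a function keeps the cluster handling separate from the metric-based comparison, which the paper treats as a distinct subalgorithm.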


4 Case Study

In order to validate our approach we have used the following case study. The set of requirements SR = {r0, r3, r4, r7, r9, r12} and the set of components SC = {c0, c1, c2, c3, c4, c5, c6, c7, c8, c9} are given.
Table 1 lists, for each component, the provided services (in terms of requirements for the final system). Table 2 contains the dependencies between the requirements from the set of requirements. Table 3 contains the values of the metrics.

Comp.  Requirements
c0     r0, r1, r7
c1     r4, r5, r6, r12
c2     r0
c3     r0, r2, r8, r10
c4     r3, r11
c5     r4, r5, r6, r9
c6     r7, r9, r12
c7     r1, r2, r9, r12
c8     r3, r4, r10, r11
c9     r0, r5, r6, r8, r9, r12
Table 1. Requirements
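The F and PSU values reported in Table 3 can be reproduced from Table 1 and the requirement set SR. The sketch below assumes that PSU is the fraction of a component's provided services that the system actually uses, which is our reading of [2]; RSU is not computed because it depends on the dependency data of Table 2. The function names are illustrative.

```python
# System requirements and provided services, copied from the case study (Table 1).
SR = {"r0", "r3", "r4", "r7", "r9", "r12"}
PROVIDED = {
    "c0": {"r0", "r1", "r7"},
    "c1": {"r4", "r5", "r6", "r12"},
    "c2": {"r0"},
    "c3": {"r0", "r2", "r8", "r10"},
    "c4": {"r3", "r11"},
    "c5": {"r4", "r5", "r6", "r9"},
    "c6": {"r7", "r9", "r12"},
    "c7": {"r1", "r2", "r9", "r12"},
    "c8": {"r3", "r4", "r10", "r11"},
    "c9": {"r0", "r5", "r6", "r8", "r9", "r12"},
}


def functionality(provided: set, required: set) -> float:
    """F: share of the system requirements that the component provides."""
    return len(provided & required) / len(required)


def provided_services_utilization(provided: set, required: set) -> float:
    """PSU (assumed definition): share of the provided services that are used."""
    return len(provided & required) / len(provided)


for name, services in PROVIDED.items():
    f = functionality(services, SR)
    psu = provided_services_utilization(services, SR)
    print(f"{name}: F = {f:.3f}, PSU = {psu:.3f}")
```

The computed ratios agree with Table 3 up to the two-decimal truncation used there (for example 2/3 is listed as 0.66 and 1/6 as 0.16).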

Depend.  r0  r1  r2  r3  r4  r5  r6  r7  r8  r9  r10  r11  r12
[rows r0 to r12; the dependency marks in the matrix cells are not recoverable from the source]
Table 2. Specification of the Requirements Dependencies

Comp.  c0    c1    c2    c3    c4    c5    c6    c7    c8    c9
PSU    0.66  0.50  1.00  0.25  0.50  0.50  1.00  0.5   0.5   0.5
RSU    0.5   0.60  1.00  0.20  0.0   0.60  1.00  0.33  0.50  0.5
F      0.33  0.33  0.16  0.16  0.16  0.33  0.50  0.33  0.33  0.50
C      0.08  0.07  0.06  0.09  0.06  0.14  0.15  0.14  0.07  0.14
Table 3. Initial Metrics Values

4.1 Solution obtained by the proposed algorithm

The criteria that help us decide which of the two components should be chosen to be part of the final solution are based on metric values that quantify three attributes: functionality, reusability and cost. When the functionality criterion has the same value for both components, the reusability criterion is considered. If the reusability criterion also has the same value, the cost criterion is used. If all the criteria are equal, one of the components is selected randomly.
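The cascade of criteria described above amounts to a lexicographic comparison, sketched below with the metric values of Table 3. Several details are assumptions rather than statements of the paper: higher F, PSU and RSU are treated as better, lower cost as better, and the reusability criterion is taken to compare PSU first and RSU second. The function name `choose_by_criteria` is illustrative.

```python
import random
from typing import Dict, Tuple

# (PSU, RSU, F, C) per component, copied from Table 3.
METRICS: Dict[str, Tuple[float, float, float, float]] = {
    "c0": (0.66, 0.50, 0.33, 0.08), "c1": (0.50, 0.60, 0.33, 0.07),
    "c2": (1.00, 1.00, 0.16, 0.06), "c3": (0.25, 0.20, 0.16, 0.09),
    "c4": (0.50, 0.00, 0.16, 0.06), "c5": (0.50, 0.60, 0.33, 0.14),
    "c6": (1.00, 1.00, 0.50, 0.15), "c7": (0.50, 0.33, 0.33, 0.14),
    "c8": (0.50, 0.50, 0.33, 0.07), "c9": (0.50, 0.50, 0.50, 0.14),
}


def choose_by_criteria(a: str, b: str, metrics=METRICS, rng=random) -> str:
    """Pick between two candidates: functionality, then reusability, then cost."""

    def key(comp: str):
        psu, rsu, f, cost = metrics[comp]
        # Tuples compare element by element, so F decides first, then the
        # reusability metrics, and finally the cost (negated: cheaper wins).
        return (f, psu, rsu, -cost)

    if key(a) > key(b):
        return a
    if key(b) > key(a):
        return b
    return rng.choice([a, b])  # all criteria equal: random selection


print(choose_by_criteria("c1", "c5"))  # equal F, PSU and RSU; cost decides
print(choose_by_criteria("c0", "c5"))  # equal F; PSU decides
```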


Solution obtained with the same partition and metrics values. Applying the algorithm from [3], we obtained the results in Table 4.
In the first step of the algorithm, the only two components having no dependencies are c4 and c8. Thus the start set of components (those responsible for reading the input data and doing no computation) contains only these two. Because both of them are in the same cluster (A) of the partition, the component with the highest membership degree, c8, is chosen.
Class  c0    c1    c2    c3    c4    c5    c6    c7    c8    c9    Representative components (R.C.)
A      0.82  0.86  0.59  0.72  0.77  0.11  0.27  0.26  0.91  0.15  c0, c1, c2, c3, c4, c8
B      0.18  0.14  0.41  0.28  0.23  0.89  0.73  0.74  0.09  0.85  c5, c6, c7, c9
Table 4. The final partition for the set of 10 components
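A two-cluster fuzzy partition of the kind shown in Table 4 could be computed along the following lines. The sketch uses the standard fuzzy c-means iteration associated with [3]; the fuzzifier m = 2, the Euclidean distance on the raw metric vectors, the random initialization and the stopping rule are all assumptions, so the resulting memberships are not expected to match Table 4 exactly.

```python
import math
import random

# Metric vectors (C, PSU, RSU, F) for c0..c9, copied from Table 3.
POINTS = [
    (0.08, 0.66, 0.50, 0.33), (0.07, 0.50, 0.60, 0.33), (0.06, 1.00, 1.00, 0.16),
    (0.09, 0.25, 0.20, 0.16), (0.06, 0.50, 0.00, 0.16), (0.14, 0.50, 0.60, 0.33),
    (0.15, 1.00, 1.00, 0.50), (0.14, 0.50, 0.33, 0.33), (0.07, 0.50, 0.50, 0.33),
    (0.14, 0.50, 0.50, 0.50),
]


def fuzzy_c_means(points, n_clusters=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (memberships, centers); memberships[i][k] is the degree of point i in cluster k."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so that each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(iters):
        # Cluster centers: means of the points weighted by membership^m.
        centers = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(n)]
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sum(w)
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_U = []
        for p in points:
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            new_U.append([1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(n_clusters))
                          for k in range(n_clusters)])
        change = max(abs(new_U[i][k] - U[i][k])
                     for i in range(n) for k in range(n_clusters))
        U = new_U
        if change < tol:
            break
    return U, centers


memberships, centers = fuzzy_c_means(POINTS)
for i, row in enumerate(memberships):
    print(f"c{i}: " + "  ".join(f"{u:.2f}" for u in row))
```

Which cluster is labelled A and which B depends on the initialization, so only the grouping of components, not the column order, is meaningful when comparing against a published partition.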

The remaining set of requirements is {r0, r7, r9, r12}. The components that offer these requirements are split into the following clusters: A contains {c0, c1, c2, c3}, with the most representative being c1, and B contains {c5, c6, c7, c9}, with the most representative being c5. To decide which of c1 and c5 should be chosen, we apply the metrics-based selection criteria mentioned above. In this case, the c1 component is chosen due to the cost criterion.
The new set of remaining requirements is {r0, r7, r9}. The components that offer these requirements are again grouped in two clusters. Based on the same rules, between the c0 and c5 components, the component c0 is chosen.
After choosing the c0 component, the only requirement that still needs to be satisfied is r9. All the components that offer this functionality are grouped in the same cluster, thus the chosen component is c5.
The obtained solution contains the components {c0, c1, c5, c8} and has the cost 32. The reusability of this solution is 5. We define reusability as the number of requirements provided by the components in the system that are not in the current set of requirements.
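The walk-through above can be replayed with a compact sketch of the same-partition variant. It is an illustration under stated assumptions, not the authors' code: the start set {c4, c8} is taken from the text because the dependency matrix is not available, candidates at later steps are simply the unselected components that provide at least one remaining requirement, the criteria are compared lexicographically (F, PSU, RSU, then lower cost), and a full tie is resolved in favour of the cluster A candidate instead of randomly.

```python
SR = {"r0", "r3", "r4", "r7", "r9", "r12"}
PROVIDED = {
    "c0": {"r0", "r1", "r7"}, "c1": {"r4", "r5", "r6", "r12"}, "c2": {"r0"},
    "c3": {"r0", "r2", "r8", "r10"}, "c4": {"r3", "r11"},
    "c5": {"r4", "r5", "r6", "r9"}, "c6": {"r7", "r9", "r12"},
    "c7": {"r1", "r2", "r9", "r12"}, "c8": {"r3", "r4", "r10", "r11"},
    "c9": {"r0", "r5", "r6", "r8", "r9", "r12"},
}
# Membership in cluster A from Table 4 (membership in B is the complement).
A = {"c0": 0.82, "c1": 0.86, "c2": 0.59, "c3": 0.72, "c4": 0.77,
     "c5": 0.11, "c6": 0.27, "c7": 0.26, "c8": 0.91, "c9": 0.15}
# (PSU, RSU, F, C) from Table 3.
METRICS = {"c0": (0.66, 0.50, 0.33, 0.08), "c1": (0.50, 0.60, 0.33, 0.07),
           "c2": (1.00, 1.00, 0.16, 0.06), "c3": (0.25, 0.20, 0.16, 0.09),
           "c4": (0.50, 0.00, 0.16, 0.06), "c5": (0.50, 0.60, 0.33, 0.14),
           "c6": (1.00, 1.00, 0.50, 0.15), "c7": (0.50, 0.33, 0.33, 0.14),
           "c8": (0.50, 0.50, 0.33, 0.07), "c9": (0.50, 0.50, 0.50, 0.14)}
START = ["c4", "c8"]  # components without dependencies, as stated in the text


def criteria_key(comp):
    psu, rsu, f, cost = METRICS[comp]
    return (f, psu, rsu, -cost)  # functionality, reusability, then lower cost


def select_comp(candidates):
    in_a = [c for c in candidates if A[c] >= 0.5]
    in_b = [c for c in candidates if A[c] < 0.5]
    if not in_b:
        return max(in_a, key=A.get)   # highest membership in A
    if not in_a:
        return min(in_b, key=A.get)   # lowest A means highest membership in B
    best_a, best_b = max(in_a, key=A.get), min(in_b, key=A.get)
    return max((best_a, best_b), key=criteria_key)


def run_same_partition():
    remaining, solution, candidates = set(SR), [], list(START)
    while remaining:
        if not candidates:
            raise ValueError("no component provides the remaining requirements")
        comp = select_comp(candidates)
        solution.append(comp)
        remaining -= PROVIDED[comp]
        candidates = [c for c in PROVIDED
                      if c not in solution and PROVIDED[c] & remaining]
    return solution


def reusability(solution):
    """Distinct provided requirements that are not system requirements."""
    return len(set().union(*(PROVIDED[c] for c in solution)) - SR)


solution = run_same_partition()
print(solution, reusability(solution))
```

Under these assumptions the run selects c8, c1, c0 and c5 in that order, and the reusability count for this set is 5, in line with the values reported above.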
Solution obtained with the changed partition and metrics values. Applying the algorithm from [3], we obtained the results in Table 4. The first step is the same as in the first version, component c8 being chosen. At the second step the set of remaining requirements is {r0, r7, r9, r12}; the set of candidate components and their metric values are given in Table 5. The corresponding fuzzy partition is presented in Table 7.
Comp.  c0    c1    c2    c3    c5    c6    c7    c9
PSU    0.66  0.25  1.00  0.25  0.25  1.00  0.5   0.5
RSU    0.5   0.2   1.00  0.2   0.2   1.00  0.33  0.5
F      0.33  0.16  0.16  0.16  0.2   0.5   0.33  0.5
C      0.08  0.07  0.06  0.09  0.14  0.15  0.14  0.14
Table 5. Second Step Metrics Values for candidate components

Comp.  c0    c2    c3    c9
PSU    0.33  1.00  0.25  0.16
RSU    0.25  1.00  0.20  0.16
F      0.16  0.16  0.16  0.16
C      0.08  0.06  0.09  0.14
Table 6. Third Step of Metrics Values for the candidate components



Class.  c0    c1    c2    c3    c5    c6    c7    c9    R.C.
A       0.47  0.92  0.40  0.96  0.84  0.12  0.54  0.23  c1, c3, c5, c7
B       0.53  0.08  0.60  0.04  0.16  0.88  0.46  0.77  c0, c2, c6, c9
Table 7. The second step partition for the set of candidate components

Class.  c0    c2    c3    c9    R.C.
A       0.93  0.01  0.98  0.91  c0, c3, c9
B       0.07  0.99  0.02  0.09  c2
Table 8. The third step partition for the set of candidate components

Based on the new metric values and the new partition of the candidate components, the c6 component is chosen between c3 and c6. After updating the remaining set of requirements, the only requirement still to be satisfied is r0, which is provided by the components {c0, c2, c3, c9}. The new metric values are given in Table 6 and the corresponding partition in Table 8; the criterion that differentiates c2 and c3 is PSU. The c2 component is chosen.
Thus, the obtained solution contains the components {c2, c6, c8} and has the cost 28 and a reusability of only 2.
4.2 Comparative analysis of the obtained solutions by other approaches

The problem of selecting components from a set of available components has also been discussed in several papers using various approaches. A Greedy algorithm [6] was used, also taking into consideration the dependencies between the components. A Branch and Bound approach using the cost criterion and the number of remaining requirements was proposed in [7].
An approach to the same selection problem that uses principles of evolutionary computation and multiobjective optimization was proposed in [6]. The problem is formulated as a multiobjective optimization problem with two objectives: the total cost of the components used and the number of components used. Both objectives are to be minimized. The dependencies are also handled, but as constraints.
Another evolutionary algorithm that uses the same metric values as this paper was proposed in [8]. The problem was formulated as multiobjective, considering the metric values. Two types of experiments were performed: first considering only two objectives, and then considering all objectives but with different population sizes and different numbers of generations.
In order to compare our approach with those mentioned before, Table 9 lists the solutions obtained with all approaches.
The solutions obtained by our proposed algorithm are comparable with the ones provided by the other approaches. The main advantages are: the dimension of the search space is drastically reduced, since the obtained partition suggests the component that should be selected at a given step; the execution time needed for selecting the best component is reduced due to the reduced search space; and the selection criteria are based on several characteristics of the components (several metrics may be defined).

Algorithm                             Solution        Cost  Reusability
Greedy                                c4, c0, c7, c1  35    5
Branch and Bound                      c4, c2, c6, c1  34    3
Genetic algorithm (only cost)         c2, c6, c8      28    2
Genetic algorithm (only PSU and RSU)  c0, c7, c8      29    4
MFbCSwSPA                             c0, c1, c5, c8  32    5
MFbCSwCPA                             c2, c6, c8      28    2
Table 9. Obtained solutions using different approaches
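The Cost column of Table 9 can be cross-checked against the C row of Table 3 with a few lines of Python. The sketch assumes that the cost of a solution is the sum of the costs of its components and that Table 9 reports the Table 3 values multiplied by 100; neither is stated explicitly in the text.

```python
# Component costs: C row of Table 3 multiplied by 100 (assumed scale of Table 9).
COST = {"c0": 8, "c1": 7, "c2": 6, "c3": 9, "c4": 6,
        "c5": 14, "c6": 15, "c7": 14, "c8": 7, "c9": 14}

# Solutions as listed in Table 9.
SOLUTIONS = {
    "Greedy": ["c4", "c0", "c7", "c1"],
    "Branch and Bound": ["c4", "c2", "c6", "c1"],
    "Genetic algorithm (only cost)": ["c2", "c6", "c8"],
    "Genetic algorithm (only PSU and RSU)": ["c0", "c7", "c8"],
    "MFbCSwSPA": ["c0", "c1", "c5", "c8"],
    "MFbCSwCPA": ["c2", "c6", "c8"],
}


def total_cost(solution):
    """Sum of the (scaled) costs of the selected components."""
    return sum(COST[c] for c in solution)


for name, components in SOLUTIONS.items():
    print(f"{name}: {total_cost(components)}")
```

Under these assumptions the sums agree with Table 9 for five of the six rows; for MFbCSwSPA the components c0, c1, c5 and c8 sum to 36 rather than the reported 32, which may point to a typo in one of the two tables or to a cost definition that differs from the plain sum assumed here.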

5 Conclusions and Future Work

A new algorithm based on metrics and fuzzy clustering analysis that addresses the problem of component selection was proposed in this paper. We evaluated our approach using a case study. Future work will focus on three main fronts: applying this approach to more case studies; applying other fuzzy clustering algorithms to obtain the needed classification; and selecting more relevant metrics.

Acknowledgement

This material is based upon work supported by the Romanian National University Research Council under award PN-II no. ID 550/2007.

References
1. I. Crnkovic and M. Larsson, Building Reliable Component-Based Software Systems, Artech House publisher (2002)
2. Hoek, A. v. d., Dincel, E., and Medvidovic, N., Using Service Utilization Metrics to Assess and Improve Product Line Architectures, 9th IEEE International Software Metrics Symposium (Metrics2003), Sydney, Australia, 2003
3. J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York (1981)
4. L.A. Zadeh, Fuzzy sets, Information and Control 8, pp. 338-353, (1965)
5. A. Vescan, Dependencies in the Component Selection Problem, in Proceedings of the 6th ICAM - International Conference on Applied Mathematics, 2008 (accepted)
6. A. Vescan, An evolutionary multiobjective approach for the Component Selection Problem, in Proceedings of the First IEEE International Conference on the Applications of Digital Information and Web Technologies, 252-257, IEEE Press, (2008)
7. A. Vescan, H. F. Pop, The Component Selection Problem as a Constraint Optimization Problem, in Software Engineering Techniques in Progress, Wroclaw University of Technology, Wroclaw, Poland, Eds: Tomas Hruska, Lech Madeyski, Miroslaw Ochodek, ISBN: 978-83-7493-421-3, 203-211, IEEE Press, (2008)
8. A. Vescan, A Metrics-based Evolutionary Approach for the Component Selection Problem, in Proceedings of the 11th International Conference on Computer Modelling and Simulation, IEEE Press, 2009 (accepted)


                            Properties of attribute entities types
Attribute Name  Attr(E)  Type          Aggregation  Visibility
id              ac1,1    user-defined  simple       private
name            ac1,2    user-defined  simple       private
id              ac2,1    user-defined  simple       private
name            ac2,2    user-defined  simple       private
courses         ac2,3    user-defined  array        private
id              ac2,4    user-defined  simple       private
name            ac3,1    user-defined  simple       private
nrCourses       ac3,2    user-defined  simple       private
courses         ac3,3    user-defined  array        private

Table 10. Specification of properties for entities of type attribute.
