
2022 7th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA)
978-1-6654-8711-5/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICCCBDA55098.2022.9778877

Software Module Clustering Using the Hierarchical Clustering Combination Method

Hong Xia
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing
Xi'an, China
xiahong@xupt.edu.cn

Yongkang Zhang
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications
Xi'an, China
zhangyokalen@163.com

Yanping Chen
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing
Xi'an, China
xaxuptcs@163.com

Hengshan Zhang
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing
Xi'an, China
zhanghs@xupt.edu.cn

Zhongmin Wang
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing
Xi'an, China
zmwang@xupt.edu.cn

Fengwei Wang
ZTE Corporation, No. 55, Hi-tech Road South
Shenzhen, China

Abstract—The scale of software applications has increased dramatically. Hierarchical clustering is a good method for the modular recovery of software architecture. Because software types and clustering results are judged by different evaluation criteria, a single hierarchical clustering algorithm cannot integrate indicators such as the number of clusters, arbitrary decision-making, and recovery quality across different evaluation standards, and there is no comprehensive method for selecting a clustering algorithm. We propose a hierarchical clustering combination algorithm that uses principal component analysis to combine the results of multiple hierarchical clusterings; the combined result retains the basic information of each clustering algorithm as much as possible. The experimental results show that, compared with single software hierarchical clustering methods and other combination methods, the result of our method is closer to expert decomposition, and it performs well on a variety of indicators.

Keywords—software architecture recovery; software module clustering; hierarchical clustering combination; principal component analysis

I. INTRODUCTION

As the speed of iterative software evolution has accelerated in recent years, the structure of legacy software systems has gradually become chaotic, and maintenance costs have increased [1]. Software that has not been maintained for a long time lacks detailed documentation. The software architecture recovery method is usually used to extract the software system architecture from the underlying information of the software system [2].

Software architecture recovery needs to find appropriate entities in the source code, such as classes, functions, and files, and group them into a collection of modules with the same functions [3]. In this way, entities with similar functional characteristics are gathered into modules, an architecture with high cohesion and low coupling is produced, and a structure that is easier for software developers to maintain is provided [4]. Therefore, modular technology is usually used, and clustering is widely applied in research on software architecture recovery, where it is called the software module clustering method [5]. In the specific application of software module clustering to architecture recovery, the results are represented in a visual view; combined with the logic of the clustering algorithm, an architecture view that users can understand clearly and intuitively is obtained, and the clustering results are converted into a form related to source code information.

The limitation of a single clustering algorithm is that it lacks stability under different evaluation criteria, may produce a large number of arbitrary decisions (which appear when multiple entities have the same similarity), and may merge clusters too early, resulting in poor results. These issues all stem from the behaviour of similarity measures, so a single algorithm performs well on one evaluation standard but poorly on another [6]. We therefore use the clustering ensemble method to integrate existing similarity measures. In recent years, many studies have integrated single hierarchical clustering algorithms to improve clustering accuracy, an approach called hierarchical clustering combination (HCC) [7].

To solve the above problems, we propose a hierarchical clustering combination algorithm for software architecture recovery. The main contributions of this paper are as follows: (1) Because of the unstable performance of a single clustering similarity method, we combine the results of multiple similarity methods, and the calculated similarity matrix can fully describe and consider the various relationships between software


Authorized licensed use limited to: Istinye Universitesi. Downloaded on February 26,2023 at 01:12:26 UTC from IEEE Xplore. Restrictions apply.
entities. (2) In order to combine the results of clustering methods at different levels and retain the characteristics of these algorithms to the greatest extent, a new software module clustering algorithm is proposed, which uses PCA to combine hierarchical clustering algorithms and calculates a final result that represents all base clusterings.

II. RELATED WORK

The task of software module clustering is to define certain entities in the source code of the software, find the associations between the entities, cluster all the entities, and obtain high-cohesion, low-coupling function modules in line with the software design concept. The process is divided into four steps: software feature extraction, entity similarity calculation, clustering algorithm improvement, and clustering result evaluation [8].

Maqbool, O. et al. [4] reviewed hierarchical clustering research in the context of software architecture recovery and modularization, analyzed in detail the behaviour of various similarity and distance measures that may be used in software clustering, and revealed the advantages and disadvantages of these methods. Based on the LIMBO algorithm, Wang, Y. et al. [9] introduced more static and dynamic information as characteristics of the software system and assigned different weights to different characteristics, thereby improving accuracy and, to a certain extent, the recovery efficiency of the software architecture. Naseem, R. et al. [10] analyzed some well-known advantages of existing binary similarity measures and, exploiting these advantages, proposed an improved binary similarity measure; the new measure demonstrates the comprehensive advantages of existing measures by reducing arbitrary decisions and increasing the number of clusters. Naseem, R. et al. [11] used consensus-based techniques (CBTs) and proposed a cooperative clustering technique (CCT) that combines two agglomerative hierarchical software clustering algorithms to overcome the weaknesses of individual similarity measures. Cho, C. et al. [12] considered using cluster ensembles for software architecture recovery; their experiments were conducted on five open-source projects and the results were analyzed. Naseem, R. et al. [13] used a hierarchical clustering combination method, proposed the concepts of absolute and relative features of the dendrogram, aggregated the vector matrices, and used the Euclidean distance metric to calculate the distance between each two vectors.

Our method is also inspired by this research. The difference is that our method focuses on combining the results of clustering algorithms with significant differences, rather than on the combination process itself. Compared with existing ensemble methods, our proposed method focuses on integrating different similarity methods and considering the advantages of each; existing ensemble methods are mainly preliminary explorations of cluster ensembles in the field of software module clustering. In the literature [12], the clustering ensemble method is used to recover source code files, with no further exploration of smaller entities. The combination method in the literature [13] combines the results of conventional hierarchical clustering algorithms; algorithms with known advantages, such as WCA and LIMBO, and previously proposed similarity methods with good results, are not considered.

III. SOFTWARE ARCHITECTURE RECOVERY METHOD BASED ON HCC-PCA

This paper explores a hierarchical clustering combination method for software architecture recovery. The method uses several IAHCs to generate dendrograms with different structural characteristics, expresses these dendrograms in the form of description matrices, and then uses PCA to combine the description matrices. The specific steps of the method are shown in Figure 1.

Fig. 1. Hierarchical clusterers combination using PCA (the similarity methods Jaccard, JaccardNM, Jaccam, Unbiased Ellenberg and Information Loss feed the clustering methods CL, WCA and LIMBO; the resulting dendrograms 1-n are expressed as description matrices, combined using PCA, and reduced to a final dendrogram)

A. Feature Extraction

In order to obtain an ideal clustering result, it is necessary to find sufficient feature information for each entity. On this basis, the following feature selection principles are defined, as shown in Table I.

TABLE I. THE SELECTED ENTITIES AND FEATURE RELATIONSHIPS

No  Entity relationship                              Description
1   Global variables referenced by entities          Two entities reference the same global variable
2   Local variables referenced by entities           Two entities reference the same local variable
    (user-defined types comprise struct,
    enum and union)
3   User-defined types used by entities              User-defined types are referenced by two entities
4   Data files referenced by entities                Two entities access the same data file
5   Entities called by entities                      Two entities call the same function, so they show a certain cohesion
6   System calls for entity reference                Two entities use similar system calls, so they are considered to perform similar functions
7   Macros referenced by entities                    Two entities contain similar macros, so there can be a cohesive relationship between them

Each entity is expressed as E = {E1, E2, ..., Ei}, where i is the number of selected entities, and the various relationships between the

entities extracted from them are expressed as the features of each entity, represented by a feature vector f_i = {f_i1, f_i2, ..., f_ip}. The similarity between entities is determined based on their features. Features are usually binary; after identifying entities and features, each entity in the system is represented as a feature vector.

B. Similarity Computation

The next step is to use the similarity calculation methods to compute an N x N similarity matrix, which represents the similarity between each pair of entities in the system. After calculating the similarity, the hierarchical clustering algorithm merges entities according to the calculated similarity. The similarity methods we use are shown in Table II [14] and described as follows:

TABLE II. SIMILARITY MEASURES

Similarity measure           Formula
Jaccard                      Sim(Ei, Ej) = a / (a + b + c)
JaccardNM                    Sim(Ei, Ej) = a / (2(a + b + c) + d)
Jaccam                       Sim(Ei, Ej) = a(3(a + b + c) + d) / ((a + b + c)(2(a + b + c) + d))
Unbiased Ellenberg measure   Eu(Ei, Ej) = (1/2)Ma / ((1/2)Ma + b + c)
Information Loss             I(Ei, Ej) = [p(Ei) + p(Ej)] * Djs(fi, fj)

Jaccard, JaccardNM and Jaccam are used to calculate the similarity of binary features. The Unbiased Ellenberg measure is a non-binary form of the Jaccard measure, used to calculate the similarity of entities with the non-binary features formed during the clustering process. The Information Loss measure applies information-theoretic techniques to software clustering; it retains important information about entities by reducing the information loss in each clustering step.

C. Application of Software Clustering

The software architecture is finally reflected in the level and position of each entity. Therefore, using different hierarchical clustering methods, such as CL, WCA and LIMBO [15], may produce very different results.

These methods have their own characteristics. CL performs well in the internal evaluation of clustering results and can produce cohesive results for software clustering. Combined with similarity measures such as Jaccard, JaccardNM and Jaccam, it performs well on the number of clusters, on reducing arbitrary decision-making, and on balancing these two factors. WCA and LIMBO reduce the number of arbitrary decisions in the second half of the clustering process and show high consistency with the results of expert decomposition. The LIMBO algorithm uses the information loss metric to generate small internal clusters initially and then merge them, while reducing arbitrary decisions at the later stage of the clustering, thereby generating better (more cohesive) clusters. If a software system contains utility functions that are common across multiple subsystems, the LIMBO algorithm gets better results; if a utility function is associated with a specific subsystem, WCA with the Unbiased Ellenberg measure can give better results. Therefore, the above-mentioned hierarchical clustering algorithms are selected as the base clusterings.

In the next step, the result of each hierarchical clustering is represented by a description matrix. The result of hierarchical clustering is usually represented by a dendrogram, which is obtained intuitively, displays hierarchical features, and is easy to explain. A framework based on the description matrix is used to preserve the common structure of the input clusterings and generate a completely consistent tree. Many description matrices for dendrogram representation have been proposed [16]. In this paper, the Cophenetic Difference (CD) is used as the dendrogram description matrix: the lowest height of the internal node (i.e., the merge distance) connecting two specific leaves in the dendrogram.

D. Combination Method Based on PCA

We use the above base clustering methods to combine different similarities, and each clustering method produces a dendrogram with a different structure. Using principal component analysis, these matrices containing position information are combined: the information is sorted according to its degree of importance, and the most critical information is retained. Principal component analysis uses the mean square error index and minimizes the error between the real data and the low-dimensional data. Our method turns the combination of clustering results into a dimensionality reduction problem, solved by principal component analysis. All previous work used element-wise aggregators to combine description matrices; the proposed principal component analysis method considers all the elements of the description matrices [17].

After the dendrograms obtained by the base clusterings are represented by description matrices, L N x N matrices are obtained, denoted Z = {z_1, z_2, z_3, ..., z_L}, where each z_l represents the description matrix of one clustering result. The purpose of PCA is to combine these matrices into N^2 samples with L attributes and then reduce the dimension. The input is X_ij = {x_ij^(1), ..., x_ij^(L)}^T, where x_ij^(1) to x_ij^(L) describe the elements at the same position (i, j) of the L matrices. Together, they form the L attributes of X_ij. In this way, a multidimensional matrix is formed so that each position contains all the elements at the same position across the clustering results.

Then the sample mean is calculated as m = (1/N^2) * sum_{i=1}^{N} sum_{j=1}^{N} X_ij, where N is the number of objects, and the covariance matrix C is calculated as C = (1/N^2) * sum_{i=1}^{N} sum_{j=1}^{N} X_ij X_ij^T - m m^T.

Then the eigenvalues and eigenvectors are calculated and saved in the vector lambda, which contains L eigenvalues; the matrix V represents the corresponding L eigenvectors.

Finally, the eigenvector corresponding to the largest eigenvalue is extracted and denoted V_max. The vector V_max contains L elements. In the last step, each object of the array X

is projected onto V_max, which is calculated by X^final = {X_ij^final | X_ij^final = V_max^T X_ij}. Thus, the final description matrix X^final can be extracted.

The final step of the hierarchical clustering combination is to use the final description matrix X^final obtained above to create the final hierarchy, by executing a standard hierarchical clustering algorithm, such as single link, on the combined final matrix.
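As a concrete illustration of the steps above (a minimal sketch, not the authors' implementation: the toy entity features, the SciPy linkage strategies standing in for CL/WCA/LIMBO, and all variable names are assumptions), the description matrices can be built from cophenetic merge distances and combined with PCA as follows:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(8, 12))  # 8 entities, 12 binary features (toy data)

# Base clusterings: different linkage strategies stand in for CL/WCA/LIMBO here.
description = []
for method in ("single", "complete", "average"):
    Z = linkage(pdist(features, metric="jaccard"), method=method)
    # Cophenetic difference (CD): merge height of the lowest common internal node.
    description.append(squareform(cophenet(Z)))

L = len(description)
N = description[0].shape[0]
X = np.stack(description, axis=-1).reshape(-1, L)  # N^2 samples with L attributes

m = X.mean(axis=0)                                  # sample mean
C = (X.T @ X) / X.shape[0] - np.outer(m, m)         # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
v_max = eigvecs[:, np.argmax(eigvals)]              # principal eigenvector

X_final = np.abs((X @ v_max).reshape(N, N))         # eigenvector sign is arbitrary
np.fill_diagonal(X_final, 0.0)

# Final hierarchy: a standard hierarchical clustering (e.g. single link) on X_final.
Z_final = linkage(squareform(X_final, checks=False), method="single")
print(Z_final.shape)  # (N-1, 4)
```

The `eigh`/`argmax` pair corresponds to extracting V_max, and the projection `X @ v_max` is the X_ij^final = V_max^T X_ij step of the text.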
IV. EXPERIMENT AND RESULT ANALYSIS
The data set required for the experiment consists of CVS and Bash from Maqbool, O. [4]. CVS is a commonly used code version control software; Bash is the shell (command-line environment) for Unix and Linux systems. We choose functions as entities and use their formal characteristics.

Fig. 2. The number of nonsingleton clusters of CVS
In the experiments, we use the clustering algorithms CL, WCA and LIMBO with the similarity methods Jaccard, JaccardNM, Jaccam, the Unbiased Ellenberg measure and Information Loss to cluster the entity and feature data of these two data sets, obtaining different results. Then we use the hierarchical clustering combination method to combine these results. We also used two previously proposed combination methods, Sum Operator and Weighted Average [18], for comparison, as shown in Table III.
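The binary similarity measures of Table II that these strategies rely on can be sketched directly from the a, b, c, d feature counts (a hedged illustration: the entity vectors below are invented, and only the binary measures are shown):

```python
import numpy as np

def binary_counts(e1, e2):
    """a: features present in both; b: only in e1; c: only in e2; d: in neither."""
    e1, e2 = np.asarray(e1, bool), np.asarray(e2, bool)
    a = np.sum(e1 & e2)
    b = np.sum(e1 & ~e2)
    c = np.sum(~e1 & e2)
    d = np.sum(~e1 & ~e2)
    return a, b, c, d

def jaccard(e1, e2):
    a, b, c, _ = binary_counts(e1, e2)
    return a / (a + b + c)

def jaccard_nm(e1, e2):
    a, b, c, d = binary_counts(e1, e2)
    return a / (2 * (a + b + c) + d)

def jaccam(e1, e2):
    a, b, c, d = binary_counts(e1, e2)
    return a * (3 * (a + b + c) + d) / ((a + b + c) * (2 * (a + b + c) + d))

e1 = [1, 1, 0, 1, 0, 0]
e2 = [1, 0, 0, 1, 1, 0]
print(jaccard(e1, e2))  # a=2, b=1, c=1 -> 2/4 = 0.5
```

Note how JaccardNM and Jaccam also use d (shared absences), which is what reduces ties between candidate pairs compared with plain Jaccard.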
TABLE III. CLUSTERING SIMILARITY/COMBINATION STRATEGY

Clustering Method   Similarity/Combination Strategy
CL                  Jaccard
CL                  JaccardNM
CL                  Jaccam
WCA                 Unbiased Ellenberg Measure
LIMBO               Information Loss Measure
HCC-PCA             PCA
HCC-A               Sum Operator
HCC-B               Weighted Average

A. Internal evaluation

1) Cluster size. Good clustering results should produce more non-singleton clusters in the initial stage, which are cohesive, and then merge these non-singleton clusters. The number of clusters is considered a good indicator: a larger number suggests that the clustering method performs better and creates good clustering quality in terms of authoritativeness.

In the early stage, a large number of non-singleton clusters are generated, which means that the entities at the bottom are looking for close associations and are merged at the bottom layer, clustering multiple pairs of the most related and most similar entities; therefore, the number of clusters increases. When a singleton cluster joins a non-singleton cluster, the number of clusters remains unchanged. When two non-singleton clusters start to merge, the number of clusters decreases. As shown in Figures 2-3, on average, our method HCC-PCA creates a large number of non-singleton clusters compared to single clustering algorithms and other combination strategies. The number of clusters generated at each step of the clustering process is an important indicator for evaluating the clustering quality.

Because the LIMBO algorithm uses the information loss metric, the maximum number of clusters can be generated early in the clustering process. However, because combinations of singletons and non-singletons occur next, the number of clusters begins to decline and then continues to rise. Due to the generation of arbitrary decisions in the clustering process, singleton entities may be randomly assigned to different non-singleton clusters, which affects the quality of clustering. The results of the CL algorithm show that the number of clusters generated in the early stage is the lowest and is similar to the clustering results generated by WCA, because the similarity metrics they use rely on "a", i.e., the presence of features, to determine similarity. The HCC-PCA algorithm can generate a large number of clusters in the early stage and maintain a steady rise during the clustering process, which indicates that clustering with the PCA combination method merges more non-singleton clusters in the early stage and combines a large number of non-singleton clusters. On the two data sets Bash and CVS, the effect of single hierarchical clustering algorithms and of the other combination methods is significantly lower than that of the PCA-based combination method.

Fig. 3. The number of nonsingleton clusters of Bash

2) Arbitrary decision. The clustering algorithm groups entities based on the similarity of their features. In the clustering stage, there may be more than two entities with the same calculated degree of similarity. In this case, the clustering algorithm will arbitrarily select two entities for clustering. This phenomenon is very common in the clustering process. Therefore, in the
clustering process, as the clustering progresses, in order to
obtain high-quality results, the proportion of arbitrary decisions
should be reduced.
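One way to see where an arbitrary decision arises is to count ties for the best similarity before a merge. This small sketch (the similarity matrix is invented, and this is an illustration rather than the paper's counting procedure) flags a merge step as arbitrary when more than one pair attains the maximum:

```python
import numpy as np

def tied_best_pairs(sim):
    """Count entity pairs sharing the maximum off-diagonal similarity.
    More than one such pair forces an arbitrary merge decision."""
    s = np.asarray(sim, float).copy()
    np.fill_diagonal(s, -np.inf)          # ignore self-similarity
    ties = np.argwhere(np.isclose(s, s.max()))
    return len(ties) // 2                 # symmetric matrix: each pair counted twice

sim = np.array([[1.0, 0.8, 0.8, 0.1],
                [0.8, 1.0, 0.3, 0.2],
                [0.8, 0.3, 1.0, 0.4],
                [0.1, 0.2, 0.4, 1.0]])
print(tied_best_pairs(sim))  # 2 pairs tie at 0.8 -> an arbitrary decision occurs
```

Counting these ties at every merge step over the whole run gives the percentage of arbitrary decisions plotted in Figures 4-7.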

Arbitrary decisions may reduce performance and affect the
clustering quality, so the percentage of arbitrary decisions is
used as a metric for estimating clustering. Figures 4-7 show the
percentage of arbitrary decisions made by various algorithms
during the iteration, where n represents the number of entities in
the system. We divided the clustering process by 5% and
counted the arbitrary number of decisions made at each node. In
the early clustering stage, since the entities have the same
similarity, more arbitrary decisions will be generated. Any early
decisions will not have much impact on the quality of the
clustering. This is because singleton entities are always merged
in the early stages of clustering. As shown in Figure 4, all
algorithms produce more than 20% arbitrary decisions. As the clustering algorithm executes, more non-singleton entities are merged; at this point, arbitrary decisions lead to large changes in the clustering results, and LIMBO shows more arbitrary decisions. This can be attributed to the clustering utility function in the first half of the clustering process: since all utility functions are similar, the number of arbitrary decisions in LIMBO is relatively high.

Therefore, although LIMBO produces non-singleton clusters with good performance in the early stage, it also makes more arbitrary decisions, while the combination methods show fewer arbitrary decisions in the early stage, and the PCA-based hierarchical clustering combination method performs best. Figures 5, 6 and 7 show that, compared with other methods, HCC-PCA always maintains the lowest percentage of arbitrary decisions. This represents relatively high execution quality: the PCA-based combination method produces small, dense mergers of singleton entities in the first half of the iteration process and merges clusters with few arbitrary decisions in the second half, so it also obtains the best cluster quality with the smallest percentage of arbitrary decisions. Therefore, HCC-PCA is more suitable for software clustering.

Fig. 4. Percentage of arbitrary decisions (1-55%)
Fig. 6. Percentage of arbitrary decisions (65%)
Fig. 7. Percentage of arbitrary decisions (75%)

B. External Evaluation

In order to evaluate the quality of the results obtained by the various clustering algorithms, we compare the clustering results with the decompositions prepared manually by human experts. We use the MoJo metric [4] for software clustering evaluation; MoJo calculates the number of move and join operations required to transform one decomposition into another. To evaluate the clustering quality of the various algorithms, we used the maximum Q(M) value obtained at the cut-off height. Compared with the average Q(M) value, the maximum Q(M) value provides better insight into the clustering quality, since the average Q(M) value may be excessively biased by the low values at the beginning of the clustering process.

Tables IV-V show that among the single algorithms, the maximum Q(M) value on all systems is obtained by the LIMBO algorithm, followed by WCA and the CL algorithm with its combined similarity measures. Among the combined algorithms, the PCA-based method obtains the best results, while the results of HCC-A and HCC-B are closer to the average effect of a single algorithm. In other words, the single-algorithm results still show that LIMBO and WCA perform best, the PCA-based combined algorithm performs best in all comparisons, and the other two combined algorithms, A and B, still show average results. Since the purpose of software architecture recovery is to recover a structure as close as possible to the real architecture, the PCA-based hierarchical clustering combination method is more suitable for software architecture recovery.
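MoJo itself is defined in the work cited above; the sketch below is a simplified approximation (the common plurality-assignment heuristic, with invented toy decompositions A and B), not the exact published algorithm:

```python
from collections import Counter, defaultdict

def mojo_approx(A, B):
    """Approximate MoJo(A, B): moves + joins needed to turn decomposition A into B.
    A, B: lists of sets of entity names. Simplified group-assignment heuristic:
    each A-cluster is tagged with the B-cluster it overlaps most."""
    where_b = {e: k for k, cluster in enumerate(B) for e in cluster}
    moves = 0
    tags = defaultdict(int)            # B-cluster -> number of A-clusters tagged to it
    for cluster in A:
        overlap = Counter(where_b[e] for e in cluster)
        tag, kept = overlap.most_common(1)[0]
        moves += len(cluster) - kept   # entities that must move out of this cluster
        tags[tag] += 1
    joins = sum(k - 1 for k in tags.values())
    return moves + joins

A = [{"a", "b"}, {"c"}, {"d", "e"}]
B = [{"a", "b", "c"}, {"d", "e"}]
print(mojo_approx(A, B))  # 1: join {a, b} and {c}
```

A lower value means the recovered decomposition is closer to the expert one; the exact MoJo uses an optimal group assignment rather than this per-cluster plurality rule.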
Fig. 5. Percentage of arbitrary decisions (55%)

TABLE IV. Q(M) VALUES BETWEEN DIFFERENT METHODS OF BASH

Method     Exp1  Exp2  Exp3  Exp4  Exp5  Ave
CL(Jc)     39    44    36    41    35    39

CL(JNM)    37    42    36    42    34    38
CL(Jac)    39    46    35    44    34    40
WCA        40    46    35    44    34    40
LIMBO      59    63    58    63    56    60
HCC-PCA    58    65    62    64    59    62
HCC-A      43    49    41    47    39    44
HCC-B      48    54    47    52    45    49

TABLE V. Q(M) VALUES BETWEEN DIFFERENT METHODS OF CVS

Method     Exp1  Exp2  Exp3  Exp4  Exp5  Ave
CL(Jc)     50    50    49    49    58    51
CL(JNM)    54    53    53    53    65    55
CL(Jac)    55    52    53    53    65    56
WCA        59    59    58    59    67    60
LIMBO      60    59    59    61    70    62
HCC-PCA    63    62    62    65    69    64
HCC-A      52    56    55    56    66    57
HCC-B      55    54    57    55    64    57

V. CONCLUSION

This paper explores a hierarchical clustering combination method for software architecture recovery. The method uses several IAHCs to generate dendrograms with different structural characteristics, expresses these dendrograms in the form of description matrices, and then uses PCA to combine the description matrices.

We evaluated the methods used in the experiment and analyzed the algorithm through both the internal evaluation of the clustering results and the external evaluation. Experiments show that, for the test systems used, the PCA-based combination method produces better results on all indicators.

ACKNOWLEDGEMENT

This work is supported by the Science and Technology Project in Shaanxi Province of China (Program No. 2019ZDLGY07-08), the Natural Science Basic Research Program of Shaanxi Province, China (Grant No. 2020JM-582), the Scientific Research Program Funded by Shaanxi Provincial Education Department (No. 21JP115), the Natural Science Basic Research Program of Shaanxi (Program No. 2021JQ-719), and the Special Funds for Construction of Key Disciplines in Universities in Shaanxi.

REFERENCES

[1] Fittkau F, Krause A. Hierarchical software landscape visualization for system comprehension: a controlled experiment. Software Visualization, IEEE, 2015: 36-45.
[2] Capilla R, Jansen A, et al. 10 years of software architecture knowledge management: practice and future. Journal of Systems and Software, 2016, 116: 191-205.
[3] Capiluppi A, Ruscio D, et al. Detecting Java software similarities by using different clustering techniques. Information and Software Technology, 2020, 122: 106279.
[4] Maqbool O, Babri H. Hierarchical clustering for software architecture recovery. IEEE Transactions on Software Engineering, 2007, 33(11): 759-780.
[5] Izadkhah B, Izadkhah H. A graph-based clustering algorithm for software systems modularization. Information and Software Technology, 2021, 133: 106469.
[6] Lutellier T, Chollak D. Measuring the impact of code dependencies on software architecture recovery techniques. IEEE Transactions on Software Engineering, 2018, 44(99): 159-181.
[7] Xiao W, Yang Y, et al. Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing, 2016, 173: 1362-1376.
[8] Shtern M, Tzerpos V. Clustering methodologies for software engineering. Advances in Software Engineering, 2012, 1: 1-18.
[9] Wang Y, Liu P. Improved hierarchical clustering algorithm for software architecture recovery. 2010 International Conference on Intelligent Computing and Cognitive Informatics, 2010: 247-250.
[10] Naseem M, Deris M M. Improved binary similarity measures for software modularization. Frontiers of Information Technology & Electronic Engineering, 2017, 18(8): 1082-1107.
[11] Naseem R, Maqbool O, Muhammad S. Cooperative clustering for software modularization. Journal of Systems and Software, 2013, 86(8): 2045-2062.
[12] Cho C, Lee K, Lee M, Lee C. Software architecture module-view recovery using cluster ensembles. IEEE Access, 2019, 7: 72872-72884.
[13] Naseem R, Deris M M, Maqbool O. Euclidean space based hierarchical clusterers combinations: an application to software clustering. Cluster Computing, 2019, 22: 7287-7311.
[14] Naseem R, Deris M M. A new binary similarity measure based on integration of the strengths of existing measures: application to software clustering. International Conference on Soft Computing and Data Mining, Springer, 2016, 549: 304-315.
[15] Shtern M, Tzerpos V. Methods for selecting and improving software clustering algorithms. Software: Practice and Experience, 2014, 44: 33-46.
[16] Rashedi E, Mirzaei A, Rahmati M. Optimized aggregation function in hierarchical clustering combination. Computer Graphics Forum, 2016, 20: 281-291.
[17] Abdi H, Williams L J. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2: 433-459.
[18] Kappe C P, Böttinger M. Analysis of decadal climate predictions with user-guided hierarchical ensemble clustering. Computer Graphics Forum, 2019, 38: 505-515.
