
of the Large Mixed Data

Faten Alkrdy [1], Jabr Hanna [2]

PhD Student [1], Professor [2]

Computer Engineering

Tishreen University

Syria

ABSTRACT

Data mining is one of the most significant and rapidly developing areas of computer science. It includes several concepts such as classification and clustering; in the current research we focus on the clustering methods. In this paper, we introduce a new enhanced K-prototypes algorithm (EKP) with three enhancements. The first is the centroid initialization methods, the second is a convergence condition tied to the distance between the new and old centroids, and the third is a vectorization process which minimizes the time complexity of KP significantly. We also modified the FCM algorithm to make the distance computation step more effective. We applied our tests on two large mixed datasets (Adult and CoverType), one of which includes missing values that were treated in two different ways. In the experiments, we determined the best clustering algorithm and compared our results with the traditional algorithms and the international studies. The results showed that EKP needed only 4.769 s to converge and reached 68 ± 4% clustering accuracy, and that EKP was better than FCM on the CoverType dataset. The best initialization method was the space attribute-splitting one, with minimum time and maximum accuracy. Compared with the previous studies, our approach achieved the best clustering time and accuracy.

Keywords :— Data Mining, Kprototypes, FCM, Kmeans, Clustering, Adult Dataset, CoverType Dataset.

I. INTRODUCTION

Data mining is one of the trends of computer science, and it attracts the attention of researchers due to its benefits in many applications, such as analysing big datasets to extract knowledge about weather prediction, military applications, statistics of study centers, educational concepts, etc.

With the presence of large amounts of data stored in databases and data stores, the need to develop powerful tools for analyzing data and extracting information and knowledge has increased. Hence, data mining has emerged as a technique aimed at extracting knowledge from vast amounts of data. Although these techniques have produced good knowledge and useful databases, they are time-consuming in the exploration process, which limits their effectiveness.

Our goal in the current research is to minimize the clustering time and improve the performance, in order to make the data mining process more effective and robust.

II. RELATED WORK

There are many studies in the field of clustering of big data. Kolen et al. [1] modified FCM to give it a linear complexity. They reduced the time complexity from O(NC^2 P) to O(NCP); the algorithm took 90.67 s per iteration instead of 745 s. However, their algorithm added a new source of complexity and did not reduce the time as effectively as claimed.

Sun et al. [2] used FCM with a series of integer indexes to generate the initial centroids. Their algorithm reduced the time by 48% relative to the original algorithm. The datasets used in both the Kolen and Sun studies were very small (no more than 500 data points).

Liu [3], in 2008, applied FCM with a partitioning of the data-point space to enhance the initial centroid generation. The clustering accuracy improved only in the case of small datasets.

Chen et al. [4], in 2011, used Landmark-based Spectral Clustering with random sampling to select the landmarks (LSC-R). They applied their experiments on the CoverType dataset and obtained only 24.75% clustering accuracy with 134.71 s as average convergence time.

Celebi [5], in 2013, modified FCM to generate the initial centroids based on the idea that each data point is a centre of other neighbours, which allowed defining the number K of clusters for a dataset. Their method lacks application on real datasets.

Stetco et al. [6], in 2015, used an improved FCM to reduce the number of FCM iterations. They reduced the iterations by a factor of 2.1 relative to the original FCM, but they used a very small dataset.

In 2014, Cheung [7] used the K-prototypes algorithm. He achieved a medium clustering time and high performance on the Adult dataset, with 61.45% ± 1.4% accuracy and 15 s average time.

Kim [8], in 2017, used a fast K-prototypes to reduce the distance calculations. The time was reduced significantly, but on the other hand the accuracy degraded; one iteration took 5.8 seconds.

International Journal of Computer Science Trends and Technology (IJCST) – Volume 6 Issue 6, Nov-Dec 2018

Jang [9], in 2018, used the efficient grid-based K-prototypes to cluster 10000 randomized data points. He achieved a time improvement over Kim's method, with each iteration taking only 3.5 s or less.

All the previous studies lack important research points: some of them depend on only one type of dataset, other studies address either the time or the accuracy, and very few focus on both. Very few studies treat the problem of determining the best centroid initialization method. Some other studies [10,11,12,13,14] used classification concepts, which are a different type of data mining.

In the current study we try to deal with all these points together, to build a robust clustering system which is effective in both time and accuracy.

In this study, we used and improved two different data mining algorithms: K-prototypes (KP) and Fuzzy C-Means (FCM).

A. The New Enhanced Kprototypes (EKP)

We modified three stages of the original KP [15]. First, we replaced the traditional random centroid initialization with new approaches which initialize the centroids with better-fitting values and improve the performance.

The second modification concerns how the algorithm converges and stops iterating. In the original KP, clustering ran for a fixed number of iterations; in our method, we tie the convergence of KP to the difference between the old and new centroids.

The third enhancement is the speedup of the algorithm obtained by applying a vectorization process to the KP stages. Figure 1 illustrates the modified KP block diagram; the modifications are marked in red in Figure 1.

Figure 1. The Suggested Modified KP Algorithm.

We propose three different initialization approaches; Figure 2 shows them.
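The three modifications above can be put together in a short sketch. The following is a minimal illustrative implementation, not the authors' code: it assumes Huang's K-prototypes cost [15] (squared Euclidean distance on numeric attributes plus a gamma-weighted mismatch count on categorical ones), a vectorized assignment step, and the centroid-shift stop condition; the names gamma and tol are our assumptions.

```python
import numpy as np

def ekp_step(Xnum, Xcat, Cnum, Ccat, gamma=1.0):
    """One vectorized assignment/update step of K-prototypes."""
    # Numerical part: squared Euclidean distances, all points vs. all centroids.
    d_num = ((Xnum[:, None, :] - Cnum[None, :, :]) ** 2).sum(axis=2)
    # Categorical part: number of mismatching attributes per point/centroid pair.
    d_cat = (Xcat[:, None, :] != Ccat[None, :, :]).sum(axis=2)
    labels = np.argmin(d_num + gamma * d_cat, axis=1)
    new_Cnum = Cnum.copy()
    new_Ccat = Ccat.copy()
    for k in range(len(Cnum)):
        members = labels == k
        if not members.any():
            continue  # keep the old centroid for an empty cluster
        new_Cnum[k] = Xnum[members].mean(axis=0)   # mean for numeric attributes
        for p in range(Xcat.shape[1]):             # mode for categorical attributes
            vals, counts = np.unique(Xcat[members, p], return_counts=True)
            new_Ccat[k, p] = vals[np.argmax(counts)]
    return labels, new_Cnum, new_Ccat

def ekp(Xnum, Xcat, Cnum, Ccat, gamma=1.0, tol=1e-4, max_iter=100):
    """Iterate until the shift between old and new centroids falls below tol."""
    for _ in range(max_iter):
        labels, new_Cnum, new_Ccat = ekp_step(Xnum, Xcat, Cnum, Ccat, gamma)
        shift = np.linalg.norm(new_Cnum - Cnum)    # the suggested stop condition
        Cnum, Ccat = new_Cnum, new_Ccat
        if shift < tol:
            break
    return labels, Cnum, Ccat
```

Because the distances for all point/centroid pairs come from a single broadcast expression instead of nested Python loops, the per-iteration cost drops sharply, which is the point of the vectorization enhancement.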


Figure 2. The Initial Centroid Generation: A) the randomized Euclidean distance way, B) the randomized Euclidean distance based on space splitting, C) the randomized Euclidean distance based on space attribute-splitting.

The vectorization process is applied to the KP algorithm to minimize its time complexity, which is O(N*M*K*I) [13], where N is the total number of data points, M is the number of attributes, K is the number of clusters and I is the number of iterations. The time complexity is reduced by a factor Z, which is defined by Amdahl's Law [16]:

O_newKP = O(N*M*K*I)/Z,   Z = 1/((1 − F) + F/S)   (1)

Figure 3 shows a simple example of the vectorization process.

Figure 4. The Modified FCM Algorithm.

C. The Used Datasets

In the current research, we used two different datasets. The first is the Adult dataset [18], which includes 6 numerical attributes and 8 categorical attributes and also contains missing values. Its prediction task is to determine whether a person makes over 50K a year. The second is the CoverType dataset [19], which consists of 581012 records and 12 attributes; it is used to predict the forest cover type in the Roosevelt National Forest of northern Colorado. Table 1 includes detailed information about both datasets.

Table 1. Statistics of the used mixed data sets.

Dataset     Instances   Attributes           Classes
Adult       30162       8 (cat) + 6 (num)    2
CoverType   581012      2 (cat) + 10 (num)   7
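The speedup factor of equation (1) can be evaluated directly. A small helper, where F is the fraction of the run time that the vectorization accelerates and S is the speedup of that fraction; the example values 0.9 and 20 are ours, not measurements from the paper:

```python
def amdahl_speedup(F, S):
    """Amdahl's Law, equation (1): overall speedup Z when a fraction F of
    the work is accelerated by a factor S."""
    return 1.0 / ((1.0 - F) + F / S)

# e.g. if 90% of KP's time is the distance step and vectorization makes
# that step 20x faster, the whole algorithm speeds up by roughly:
Z = amdahl_speedup(0.9, 20.0)
```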

Figure 3. Example of the vectorization process on the KP algorithm.

B. The Modified FCM Algorithm

We modified the original FCM algorithm [17] through two modifications. The first is the preparation of the data, which includes the substitution of the missing values and the normalization of the data points. The second modification is the way the new distances from all centroids are computed, using equation (2), which achieves more cluster separation:

mf_new = 1/(1 + e^U)   (2)

D. Performance Metrics

We used several evaluation metrics for the K-prototypes and fuzzy clustering techniques. Equation (3) shows how we compute the clustering accuracy, while equation (4) defines the Rand Index metric [20].

ACC = (1/N) Σ_{i=1..N} δ(C_i, map(l_i))   (3)

where N is the number of data points, C_i is the cluster number to which the data point belongs, map(l_i) is the cluster number to which the data point was assigned by the clustering, and δ is a number with two conditions (1 in case of matching between the assigned and original cluster numbers, 0 otherwise).
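A minimal sketch of the two formulas above: equation (2) applied to a membership matrix U, and the accuracy of equation (3), taking map(l_i) to be the majority true class of each cluster. That mapping is one common choice; the paper does not spell it out, so it is an assumption here.

```python
import numpy as np

def modified_membership(U):
    """Equation (2): squash the membership matrix through 1/(1 + e^U)."""
    return 1.0 / (1.0 + np.exp(U))

def clustering_accuracy(labels, truth):
    """Equation (3): fraction of points whose mapped cluster matches the truth."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    total = 0
    for k in np.unique(labels):
        mask = labels == k
        # map(l_i): the majority true class inside cluster k
        vals, counts = np.unique(truth[mask], return_counts=True)
        total += counts.max()          # points where delta = 1
    return total / len(labels)

acc = clustering_accuracy([0, 0, 1, 1, 1], [1, 1, 2, 2, 1])
# cluster 0 matches 2 points, cluster 1 matches 2 points -> 4/5
```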


Second, we used the Rand Index metric. Given a set of n elements S = {o_1, …, o_n} and two partitions of S to compare, X = {X_1, …, X_r}, a partition of S into r subsets, and Y = {Y_1, …, Y_s}, a partition of S into s subsets, define the following [21]:

R = (a + b)/(a + b + c + d)   (4)

where a is the number of pairs of elements of S that are in the same subset in X and in the same subset in Y, b is the number of pairs that are in different subsets in X and in different subsets in Y, c is the number of pairs that are in the same subset in X and in different subsets in Y, and d is the number of pairs that are in different subsets in X and in the same subset in Y.

Third, we used the following fuzzy metrics [22,23,24,25]:

Partition coefficient, which computes the overlap between the clusters:

PC = (1/N) Σ_i Σ_j U_ij^2   (5)

where U_ij is the membership matrix.

Coefficient entropy, which computes the fuzziness of the clustering:

CE = -(1/N) Σ_i Σ_j U_ij log(U_ij)   (6)

Inner scattering, which computes the degree of correlation of the data points inside each cluster:

Scat = (1/nc) Σ_{i=1..nc} ||σ(v_i)|| / ||σ(X)||   (7)

where nc is the number of clusters, and the per-attribute variance of cluster i is

σ(v_i)^p = (1/n_i) Σ_{k=1..n_i} (x_k^p − v_i^p)^2   (8)

where x_k^p is the data point of column p and row k in the dataset, v_i^p is the centre of the i-th cluster, and n_i is the number of points of cluster i. The inner variance of the whole dataset is given as follows:

σ(X)^p = (1/N) Σ_{k=1..N} (x_k^p − X̄^p)^2   (9)

where X̄^p is the mean value of all the data in column p.

The outer scattering is given by equation (10):

Dis = (D_max/D_min) Σ_{z=1..nc} ( Σ_{k=1..nc} ||v_z − v_k|| )^(-1)   (10)

where D_min and D_max are the minimum and the maximum distances between the centroids of clusters z and k.

E. Test Scenarios

For the test step, we suggested several scenarios, which include comparisons between the traditional and the modified algorithms, besides comparisons between our algorithms and the other ones on the same datasets. The test scenarios include the following:

i. Compare EKP with the traditional KP.
ii. Compare the suggested initialization approaches and define the best one.
iii. Compare the missed-values approaches.
iv. Evaluate FCM on the Adult dataset based on the fuzzy performance metrics.
v. Evaluate EKP on the CoverType dataset under different iterations.
vi. Evaluate FCM on the CoverType dataset using the fuzzy performance metrics.
vii. Compare EKP and FCM.
viii. Compare our algorithms with the recently published research on both the Adult and CoverType datasets.

The results were carried out on a laptop with an Intel Core i5 processor and 4 GB of RAM.

A. Adult Dataset Results

Here, we applied the first four test scenarios. Table 2 includes the results of applying KP on the Adult dataset before and after the suggested modification. We used 10 as the maximum number of iterations for KP.

Table 2. KP Time Comparison before and after the Suggested Enhancement.

Status                                     Missed Values      Normalization   Data Processing   Clustering
                                           Prepare Time (s)   Time (s)        Time (s)          Time (s)
EKP (vectorized, with stop condition)      1.025              0.106           1.129             4.769
EKP (vectorized, without stop condition)   1.025              0.106           1.129             8.998
Original KP                                -                  -               -                 > 20 minutes

The enhanced KP algorithm achieved 68.33% clustering precision, compared to 64.9% for K-means, and needed only 4.769 seconds until KP converges.

The next scenarios were to define the best algorithm for the initial centroid generation process.
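The Rand Index and the first two fuzzy metrics above can be sketched directly from their definitions. Variable names are ours; the membership matrix U is assumed to have one row per point, with rows summing to 1.

```python
import numpy as np
from itertools import combinations

def rand_index(X, Y):
    """Equation (4) by direct pair counting over two label vectors."""
    a = b = c = d = 0
    for i, j in combinations(range(len(X)), 2):
        same_x, same_y = X[i] == X[j], Y[i] == Y[j]
        if same_x and same_y:
            a += 1          # same subset in X and in Y
        elif not same_x and not same_y:
            b += 1          # different subsets in both
        elif same_x:
            c += 1          # same in X, different in Y
        else:
            d += 1          # different in X, same in Y
    return (a + b) / (a + b + c + d)

def partition_coefficient(U):
    """PC: mean squared membership (overlap between clusters)."""
    return np.sum(U ** 2) / U.shape[0]

def coefficient_entropy(U, eps=1e-12):
    """CE: fuzziness of the clustering (eps guards log(0))."""
    return -np.sum(U * np.log(U + eps)) / U.shape[0]

U = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
ri = rand_index([0, 0, 1], [0, 0, 1])   # identical partitions
```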


Figure 5 shows the difference between the four suggested approaches on the Adult dataset.

Figure 5. A Comparison between the Initial Centroid Generation Suggested Approaches on the Adult Dataset.

From another point of view, the time comparison between the four initialization approaches shows that the non-similarity approach is the most expensive one, with a mean time of 9.517 s, while the space splitting is the cheapest one, with a mean time of 0.78 s.

If we discuss these two results together, we find that the splitting way which depends on the attribute similarity to partition the space is the best way from both the time and the accuracy points of view.
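The paper gives no pseudocode for the space-splitting initializer, so the following is only one plausible reading of it, sketched as an assumption: order the points by their Euclidean distance in attribute space, cut the ordering into K slices, and draw one seed per slice, so that the seeds are spread across the space instead of being purely random.

```python
import numpy as np

def space_splitting_init(Xnum, k, rng=None):
    """Hypothetical space-splitting seeding: one random seed per slice.

    Orders points by Euclidean distance from the attribute-space origin,
    splits the ordering into k equal slices, and picks one point per slice.
    """
    rng = np.random.default_rng(rng)
    order = np.argsort(np.linalg.norm(Xnum, axis=1))  # distance to origin
    slices = np.array_split(order, k)
    return np.stack([Xnum[rng.choice(s)] for s in slices])
```

Compared with fully random seeding, each centroid starts in a different region of the space, which is consistent with the reduced time and improved accuracy reported for the splitting methods.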

The last scenario of the KP algorithm on the Adult dataset is to define the nature of the missing values of this dataset and the best way to treat them. We suggest using two principles: the first leaves them missing, while the second substitutes them using the ways introduced by our previous research on the same dataset [26]. The second way replaces the numerical values with the mean value of the attribute, and the categorical values with the most frequent value among them. Some of the missing values can be inferred from other attribute values: the "Education" missing values are replaced using the "Education_num" attribute values and vice versa, and the "Relationship" missing values are substituted by means of the "Marital_status" attribute values and vice versa. Figure 6 shows the accuracy of the enhanced KP algorithm under the two approaches of missing-value processing.

Figure 6. Accuracy of the Enhanced KP Algorithm under the Two Approaches of Missed-Value Processing: (a) without any treatment, (b) with the suggested substitution method.

The results show that the missing values of the Adult dataset are of the types "Missing completely at random (MCAR)" and "Missing at random (MAR)", because some missing values can be inferred from the others, while other missing values must be substituted in another way (mean, mode, …). The results also illustrate that the accuracy did not change significantly.
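The substitution principle above can be sketched in plain Python: first infer a missing value from a linked attribute where possible (the MAR values), then fall back to the mean for numerical attributes and the most frequent value for categorical ones (the MCAR values). The helper and the column names are illustrative, not the authors' code or the full Adult schema.

```python
from statistics import mean, mode

def substitute_missing(rows, numeric, linked=()):
    """rows: list of dicts (None marks a missing value);
    numeric: set of numeric column names;
    linked: (a, b) pairs where a missing a can be read off from b."""
    rows = [dict(r) for r in rows]
    # 1) infer from a linked attribute where possible (MAR values)
    for a, b in linked:
        lookup = {r[b]: r[a] for r in rows
                  if r[a] is not None and r[b] is not None}
        for r in rows:
            if r[a] is None and r.get(b) in lookup:
                r[a] = lookup[r[b]]
    # 2) fall back to mean (numeric) or mode (categorical) for the rest (MCAR)
    for col in rows[0]:
        present = [r[col] for r in rows if r[col] is not None]
        fill = mean(present) if col in numeric else mode(present)
        for r in rows:
            if r[col] is None:
                r[col] = fill
    return rows
```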

For the modified FCM clustering algorithm, we find that the clustering spends 2.449 seconds to converge after 50 iterations. Table 3 includes the performance evaluation of the modified FCM on the Adult dataset, while Table 4 shows a comparison between KP and FCM on the Adult dataset.

Table 3. The performance evaluation of the modified FCM on the Adult dataset.


Attributes: Age, SurvyWeight, CapitalGain, CapitalLoss, HoursPerWeek, NativeCountry
Number of clusters: 2
Clustering time: 2.449 s
Performance metrics:
  Partition Coefficient (similarity) PC: 0.1844
  Coefficient Entropy (CE): -0.2091 (max = 0.3)
  Inner Scattering: 0.7533
  Outer Scattering: 3.6731
  Clustering Precision: 55.09%

The KP algorithm achieves a 65.83% accuracy with a clustering time of 3.65 s, while FCM gets 55.09% and 3.74 s at iteration 75, for accuracy and time respectively.

B. CoverType Dataset Scenarios

For the CoverType dataset, we introduce several scenarios of KP and FCM. Figure 7 (a, b) shows the EKP time and accuracy results on the CoverType dataset, while Figure 8 (a, b) includes the modified FCM clustering results.

Figure 7. EKP performance on the CoverType dataset: (a) Clustering Accuracy, (b) Clustering Time.

It can be noticed that the average clustering time is linear and not exponential in the number of iterations, which confirms that our enhancement minimizes the clustering time.

Figure 8. Modified FCM performance on the CoverType dataset: (a) Clustering Accuracy, (b) Clustering Time.

Figure 9 shows the clustering result on the CoverType dataset after 75 iterations of the modified FCM.


Figure 9. The clustering result on the CoverType dataset after 75 iterations of the modified FCM.

If we compare the EKP and FCM clustering results on the CoverType dataset, we find that EKP has the better performance; this is due to the fact that KP is better than FCM at large data processing. Figure 10 shows that difference.

Figure 10. A Comparison between EKP and FCM on the CoverType Dataset.

C. The Comparative Study with the International Researches

The last step in our test scenarios is a comparison between our algorithms and the previous ones in the field of data mining and clustering. Tables 4 and 5 include two comparatives between our algorithms and the previous studies on the Adult and CoverType datasets.

Table 4. A Comparison between our Algorithms and the Previous Ones in Clustering on the Adult Dataset.

Research           Number of Records   Clustering Time (s)   Accuracy (K-prototypes)   Accuracy (Kmeans)
Cheung [7]         32162               15.279                61.45% ± 1.4%             61.31%
Current Research   30162               4.769                 68% ± 4%                  64.9%

Table 5. A Comparison between our Algorithms and the Previous Ones in Clustering on the CoverType Dataset.

Research                         Clustering Time (s)   Accuracy (%)
Chen (Spectral Clustering) [4]   181006.17             44.24
Yan (KASP) [27]                  360.07                22.42
Shinnou (CSC) [28]               402.14                21.65
Chen (Nystrom) [4]               258.25                22.31
Chen (LSC-R) [4]                 134.71                24.75
Chen (LSC-K) [4]                 615.84                25.5
Li (SeqSC(200)) [29]             980                   28.5
Li (SeqSC(400)) [29]             620                   27.8
Li (SeqSC(600)) [29]             420                   27.2
Li (SeqSC(800)) [29]             230                   27
Current Research (EKP)           85.55                 28.14
Current Research (FCM)           254.3                 40.1

V. CONCLUSIONS

In this paper, we introduced a new K-prototypes algorithm with three enhancements. The first is the centroid initialization methods, the second is a convergence condition tied to the distance between the new and old centroids, and the third is a vectorization process which minimizes the time complexity of KP significantly.

We also modified FCM in order to obtain more separation between the clusters. We applied tests on two different large datasets (Adult and CoverType). The experimental tests show that the new EKP is better than the original KP from both the time and the accuracy points of view. The new EKP is also better than FCM for the CoverType dataset clustering. The initialization process which depends on the splitting of the data space and the Euclidean distance between data points is the best initialization method. The Adult dataset includes missing values, which were treated by our algorithms in two ways; the tests show that no significant accuracy or time enhancement is achieved by the substitution methods for either numerical or categorical values (i.e. the Adult dataset missing values are of the MAR and MCAR types).

The comparative study shows that our algorithms exceeded the previous studies in terms of accuracy and time. Future work can address further time enhancement using other parallel processing techniques, and the fusion of our algorithms with evolutionary ones in order to gain more accuracy.

ACKNOWLEDGMENT

We thank Dr. Mariam Saii for her scientific assistance and cooperation with our work.

REFERENCES

[1] Kolen, J.F. and Hutcheson, T., 2002. Reducing the time complexity of the fuzzy c-means algorithm. IEEE Transactions on Fuzzy Systems, 10(2), pp. 263-267.
[2] Sun, H., Wang, S. and Jiang, Q., 2004. FCM-based model selection algorithms for determining the number of clusters. Pattern Recognition, 37(10), pp. 2027-2037.
[3] Liu, T., Zhou, Y., Hu, Z. and Wang, Z., 2008. A new clustering algorithm based on artificial immune system. In: Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD'08), Vol. 2, pp. 347-351. IEEE.
[4] Chen, X. and Cai, D., 2011. Large Scale Spectral Clustering with Landmark-Based Representation. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pp. 313-318.
[5] Celebi, M.E., Kingravi, H.A. and Vela, P.A., 2013. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications, 40(1), pp. 200-210.
[6] Stetco, A., Zeng, X.-J. and Keane, J., 2015. Fuzzy C-Means++: Fuzzy C-Means with Effective Seeding Initialization. Expert Systems with Applications, pp. 7541-7548.
[7] Cheung, Y. and Jia, H., 2014. Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number. Pattern Recognition, pp. 2228-2238.
[8] Kim, B., 2017. A Fast K-prototypes Algorithm Using Partial Distance Computation. Symmetry, pp. 1-10.
[9] Jang, H., Kim, B., Kim, J. and Jung, S., 2018. An Efficient Grid-Based K-Prototypes Algorithm for Sustainable Decision-Making on Spatial Objects. Sustainability, pp. 1-20.
[10] Mayya, A. and Saii, M., 2017. Hybrid Recognition System under Feature Selection and Fusion. IJCST, 5(4), pp. 78-84.
[11] Mayya, A. and Saii, M., 2017. Iris and Palmprint Decision Fusion to Enhance Human Recognition. IJCST, 5(5), pp. 42-46.
[12] Mayya, A.M. and Saii, M., 2016. Human recognition based on ear shape images using PCA-Wavelets and different classification methods. Med Devices Diagn Eng, DOI: 10.15761/MDDE.1000103.
[13] Mayya, M. and Saii, M., 2018. Hand and Finger Fracture Detection and Type Classification Using Image Processing. IJCST, 6(4), pp. 68-76.
[14] Knaj, R. and Al-kheir, J., 2017. Creating a High-Quality Syrian Audio Database for Analysis of Speaker Personality. IJCST, 5(4), pp. 113-116.
[15] Huang, Z., 1997. Clustering large data sets with mixed numeric and categorical values. In: The First Pacific-Asia Conference on Knowledge Discovery and Data Mining.
[16] https://study.com/academy/lesson/amdahl-s-law-definition-formula-examples.html, last accessed 10-10-2018.
[17] Lobo, F., Fuzzy c-means algorithm, Data Mining lectures, available at www.fernandolobo.info/dm/slides/fuzzy-c-means.pdf.
[18] Adult Dataset, available at: https://archive.ics.uci.edu/ml/datasets/adult, last accessed 15-7-2017.
[19] CoverType Dataset, available at: https://archive.ics.uci.edu/ml/datasets/covertype, last accessed 2-10-2017.
[20] Alvarez, M., 2013. Design and Analysis of Clustering Algorithms for Numerical, Categorical and Mixed Data. PhD thesis, Manufacturing Engineering Centre, Cardiff University. ProQuest LLC.
[21] Ji, J., Pang, W., Zhou, C., Han, X. and Wang, Z., 2013. A Fuzzy K-prototype Clustering Algorithm for Mixed Numeric and Categorical Data. Neurocomputing, pp. 129-135.
[22] Boriah, S., Chandola, V. and Kumar, V., 2008. Similarity Measures for Categorical Data: A Comparative Evaluation. Department of Computer Science and Engineering, University of Minnesota, pp. 243-254.
[23] Halkidi, M., Batistakis, Y. and Vazirgiannis, M., 2001. On Clustering Validation Techniques. Journal of Intelligent Information Systems, pp. 107-145.
[24] Halkidi, M., Batistakis, Y. and Vazirgiannis, M., 2002. Clustering Validity Checking Methods: Part II. ACM SIGMOD Record, 31(3).
[25] Aljawady, R. and Altaleb, Gh., 2008. Clustering Validity. AL-Rafidain Journal of Computer Sciences and Mathematics, 5(2), pp. 79-94.
[26] Alkrdy, F. and Hanna, J., 2017. Time and Accuracy Enhancement of Mixed Data Sets Using Data Mining Algorithms. Al-Baath University Journal.
[27] Yan, D., Huang, L. and Jordan, M.I., 2009. Fast approximate spectral clustering. In: Proceedings of the 15th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD'09).
[28] Shinnou, H. and Sasaki, M., 2008. Spectral clustering for a large data set by reducing the similarity matrix size. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08).
[29] Li, Y., Huang, J. and Liu, W., 2016. Scalable Sequential Spectral Clustering. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), pp. 1809-1815.
[30] Suleiman, A. and Aldibaja, I., 2017. Improving Web Cache Algorithm Using Semantic Similarity Measures. IJCST, 5(5), pp. 12-17.

- [IJCST-V7I1P9]:A. Indhumathi, S. JambunathanUploaded byEighthSenseGroup
- [IJCST-V7I1P11]:Alex Roney MathewUploaded byEighthSenseGroup
- [IJCST-V7I2P4]:M.Sundarapandiyan, Dr.S.Karthik, Mr.J.Alfred DanielUploaded byEighthSenseGroup
- [IJCST-V7I2P7]:Mary Celin AX, KM Fathima Farzana, Lija Jude, Sherwin J Vellayil, Abeera VPUploaded byEighthSenseGroup
- [IJCST-V7I2P9]:Abdul Jabar M A, Mohith Kishan K, Firose K Shanavas, Joshua P Jimson, Maria JoyUploaded byEighthSenseGroup
- [IJCST-V7I1P2]: S.Sumathi, Dr. S.Karthik, Mr. J.Alfred DanielUploaded byEighthSenseGroup
- [IJCST-V6I6P17]:Anam BansalUploaded byEighthSenseGroup
- [IJCST-V6I6P21]:Yogesh Sharma, Aastha Jaie, Heena Garg, Sagar KumarUploaded byEighthSenseGroup
- [IJCST-V6I6P24]:P.ThamaraiUploaded byEighthSenseGroup
- [IJCST-V6I6P25]:Anjali MehtaUploaded byEighthSenseGroup
- [IJCST-V7I1P5]:Naveen N, Sowmya C NUploaded byEighthSenseGroup
- [IJCST-V6I6P22]:Prof. Sudhir Morey, Prof. Sangpal SarkateUploaded byEighthSenseGroup
- [IJCST-V7I1P1]: Azhar UshmaniUploaded byEighthSenseGroup
- [IJCST-V7I1P4]:Prof. Ms. Manali R. Raut, Trupti P. Lokhande, Karishma D. Godbole, Shruti J. More, Sunena S. Hatmode, Nikita D. TibudeUploaded byEighthSenseGroup
- [IJCST-V6I6P19]:Munish Bansal, Navneet Kaur, Inderdeep KaurUploaded byEighthSenseGroup
- [IJCST-V6I6P23]:Sourav Biswas, Sameera KhanUploaded byEighthSenseGroup
- [IJCST-V6I6P16]:Ms.Vishakha A. Metre, Mr. Pramod B. DeshmukhUploaded byEighthSenseGroup
- [IJCST-V6I6P18]:Prof. Amar Nath Singh, Mr. Rishi RajUploaded byEighthSenseGroup
- [IJCST-V7I1P3]: Joshua Gisemba OkemwaUploaded byEighthSenseGroup
- [IJCST-V6I6P20]:Asst. Prof. Shailendra Kumar Rawat, Dr. Prof. S. K. Singh, Dr. Prof. Ajay Kumar Bharti, Prof. Amar Nath SinghUploaded byEighthSenseGroup

- Genetic Risk FactorsUploaded byヌール アミラ ヤスミン
- Ejercicios de Diagramas SimplesUploaded byMarcos Tulio Molina Romo
- Conectado Con La Fuente DivinaUploaded byFernanda Moreno Figueroa
- Fenwal 2320Uploaded byGilberto Laurias
- Radio ComunicacionesUploaded byVictor Guerrero Cardenas
- From Memphis to KingstonUploaded byCarlos Quiroga
- ed85.pdfUploaded byr_rac
- hyerionUploaded byrkraghuram007
- Diagrama de EstadosUploaded byFrancisco Javier Vázquez Espinoza
- Globalization or World Society-LuhmannUploaded byTyss Morgan
- QUIZ 5 CIENCIA Y TECNOLOGIA.docxUploaded byRuben Huff
- Les Stats Qui Revelent Le PEADUploaded byartus14
- Crucigrama Segunda Parte Plan de EstudiosUploaded byCory García
- Articulo Cientifico Eficiencia en La Conversion de EnergiaUploaded byKchorroSj
- Trabajo de Petrología 1Uploaded byDiana Liseth Vigo Malca
- Catalogo AirisUploaded byVq Calin
- Cocomo ModelUploaded byMaddy137
- TarchukUploaded byKaren Kleiss
- Modelo de Laudo Completo de Ruido IndustriaUploaded byGustavo Konieczniak
- Manual Wbs ModelerUploaded byErmel Guamán
- ASTM D 1196Uploaded bydadadad
- Heat Treatment of Steels and Metallic Materials[1]Uploaded byDražan Miloloža
- densidad del aire.pdfUploaded byerwinchoque
- lauren truman cv 2016Uploaded byapi-233496425
- Sanchez 2018 the Dynamic Nature of BilingualismUploaded byÁngel Maldonado
- Curso de Atualização em Cardiologia 2009Uploaded byCardiologia
- Proyecto LuluUploaded byJH Pabruse
- 2006_exam_1Uploaded byCarl Angelo Dela Cruz
- Prevalencia de Anemia Falciforme en MedellínUploaded byKrist H.
- Pour en finir avec les idées reçues.docxUploaded byAymeric Robert