An Efficient Data Clustering Algorithm Using Fuzzy PDF
ABSTRACT
In recent years, the dramatic rise in the use of the web and the improvement in process industries in
general have transformed our society into one that strongly depends on information. The huge amount of data
generated by this process contains important information that accumulates daily in databases and is
not easy to extract. The field of data mining developed as a means of extracting information and knowledge
from databases to discover patterns or concepts that are not evident. The process usually consists of
transforming the data to a suitable format, cleaning it, and drawing inferences or conclusions regarding the
data. Machine learning is divided into two primary sub-fields: supervised learning and unsupervised
learning. Within the category of unsupervised learning, one of the primary tools is clustering. In this paper
we establish that fuzzy clustering associates each pattern with every cluster using a membership function,
whereas traditional clustering approaches generate partitions in which each pattern belongs to one and
only one cluster. Fuzzy clustering is implemented here for a multi-compressor system for the first time.
In fuzzy clustering, a large collection of documents is clustered and each of the clusters is represented
using its center. The fuzzy data-clustering algorithm generated for the data-based controller of the
multi-compressor system enhances the control efficiency.
Key words: Data clustering, Data clustering Algorithms, Data handling, Fuzzy logic, Fuzzy c-means
Algorithm, Multi-compressor.
IETECH Journal of Electrical Analysis, Vol: 1, No: 2, 130-136
represent the concept to be learned for each case. The goal is then to learn the concept in the sense
that, when a new, unseen case comes to be classified, the algorithm should predict a label for this case.
Under this paradigm, there is the possibility of overfitting, or "cheating", by memorizing all the labels
for each case rather than learning general predictive relationships between attribute values and labels.
In order to avoid overfitting, these algorithms try to achieve a balance between fitting the training data
and good generalization; this is usually referred to as the bias/variance dilemma. The outcomes of this
class of algorithms are usually evaluated on a set of examples disjoint from the training set, called the
testing set. Methods range from traditional statistical approaches to neural networks and, lately, support
vector machines. On the other hand, in unsupervised learning the algorithm is provided with just the data
points and no labels; the task is to find a suitable representation of the underlying distribution of the
data. One major approach to unsupervised learning is fuzzy data clustering. [2, 5, 6] Supervised and
unsupervised learning have also been combined in what some call semi-supervised learning. The unsupervised
part is usually applied first to the data in order to make some assumptions about the distribution of the
data, and then these assumptions are reinforced using a supervised approach.

The simplest definition of clustering, shared by all formulations, includes one fundamental concept: the
grouping together of similar data items into clusters. A simple, formal, mathematical definition of
clustering is the following: let $X \in \mathbb{R}^{m \times n}$ be a set of data items representing a set
of m points $x_i$ in $\mathbb{R}^n$. The goal is to partition X into K groups $C_k$ such that data items
belonging to the same group are more "alike" than data items in different groups. Each of the K groups is
called a cluster. The result of the algorithm is a mapping X → C that assigns each data item $x_i$ to
exactly one cluster $C_k$. The number K might be pre-assigned by the user or it can be an unknown,
determined by the algorithm. In this paper, we assume that the user gives K. The nature of the clustering
problem is such that the ideal approach is equivalent to finding the global solution of a non-linear
optimization problem. There are many different ways to express and formulate the clustering problem; as a
consequence, the obtained results and their interpretation depend strongly on the way the clustering
problem was originally formulated. If we consider all the "variations" of each different algorithm
proposed to solve each different formulation, we end up with a very large family of clustering algorithms.
Although the literature contains almost as many classifications of clustering algorithms as there are
algorithms, one simple classification splits them into the following two main classes: [3, 4]
• Parametric Clustering
• Non-Parametric Clustering

The fuzzy data-clustering algorithm will be used to design a data-based controller for the system. The
relationships between the presented identification method and linear regression are exploited, allowing
for the combination of fuzzy logic techniques with standard system identification tools. Attention is paid
to the accuracy and transparency of the obtained fuzzy models. Genetic algorithm (GA) clustering can be
used for this purpose, but fuzzy clustering is more beneficial. Fuzzy clustering alone cannot give the
optimal output, so we use a combination of fuzzy and GA techniques in the design of the controller. Using
the concepts of model-based predictive control and internal model control with an inverted model, the
control design based on a fuzzy-GA model of a nonlinear dynamic process is addressed. To this end, methods
that exactly invert specific types of fuzzy-GA models are presented. In the context of predictive control,
branch-and-bound optimization is applied. Attention is paid to algorithmic solutions of the control
problem, mainly with regard to real-time control aspects. This paper presents efficient algorithms for
data handling and data clustering as applied to a multi-compressor system. This multi-compressor system is
installed in an MNC in Punjab and presently suffers heavy losses, with an efficiency of around 60-65 %.

2. CASE STUDY: MULTI-COMPRESSOR SYSTEM
The plant is basically a chemical plant used for making food products. Temperature variations occur in it,
so cooling is required from time to time. A robust controller is required that can provide temperature
stabilization and accurate cooling. Systems of thermodynamic importance are divided into two groups: first,
work-developing systems, which include all types of engines producing power using thermal energy, and
second, work-absorbing systems, which include compressors, refrigerators, heat pumps, etc. The source and
sink contain infinite energy at constant temperature. The source temperature is always higher than the sink
temperature.

η = W/Q (1)

The coefficient of performance (C.O.P.) is calculated as below.

C.O.P. = Q/W (2)

Also

Relative C.O.P. = (actual C.O.P.)/(theoretical C.O.P.) (3)

The performance of the heat pump is taken into account by the ratio (Q+W)/W, known as the energy
performance ratio (E.P.R.). It is obtained as below.

E.P.R. = (1 + Q/W) (4)

Also

E.P.R. = (C.O.P. + 1) (5)

The value of C.O.P. may be less than or greater than one, depending on the type of refrigeration system.
The value of E.P.R. is always greater than one. Figure 2 shows the multi-mode system with a single
compressor, which is used when a number of loads at the same temperature are to be taken by the
refrigerating plant.
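The ratios in equations (2) to (5) can be checked numerically with a short sketch. It assumes that Q is the heat transferred and W the work input, both in the same units; the function names and the sample values (3 kW and 1 kW) are hypothetical and are not measurements from the plant.

```python
def cop(q, w):
    """Coefficient of performance, C.O.P. = Q/W (equation 2)."""
    if w <= 0:
        raise ValueError("work input W must be positive")
    return q / w


def relative_cop(actual_cop, theoretical_cop):
    """Relative C.O.P. = actual C.O.P. / theoretical C.O.P. (equation 3)."""
    return actual_cop / theoretical_cop


def epr(q, w):
    """Energy performance ratio, E.P.R. = (Q + W)/W = 1 + Q/W (equation 4)."""
    if w <= 0:
        raise ValueError("work input W must be positive")
    return (q + w) / w


# Hypothetical operating point: 3 kW of heat moved for 1 kW of work input.
q_kw, w_kw = 3.0, 1.0
print("C.O.P. =", cop(q_kw, w_kw))
print("E.P.R. =", epr(q_kw, w_kw))
# Equation (5): E.P.R. = C.O.P. + 1
print("E.P.R. - C.O.P. =", epr(q_kw, w_kw) - cop(q_kw, w_kw))
```

For any positive Q and W the last value is 1, which is why E.P.R. is always greater than one while C.O.P. may fall on either side of one.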
The pressures of the refrigerants coming out of the evaporators and after leaving the back pressure
valves are the same, and that is the suction pressure of the compressor. [10, 11]

Hard c-Means Clustering:
Let c be the number of clusters. The hard partitioning space:
[Equation (8)]
Clustering criterion (objective function, cost function):
[Equation (9)]
Distance measure:
[Equation (10)]
Algorithm:
Step 1: Calculate centers of clusters; c-mean vectors
Step 2: Update U(l): Reallocate cluster memberships to minimize squared errors:
[Equation (7)]

The above algorithm depicts the various parameters involved in the proper execution of the fuzzy
clustering algorithm.

4. PROBLEM FORMULATION:
The typical development of a problem for cluster analysis consists of four steps along with a feedback
path, as shown in Figure 4. The steps are as follows:
1. Feature selection and extraction
2. Clustering algorithm design or selection
3. Cluster validation
4. Results interpretation
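The two hard c-means steps described earlier (Step 1: calculate cluster centers; Step 2: reallocate memberships to minimize squared errors) can be sketched as below. The paper's own equations are not legible in this copy, so the sketch falls back on the standard textbook formulation: squared Euclidean distance, a sum-of-squared-errors criterion, and initialization from the first c data points. These choices, the function name hard_c_means, and the sample data are all assumptions made for illustration.

```python
import numpy as np


def hard_c_means(data, c, max_iter=100):
    """Textbook hard c-means (k-means style); not the paper's exact method."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Assumption: start from the first c data points as initial centers.
    centers = data[:c].copy()
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Distance measure: squared Euclidean distance, shape (n, c).
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Step 2: give each point to its nearest center (crisp membership).
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships unchanged, so the partition has converged
        labels = new_labels
        # Step 1: recompute each center as the mean of its members.
        for i in range(c):
            members = data[labels == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    # Clustering criterion: sum of squared distances to assigned centers.
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    objective = d2[np.arange(n), labels].sum()
    return labels, centers, objective


# Hypothetical data: two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers, objective = hard_c_means(points, c=2)
print(labels, centers, objective)
```

Each point ends up in exactly one cluster, which is the property that distinguishes this hard partition from the fuzzy memberships used in the rest of the paper.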
Along with the above considerations, the control strategy consists of formulating or identifying the
control objective, input variables, output variables, constraints, operating characteristics, safety,
environmental, and economic considerations, control structure, algorithm, etc.

... will have membership values in [0,1] for each cluster.
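The idea that every pattern carries a membership value in [0,1] for each cluster can be sketched with the standard fuzzy c-means iteration. The update equations are not given in this copy, so the sketch uses the usual alternating updates with squared Euclidean distance and a fuzzifier m = 2. The function name fuzzy_c_means, the random initialization, the parameter defaults, and the sample data are assumptions for illustration only.

```python
import numpy as np


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means sketch; returns memberships (c, n) and centers."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    # Random initial partition matrix U; each column (one point) sums to 1.
    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        um = u ** m
        # Centers: membership-weighted means of the data.
        centers = (um @ data) / um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances, shape (c, n); floor avoids division by 0.
        d2 = ((data[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return u, centers


# Hypothetical data: two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
u, centers = fuzzy_c_means(points, c=2)
print(np.round(u, 3))
```

Each column of u holds one point's memberships across the clusters and sums to 1. Taking the largest membership in each column recovers a hard partition when a crisp assignment is needed.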
Fig. 8: Error Criterion (error square function versus number of clusters)

8. CONCLUSION & FUTURE SCOPE
In this paper, fuzzy clustering has been implemented for a multi-compressor system. In fuzzy clustering,
a large collection of documents is clustered and each of the clusters is represented using its center. The
fuzzy data-clustering algorithm generated for the data-based controller for the multi-compressor system
greatly enhances the efficiency. Though the application of Genetic Algorithms for designing fuzzy systems
is recent, it
has seen increasing interest over the last few years and will allow fruitful research to be carried out
in the building of fuzzy logic-based intelligent clustering systems.

Corresponding Author
Gursewak S. Brar
Dept. of Electrical Engineering
Baba Banda Singh Bahadur Engineering
College, Fatehgarh Sahib, India.

REFERENCES
[1] Bentley J. L., Friedman J. H., "Fast algorithms for constructing minimal spanning trees in coordinate
spaces," IEEE Trans. on Computers, C-27, 6 (June), pp. 97–105, 1978.
[2] R. L. Cannon, J. V. Dave, J. C. Bezdek, "Efficient implementation of the fuzzy c-means clustering
algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, issue 2,
pp. 248-255, 1986.
[3] Sutton R. S., "Learning to predict by the methods of temporal differences," Journal of Mach. Learn.,
vol. 3, no. 1, pp. 9–44, 1988.
[4] Bardossy A., Duckstein L., "Fuzzy Rule-Based Modeling with Applications to Geophysical, Biological and
Engineering Systems," CRC Press, 1995.
[5] M. Delgado, A. Gomez-Skarmeta, M. A. Vila, "Hierarchical Clustering to validate Fuzzy Clustering,"
IEEE fourth International Conference on Fuzzy Systems and second International Fuzzy Engineering
Symposium, Proceedings of IEEE International Conference, vol. 4, pp. 1807-1812, 1995.
[6] F. Klawonn, R. Kruse, "Automatic Generation of Fuzzy Controllers by Fuzzy Clustering," Fuzzy Systems,
Proceedings of the Fifth IEEE International Conference, vol. 3, pp. 2053-2058, 1996.
[7] Bonissone P. P., Khedkar P. S., Chen Y., "Genetic Algorithms for Automated Tuning of Fuzzy
Controllers: A Train Handling Application," Proc. Fifth IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE'96), New Orleans, pp. 675-680, 1996.
[8] Peter Zoltan Baranyi, L. T. Koczy, T. D. Gedeon, "Improved Fuzzy and Neural Network Algorithms for
word frequency Prediction in Document Filtering," Journal of Advanced Computational Intelligence,
vol. 2, no. 3, pp. 88-95, 1998.
[9] Jain A. K., Murty M. N., Flynn P. J., "Data Clustering: A Review," ACM Computing Surveys, vol. 31,
no. 3, pp. 264-323, September 1999.
[10] Mohamed Marzouk, Osama Moselhi, "On the use of fuzzy clustering in construction simulation,"
Proceedings of the 33rd Conference on Winter Simulation, Arlington, Virginia. IEEE Computer Society,
Washington, DC, USA, pp. 1547-1555, 2001.
[11] Joy K. V., "Advantages of Reciprocating Compressor" by ISHREE, Bombay Chapter. Journal of Air
Conditioning and Refrigeration, vol. 7, no. 1, pp. 147-156, Jan-March 2004.
[12] Gaithersburg MD, Beall K. A., "Performance Characteristic of Refrigeration flow Compressor for
natural gas compressor application," Journal of Energy Resources Technology, vol. 127, issue 1,
pp. 7-14, March 2005.