Professional Documents
Culture Documents
Ijmemr V1i4 009
Ijmemr V1i4 009
Abstract—Cluster analysis divides data into groups) from different feature spaces. It is the
groups (clusters) that are meaningful, useful, result of integration of multiple types of
or both. If meaningful groups are the goal, measurements on observations from different
then the clusters should capture the natural perspectives and different types of
structure of the data. In some cases, however, measurements can be considered as different
cluster analysis is only a useful starting point views. For example, the variables of the
for other purposes, such as data nucleated blood cell data [1] Clustering for
summarization. Whether for understanding Understanding Classes, or conceptually
or utility, cluster analysis has long played meaningful groups of objects that share
an important role in a wide variety of fields: common characteristics, play an important
psychology and other social sciences, role in how people analyze and describe the
biology, statistics, pattern recognition, world. Indeed, human beings are skilled at
information retrieval, machine learning, and dividing objects into groups (clustering) and
data mining. Both view weights and variable assigning particular objects to these groups.
weights are used in the distance function to For example, even relatively young children can
determine the clusters of objects. In the new quickly label the objects in a photograph as
algorithm, two additional steps are added to buildings, vehicles, people, animals, plants,
the iterative-means clustering process to etc. In the context of understanding data,
automatically compute the view weights and clusters are potential classes and cluster
the variable weights. We used two real-life analysis is the study of techniques for
data sets to investigate the properties of two automatically finding classes. The following
types of weights in TW-k-means and are some examples:
investigated the difference between the
weights of TW-k-means and the weights of the 2. INFORMATION RETRIEVAL
individual variable weighting method.
The World Wide Web consists of
Keywords: —Data mining, clustering, billions of Web pages, and the results of a
MultiView learning, k-means, variable query to a search engine can return
weighting thousands of pages. Clustering can be used to
group these search results into a small number
1. INTRODUCTION: of clusters, each of which captures a particular
aspect of the query. For instance, a query of
Multiview data are instances that have ―movie‖ might return Web pages grouped into
multiple views (representations/variable categories such as reviews, trailers, stars, and
International Journal of Modern Engineering & Management Research | Vol 1 | Issue 4 | Dec. 2013 60
Study of Multiview data using Automated Two-Level Variable Weighting Clustering Algorithm
Author (s) : Piyush Tiwari, Prof. Hitesh Gupta | Patel Engg. College, Bhopal
theaters. Each category (cluster) can be broken controversy about social class. In this paper we
into subcategories (sub-clusters), producing a analyse data from the US Survey of Consumer
hierarchical structure that further assists a user‘s Finances (SCF), for which the more appropriate
exploration of the query results. Clustering for term is ‗socio-economic stratification‘. Apart
Utility Cluster analysis provides an from computing and interpreting a clustering,
abstraction from individual data objects to the we address whether the clustering captures
clusters in which those data objects reside. significant structure beyond decomposing
Additionally, some clustering techniques dependence structures between the indicators,
characterize each cluster in terms of a cluster which indicators are most influential for
prototype. stratification and whether strata derived from
the data are related to occupation categories.
This is guided by a general ‗philosophy of
clustering‘, which involves considerations of
3. BACKGROUND
how to define the clustering problem of interest,
how to understand and ‗tune‘ the various The concept of social class is central to
available clustering approaches and how to social science research, either as a subject in
choose between them. All this should be driven itself or as an explanatory basis for social,
by the way that the subject matter researchers behavioural and health outcomes. The study of
connect the aim of clustering and their social class has a long history, from the social
interpretation of concepts like ‗similarity‘ and investigation by the classical social thinker
‗belonging together in the same class‘ to the Marx to today‘s on-going academic interest in
choice and formal handling of the indicators issues of social class and stratification for both
involved. The main contribution of this paper is research and teaching [4]. Researchers in
to show in detail how this can be done and what various social sciences use social class and
it entails, exemplary for the socio-economic social stratification as explanatory variables to
stratification application but including aspects, study a wide range of outcomes from health and
particularly in Section 6, that are relevant for mortality [5] to cultural consumption [6]
other clustering applications as well and hardly
mentioned in the literature. The present When social scientists employ social class
approach brings statistical and sociological (or or stratification as an explanatory variable, they
general subject matter) knowledge closer follow either or both of two common practices,
together by focusing on the interface, the namely using one or more indicators of social
‗translation task‘ between statistical stratification such as education and income and
methodology and sociological background. Two using some version of occupation class, which
quite different approaches are compared, is often aggregated or grouped into a small
namely a model-based clustering approach [2] number of occupation categories. For example,
in which different clusters are modelled by [7] compared health outcomes and mortality
underlying latent classes or mixture between white-collar and blue-collar workers[8]
components, and a dissimilarity-based analysed the effects of social stratification on
partitioning approach that is not based on cultural consumption with a variety of variables
probability models [3] with some methods to representing stratification, including education,
estimate the number of clusters. Such data income, How to Find an Appropriate
typically arise in social stratification and Clustering 311 occupation classification and
generally in social science. Social stratification social status (operationalized by them). The
is about partitioning a population into several reason why researchers routinely use some
different social classes. Although the concept of indicators of social class is that there is no
social class is central to social science research, agreed definition of social class, let alone a
there is no agreed definition of a social class. It specific agreed operationalization of it. Various
is of interest here whether social stratification concepts of social class are present in the
based on formal clustering can contribute to the sociological literature, including a ‗classless‘
society [9], society with a gradational structure
International Journal of Modern Engineering & Management Research | Vol 1 | Issue 4 | Dec. 2013 61
Study of Multiview data using Automated Two-Level Variable Weighting Clustering Algorithm
Author (s) : Piyush Tiwari, Prof. Hitesh Gupta | Patel Engg. College, Bhopal
[10] and a society in which discrete classes mutant experiment. In the following, we use the
(interpreted in various, but usually not data- two real-life data sets to investigate the
based, ways) are an unmistakable social reality properties of two types of weights in TW-k-
[11] The question to be addressed by cluster means.
analysis is to underst and ‗social stratification‘
through data, i.e. to explore what kind of Controlling Weight Distributions
unsupervised classification(s) based on the
social (or socio-economic) indicators they yield. From figure 2a, we can see that when ŋ
We acknowledge that data alone cannot decide was small, the variances of V were unstable
the issue objectively. Some questions to address with the increase of k. When ŋ was large, the
are whether the data are meaningfully clustered variances of V became almost constant. From
at all, which indicators contribute most to a Figure.2b, we can see ŋ has similar behavior.
clustering and to what extent clusters are While some different types of behaviors can be
aligned with features that have been connected observed in the following figures that is Figure
to social stratification in the literature, here 2 c, d, e and f as shown in pictorial
particularly occupation. representation.
4. CONCLUSION
First, the original data set is represented
using a smaller set of prototype vectors, which
Figure 3: Comparison of the total variable weights in
allows efficient use of clustering algorithms to
TW-k-means and EW-8-means 8and EW-8 means on
the Water Treatment Plant data set. divide the prototypes into groups. There
duction of the computational cost is especially
Scalability Comparison important for hierarchical algorithms allowing
clusters of arbitrary size and shape The
We used all five real-life data sets to
experiments also revealed the convergence
compare the scalability of TW-k-means with
property of the view weights in TW-k-means.
the other five clustering algorithms. We compared TW-k-means with five
clustering algorithms on three real-life data
sets and the results have shown that the TW-k-
means algorithm significantly outperformed
the other five clustering algorithms in four
evaluation indices. As such, it is a new variable
weighting method for clustering of multiview
data.
REFERENCES:
[1] J. Mui and K. Fu, ―Automated
Classification of Nucleated Blood
Cells Using a Binary Tree Classifier,
International Journal of Modern Engineering & Management Research | Vol 1 | Issue 4 | Dec. 2013 63
Study of Multiview data using Automated Two-Level Variable Weighting Clustering Algorithm
Author (s) : Piyush Tiwari, Prof. Hitesh Gupta | Patel Engg. College, Bhopal
International Journal of Modern Engineering & Management Research | Vol 1 | Issue 4 | Dec. 2013 65