original Affinity Propagation algorithm and Vector space model. Section 3 describes the main idea and details of our proposed algorithms. Section 4 discusses the experimental results and evaluations. Section 5 provides the concluding remarks and future directions.
Before going into the details of our proposed "K-Means Based on Heterogeneous Transfer Learning" and "Affinity Clustering Based on Heterogeneous Transfer Learning" algorithms, some works closely related to this paper are briefly reviewed: transfer learning, the K-Means clustering algorithm, the affinity propagation algorithm, and the vector space model.
2.1 Transfer Learning
Machine learning methods work well only under a common assumption: the training and test data come from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real-world applications it is expensive or impossible to re-collect the needed training data and rebuild the model, so it would be desirable to reduce that effort. In such cases knowledge transfer, or transfer learning, between task domains is attractive. Transfer learning has the following three main research issues: (1) what to transfer, (2) how to transfer, and (3) when to transfer. In the inductive transfer learning setting, the target task is different from the source task, no matter whether the source and target domains are the same or not. In transductive transfer learning, the source and target tasks are the same, while the source and target domains are different. In the unsupervised transfer learning setting, similarly to the inductive setting, the target task is different from but related to the source task. In heterogeneous transfer learning, knowledge is transferred across domains or tasks that have different feature spaces.
2.2 K-Means Clustering Algorithm
The K-Means algorithm is one of the best-known and most popular clustering algorithms. K-Means seeks an optimal partition of the data by minimizing the sum-of-squared-error criterion with an iterative optimization procedure. The K-Means clustering procedure is as follows:
1. Initialize a K-partition randomly or based on some prior knowledge, and calculate the cluster prototype matrix M = [m_1, ..., m_K].
2. Assign each object in the data set to the nearest cluster.
3. Recalculate the cluster prototype matrix based on the current partition.
4. Repeat steps 2 and 3 until there is no change for each cluster.
The major problem with this algorithm is that it is sensitive to the selection of the initial partition.
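The four steps above can be sketched in Python with NumPy; this is a minimal illustration under our own naming (the function and its parameters are not from the paper), using random initialization of the prototype matrix:

```python
import numpy as np

def k_means(X, k, max_iter=100, rng=None):
    """Plain K-Means: minimize the sum-of-squared-error criterion iteratively."""
    rng = np.random.default_rng(rng)
    # Step 1: initialize the prototype matrix M = [m_1, ..., m_K]
    # with K distinct points drawn at random from the data set.
    M = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for it in range(max_iter):
        # Step 2: assign each object to the nearest cluster prototype.
        dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when no assignment changes between iterations.
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each prototype as the mean of its partition
        # (keep the old prototype if a cluster happens to be empty).
        M = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else M[j]
                      for j in range(k)])
    return labels, M
```

The sensitivity to initialization mentioned above shows up here directly: a different `rng` seed can yield a different final partition.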
2.3 Affinity Clustering Algorithm
The affinity clustering algorithm is based on message passing among data points. Each data point receives availability messages from other data points (from candidate exemplars) and sends responsibility messages to other data points (to candidate exemplars). The sum of responsibilities and availabilities for the data points identifies the exemplars. After the exemplars are identified, the data points are assigned to exemplars to form the clusters. The steps of the affinity clustering algorithm are as follows:
1. Initialize the availabilities to zero: a(i, k) = 0.
2. Update the responsibilities by the following equation:
   r(i, k) = s(i, k) - max_{k' != k} {a(i, k') + s(i, k')}   (1)
   where s(i, k) is the similarity of data point i to candidate exemplar k.
3. Update the availabilities by the following equation:
   a(i, k) = min{0, r(k, k) + sum_{i' not in {i, k}} max{0, r(i', k)}}   (2)
   and update the self-availability by the following equation:
   a(k, k) = sum_{i' != k} max{0, r(i', k)}   (3)
4. Compute sum = a(i, k) + r(i, k) for data point i and find the value of k that maximizes the sum to identify the exemplars.   (4)
5. If the exemplars do not change for a fixed number of iterations, go to step (6); else go to step (2).
6. Assign the data points to exemplars on the basis of maximum similarity to find the clusters.
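The message-passing loop above can be sketched in NumPy as follows. This is an illustrative implementation under our own naming, not the paper's code; a damping factor is added to the updates, as is common practice to stabilize affinity propagation, although the steps above do not mention it:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, conv_iter=15, damping=0.5):
    """Message-passing clustering on an n x n similarity matrix S.
    Diagonal of S holds the preferences. Returns each point's exemplar index."""
    n = S.shape[0]
    A = np.zeros((n, n))   # availabilities, initialized to zero (step 1)
    R = np.zeros((n, n))   # responsibilities
    last, stable = None, 0
    for _ in range(max_iter):
        # Step 2: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # Step 3: a(i,k) = min{0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())      # keep r(k,k) in the column sums
        Anew = Rp.sum(axis=0)[None, :] - Rp
        dA = Anew.diagonal().copy()             # a(k,k) = sum_{i'!=k} max(0, r(i',k))
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, dA)
        A = damping * A + (1 - damping) * Anew
        # Step 4: exemplar of point i is the k maximizing a(i,k) + r(i,k)
        exemplars = (A + R).argmax(axis=1)
        # Step 5: stop once exemplars are unchanged for conv_iter iterations
        stable = stable + 1 if last is not None and np.array_equal(exemplars, last) else 0
        last = exemplars
        if stable >= conv_iter:
            break
    return last
```

A common choice for the diagonal of S (the preference) is the median of the pairwise similarities, which tends to produce a moderate number of exemplars.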
2.4 Vector Space Model
The vector space model is used to represent the text documents. In the VSD model each document d is considered as a vector in the M-dimensional term (word) space, and the tf-idf weighting scheme is used. In the VSD model each document is represented by the following equation:
   d = {w(1, d), w(2, d), ..., w(M, d)}   (5)
where M is the number of terms (words) and w(i, d) is the weight of the i-th term in document d:
   w(i, d) = (1 + log tf(i, d)) * log(N / df(i))   (6)
Here tf(i, d) is the frequency of the i-th term in the document d, N is the number of documents, and df(i) is the number of documents containing the i-th term. The inverse document frequency (idf) is defined as the logarithm of the ratio of the number of documents (N) to the number of documents containing the given word (df(i)).
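The weighting scheme above can be sketched as a small Python function; this is a toy illustration with names of our own choosing, computing one tf-idf vector per document over a shared vocabulary:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each tokenized document as a tf-idf weighted term vector.
    Weight: w(i, d) = (1 + log tf(i, d)) * log(N / df(i)),
    with N the number of documents and df(i) the document frequency of term i."""
    vocab = sorted({t for d in docs for t in d})
    N = len(docs)
    df = {t: sum(t in d for d in docs) for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Terms absent from the document get weight 0.
        vectors.append([(1 + math.log(tf[t])) * math.log(N / df[t]) if tf[t] else 0.0
                        for t in vocab])
    return vocab, vectors
```

Note that a term appearing in every document gets idf = log(N/N) = 0, so it contributes nothing to any vector, which is the intended discounting of uninformative words.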
3 CLUSTERING BASED ON HETEROGENEOUS TRANSFER LEARNING
In this section, two algorithms of clustering based on heterogeneous transfer learning are proposed. The first is
JOURNAL OF COMPUTING, VOLUME 5, ISSUE 1, JANUARY 2013, ISSN (Online) 2151-9617
https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG
© 2013 Journal of Computing Press, NY, USA, ISSN 2151-9617