Clustering gene expression data


http://compdiag.molgen.mpg.de/ngfn/pma2005mar.shtml

Abstract. This is the tutorial for the clustering exercises on day 2 of the course on Practical DNA Microarray Analysis. Topics will be hierarchical clustering, k-means clustering, partitioning around medoids, selecting the number of clusters, reliability of results, identifying relevant subclusters, and pitfalls of clustering.

We consider expression data from patients with acute lymphoblastic leukemia (ALL) that were investigated using HGU95AV2 Affymetrix GeneChip arrays [Chiaretti et al., 2004]. The data were normalized using quantile normalization, and expression estimates were computed using RMA. The preprocessed version of the data is available in the Bioconductor data package ALL from http://www.bioconductor.org. Type in the following code chunks to reproduce our analysis. At each > a new command starts; the + indicates that one command spans more than one line.

> library(ALL)
> data(ALL)
> help(ALL)
> cl <- substr(as.character(ALL$BT), 1, 1)
> dat <- exprs(ALL)

The most pronounced distinction in the data is between B- and T-cells. You find these true labels in the vector cl. In this session we try to rediscover the labels in an unsupervised fashion. The data consist of 128 samples with 12625 genes.

The first step in hierarchical clustering is to compute a matrix containing all distances between the objects that are to be clustered. For the ALL data set, the distance matrix is of size 128 × 128. From this we compute a dendrogram using complete linkage hierarchical clustering:

> d <- dist(t(dat))
> image(as.matrix(d))
> hc <- hclust(d, method = "complete")
> plot(hc, labels = cl)

We now split the dendrogram into two clusters (using the function cutree) and compare the resulting clusters with the true classes. We first compute a contingency table and then apply an independence test.

> groups <- cutree(hc, k = 2)
> table(groups, cl)



Figure 1: Dendrogram of complete linkage hierarchical clustering. The two cell types are not well separated.

       cl
groups   B  T
     1  73 31
     2  22  2

> fisher.test(groups, cl)$p.value
[1] 0.03718791

The null hypothesis of the Fisher test is that the true labels and the cluster results are independent. The p-value shows that we can reject this hypothesis at a significance level of 5%. But this is just statistics: what is your impression of the quality of this clustering? Did we already succeed in reconstructing the true classes?

Exercise: In the lecture you learned that hierarchical clustering is an agglomerative bottom-up process of iteratively joining single samples and clusters. Plot hc again, using as parameter labels=1:128. Now read the help text for hclust: hc$merge describes the merging of clusters. Read the details carefully. Use the information in hc$merge and the cluster plot to explain the clustering process and to retrace the algorithm.

Exercise: Cluster the data again using single linkage and average linkage. Do the dendrogram plots change?

The dendrogram plot suggests that the clustering can still be improved, and the p-value is not that low. To improve our results, we should try to avoid using genes that contribute just noise and no information. A simple approach is to exclude all genes that show little variance across samples; here we keep only the 100 genes with the highest variance. Then we repeat the analysis from the last section.

> genes.var <- apply(dat, 1, var)
> genes.var.select <- order(genes.var, decreasing = T)[1:100]
> dat.s <- dat[genes.var.select, ]
> d.s <- dist(t(dat.s))
> hc.s <- hclust(d.s, method = "complete")
> plot(hc.s, labels = cl)


Figure 2: Dendrogram of hierarchical clustering on 100 genes with highest variance across samples. The two classes are clearly separated.

Plot the distance matrix d.s. Compare it to d. Then split the tree again and analyze the result by a contingency table and an independence test:

> groups.s <- cutree(hc.s, k = 2)
> table(groups.s, cl)
         cl
groups.s   B  T
       1  95  0
       2   0 33
> fisher.test(groups.s, cl)$p.value
[1] 2.326082e-31

It seems that reducing the number of genes was a good idea. But where does the effect come from: is it just dimension reduction (from 12625 to 100), or do high-variance genes carry more information than other genes? We investigate by comparing our results to a clustering on 100 randomly selected genes:

> genes.random.select <- sample(1:nrow(dat), 100)
> dat.r <- dat[genes.random.select, ]
> d.r <- dist(t(dat.r))
> image(as.matrix(d.r))
> hc.r <- hclust(d.r, method = "complete")
> plot(hc.r, labels = cl)
> groups.r <- cutree(hc.r, k = 2)
> table(groups.r, cl)
> fisher.test(groups.r, cl)$p.value


Figure 3: Distance matrices for 128 samples, using all genes (left), 100 random genes (middle), and 100 high-variance genes (right).

Plot the distance matrix d.r and compare it against the other two distance matrices d and d.s. Repeat the random selection of genes a few times. How do the dendrogram, the distance matrix, and the p-value change? How do you interpret the result?
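One way to carry out this repetition is to loop over several random selections and collect the Fisher test p-values. This is an illustrative sketch (not part of the original tutorial); it reuses dat and cl from above:

> p.random <- numeric(10)
> for (i in 1:10) {
+     sel <- sample(1:nrow(dat), 100)
+     d.tmp <- dist(t(dat[sel, ]))
+     g.tmp <- cutree(hclust(d.tmp, method = "complete"), k = 2)
+     p.random[i] <- fisher.test(g.tmp, cl)$p.value
+ }
> summary(p.random)

The spread of p.random shows how strongly the clustering quality depends on which random genes were drawn.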

Finally, we compare the results on high-variance and on random genes in a cluster plot in two dimensions:

> library(cluster)
> par(mfrow = c(2, 1))
> clusplot(t(dat.r), groups.r, main = "100 random genes", color = T)
> clusplot(t(dat.s), groups.s, main = "100 high-variance genes", color = T)

Figure 4: You see the projection of the data onto the plane spanned by the first two principal components (= the directions in the data which explain the most variance). The group labels result from hierarchical clustering on 100 genes. In the upper plot the genes are selected randomly (the two components explain 32.75% of the point variability), in the lower plot according to highest variance across samples (44.08%). The red and blue ellipses indicate class boundaries. Do we need both principal components to separate the data using high-variance genes?

3 Partitioning methods

In hierarchical methods, the samples are arranged in a hierarchy according to a dissimilarity measure. Extracting clusters means cutting the tree-like hierarchy at a certain level. Increasing the number of clusters just subdivides existing clusters: clusters are nested.

In partitioning methods this is different. Before the clustering process you decide on a fixed number k of clusters; samples are then assigned to the cluster where they fit best. This is more flexible than hierarchical clustering, but leaves you with the problem of selecting k. Increasing the number of clusters does not subdivide existing clusters but can result in a totally new allocation of samples. We discuss two examples of partitioning methods: k-means clustering and PAM (Partitioning Around Medoids).
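The nesting contrast can be checked on a small synthetic example (a sketch, not part of the original tutorial):

> x <- matrix(rnorm(40), ncol = 2)
> hc.demo <- hclust(dist(x))
> table(cutree(hc.demo, k = 2), cutree(hc.demo, k = 3))
> table(kmeans(x, 2)$cluster, kmeans(x, 3)$cluster)

In the first table each column has exactly one nonzero entry: every k=3 cluster lies entirely inside one k=2 cluster, because cutree only subdivides. The second table need not show this pattern, since each k-means run allocates samples from scratch.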

3.1 k-means

First look at help(kmeans) to become familiar with the function. Set the number of clusters to k=2. Then perform k-means clustering for all samples and all genes. k-means uses a random starting solution, and thus different runs can lead to different results. Run k-means 10 times and use the result with the smallest within-cluster variance.

> k <- 2
> withinss <- Inf
> for (i in 1:10) {
+     kmeans.run <- kmeans(t(dat), k)
+     print(sum(kmeans.run$withinss))
+     print(table(kmeans.run$cluster, cl))
+     cat("----\n")
+     if (sum(kmeans.run$withinss) < withinss) {
+         result <- kmeans.run
+         withinss <- sum(result$withinss)
+     }
+ }
> table(result$cluster, cl)

The last result is the statistically best out of 10 tries, but does it reflect biology? Now do k-means again using the 100 top-variance genes. Compare the results.

> kmeans.s <- kmeans(t(dat.s), k)
> table(kmeans.s$cluster, cl)

3.2 PAM

As explained in today's lecture, PAM is a generalization of k-means. Look at the help page of the R function. Then apply pam for clustering the samples using all genes:

> result <- pam(t(dat), k = 2)
> table(result$clustering, cl)

Now use k=2:50 top-variance genes for clustering and calculate the number of misclassifications in each step. Plot the number of genes versus the number of misclassifications. What is the minimal number of genes needed for obtaining 0 misclassifications?

> ngenes <- 2:50
> o <- order(genes.var, decreasing = T)
> miscl <- NULL
> for (k in ngenes) {
+     dat.s2 <- dat[o[1:k], ]
+     pam.result <- pam(t(dat.s2), k = 2)
+     ct <- table(pam.result$clustering, cl)
+     miscl[k] <- min(ct[1, 2] + ct[2, 1], ct[1, 1] + ct[2, 2])
+ }
> xl = "# genes"
> yl = "# misclassification"
> plot(ngenes, miscl[ngenes], type = "l", xlab = xl, ylab = yl)

Both kmeans and pam assume that the number of clusters is known. The methods will produce a result no matter what value of k you choose. But how can we decide which choice is a good one?

Partitioning methods try to optimize an objective function. The objective function of k-means is the sum of all within-cluster sums of squares. Run k-means with k=2:20 clusters and 100 top-variance genes. For k=1 calculate the total sum of squares. Plot the obtained values of the objective function.

> totalsum <- sum(scale(t(dat.s), scale = FALSE)^2)  # total sum of squares = objective for k = 1
> withinss <- numeric()
> withinss[1] <- totalsum
> for (k in 2:20) {
+     withinss[k] <- sum(kmeans(t(dat.s), k)$withinss)
+ }
> plot(1:20, withinss, xlab = "# clusters", ylab = "objective function", type = "b")

Why is the objective function not necessarily decreasing? Why is k=2 a good choice?

First read the Details section of help(silhouette) for a definition of the silhouette score. Then compute silhouette values for PAM clustering results with k=2:20 clusters. Plot the silhouette widths. Choose an optimal k according to the maximal average silhouette width. Compare silhouette plots for k=2 and k=3. Why is k=2 optimal? Which observations are misclassified? Which cluster is more compact?

> asw <- numeric()
> for (k in 2:20) {
+     asw[k] <- pam(t(dat.s), k)$silinfo$avg.width
+ }
> plot(1:20, asw, xlab = "# clusters", ylab = "average silhouette width", type = "b")
> plot(silhouette(pam(t(dat.s), 2)))
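The silhouette width of a single sample can also be computed directly from its definition: a(i) is the average distance of sample i to the other members of its own cluster, b(i) the smallest average distance to any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The helper sil.one below is a toy reimplementation for illustration (not part of the original tutorial; it ignores the singleton-cluster special case, where s(i) is defined as 0):

> sil.one <- function(i, groups, dmat) {
+     own <- setdiff(which(groups == groups[i]), i)
+     a <- mean(dmat[i, own])
+     others <- groups != groups[i]
+     b <- min(tapply(dmat[i, others], groups[others], mean))
+     (b - a) / max(a, b)
+ }
> dmat <- as.matrix(d.s)
> sil.one(1, pam(t(dat.s), 2)$clustering, dmat)

The value should match the first entry of silhouette(pam(t(dat.s), 2))[, 3].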

Figure 5: Silhouette plots of pam(x = t(dat.s), k = 2) (left) and pam(x = t(dat.s), k = 3) (right), n = 128. For k=2 the two clusters (sizes 95 and 33) have average silhouette widths 0.28 and 0.41, overall average 0.32. For k=3 the three clusters (sizes 48, 47, and 33) have widths 0.13, 0.06, and 0.38, overall average 0.17.

Randomly permute the class labels 20 times. For k=2, calculate the average silhouette width (asw) for pam clustering results with true labels and for all random labelings. Generate a box plot of the asw values obtained from random labelings. Compare it with the asw value obtained from true labels.

> cl.true <- as.numeric(cl == "B")
> asw <- mean(silhouette(cl.true, d.s)[, 3])
> asw.random <- numeric()
> for (sim in 1:20) {
+     cl.random = sample(cl.true)
+     asw.random[sim] = mean(silhouette(cl.random, d.s)[, 3])
+ }
> symbols(1, asw, circles = 0.01, ylim = c(-0.1, 0.4), inches = F, bg = "red")
> text(1.2, asw, "Average silhouette value\n for true labels")
> boxplot(data.frame(asw.random), col = "blue", add = T)

You will see: the average silhouette value of the pam clustering result lies well outside the range of values achieved by random clusters.

6 Clustering of genes

Cluster the 100 top-variance genes with hierarchical clustering. You can get the gene names from the annotation package hgu95av2.

> library(hgu95av2)
> gene.names <- sapply(geneNames(ALL), get, hgu95av2SYMBOL)
> gene.names <- gene.names[genes.var.select]
> d.g = dist(dat.s)
> hc = hclust(d.g, method = "complete")
> plot(hc, labels = gene.names)

Figure 6: Dendrogram of complete linkage hierarchical clustering of the 100 top-variance genes (distance matrix d.g), labeled with gene symbols.

Now plot the average silhouette widths for k=2:20 clusters using pam and have a look at the silhouette plot for k=2. What do you think: did we find a structure in the genes?

> asw <- numeric()
> for (k in 2:20) {
+     asw[k] = pam(dat.s, k)$silinfo$avg.width
+ }
> plot(1:20, asw, xlab = "# clusters", ylab = "average silhouette width", type = "b")
> plot(silhouette(pam(dat.s, 2)))

Figure 7: Left: average silhouette width of the 100 top-variance genes for k=2:20 clusters. Right: silhouette plot for k=2 (n = 100; cluster sizes 57 and 43 with average silhouette widths 0.22 and 0.27; overall average silhouette width 0.24).

7 Presenting results

In the last section we saw that a substructure in the genes is not easy to find. When interpreting heatmap plots like the following, it is important to keep in mind: any clustering method will eventually produce a clustering; before submitting a manuscript to Nature you should check the validity and stability of the groups you found.
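One simple stability check (an illustrative sketch, not part of the original tutorial) is to re-cluster bootstrap resamples of the genes and record how often each pair of samples stays together; comparing co-membership matrices makes the check insensitive to arbitrary cluster labels. It reuses dat.s and the sample clustering groups.s from above:

> agree <- numeric(20)
> for (b in 1:20) {
+     boot <- sample(1:nrow(dat.s), replace = TRUE)
+     g.b <- cutree(hclust(dist(t(dat.s[boot, ])), method = "complete"), k = 2)
+     agree[b] <- mean(outer(g.b, g.b, "==") == outer(groups.s, groups.s, "=="))
+ }
> mean(agree)

Values close to 1 indicate that the 2-cluster structure of the samples is stable under resampling of the genes.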

> help(heatmap)
> heatmap(dat.s)


Figure 8: Heatmap representation of all samples and 100 selected high-variance genes. Rows and columns are hierarchically clustered.

For publication of your results it is advantageous to show nice-looking cluster plots supporting your claims. Here we demonstrate how to produce almost any clustering you like in a situation with few samples but many genes.

First reduce the data set to the first 5 B-cell and T-cell samples. Then permute the class labels randomly. The permuted labels do not correspond to biological entities. Nevertheless, we will demonstrate how to get a perfect clustering result.

The strategy is easy: just select 100 genes according to differential expression between the groups defined by the random labels, then perform hierarchical clustering of the samples with these genes only.

> select <- c(which(cl == "B")[1:5], which(cl == "T")[1:5])
> dat.small = dat[, select]
> cl.random = sample(cl[select])
> library(multtest)
> tstat = mt.teststat(dat.small, classlabel = (cl.random == "B"), test = "t")
> genes = order(abs(tstat), decreasing = T)[1:100]
> dat.split = dat.small[genes, ]
> d = dist(t(dat.split))
> hc = hclust(d, method = "complete")
> par(mfrow = c(1, 2))
> plot(hc, labels = cl.random, main = "Labels used for gene selection", xlab = "")
> plot(hc, labels = cl[select], main = "True labels", xlab = "")

Repeat this experiment a few times and compare the two plots. You should also have a look at the heatmap representation of dat.split.


Figure 9: Selecting only highly differential genes leads to well-separated groups of samples even if these groups are defined by random assignments of labels without any biological meaning.

9 Ranking is no clustering!

A typical situation in microarray analysis is the following: you are interested in one specific gene and want to identify genes with a similar function. A reasonable solution to this problem is to calculate the distance of all other genes to the gene of interest and then analyze the list of the closest (top-ranking) genes.

We choose the fifth gene out of our 100 high-variance genes and print the distances and the names of the 10 genes that are closest to it.

> list.distances <- as.matrix(d.g)[5, ]
> o = order(list.distances)
> list.distances[o][1:10]
> gene.names[o][1:10]

You can see that the original gene of interest is ranked first with distance 0 to itself, and the second and the seventh genes are replicates of this gene. This is a plausible result.

Another popular approach in this situation is to first cluster all genes and then analyze the genes that are in the same cluster as the original gene of interest. In general this strategy is not suitable, since even two objects that are very close to each other can belong to two different clusters if they lie close to a cluster boundary.
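A tiny one-dimensional example (an illustrative sketch, not part of the original tutorial) makes the boundary effect concrete. With the starting centers fixed as below, k-means converges immediately:

> xx <- c(0, 0.9, 1.1, 2)
> km <- kmeans(matrix(xx, ncol = 1), centers = matrix(c(0.45, 1.55), ncol = 1))
> km$cluster  # the closest pair, 0.9 and 1.1, end up in different clusters

The points 0.9 and 1.1 are by far the closest pair (distance 0.2), yet the cluster boundary between them assigns them to different groups, while each is grouped with a point twice as far away.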

In our example, look at the result of the hierarchical clustering of genes obtained above. Define three clusters based on this result and print the clusters of the 10 closest genes identified above.

> cutree(hc, 3)[o][1:10]

To which clusters do the original gene of interest and the replicates belong? You can see that even on this low clustering level the replicates are separated. Thus remember: if you are interested in detecting genes that are similar to a gene of your choice, you should always prefer a ranked list to a clustering approach.

10 ISIS: Identifying splits with clear separation

At the end of this exercise we present a method that addresses two problems of clustering samples:

1. Only a subset of genes may be important to distinguish one group of samples from another.
2. Different sets of genes may determine different splits of the sample set.

The method ISIS (von Heydebreck et al., 2001) searches for binary class distinctions in the set of samples that show clear separation in the expression levels of specific subsets of genes. Several mutually independent class distinctions may be found, which is difficult to obtain from the most commonly used clustering algorithms. Each class distinction can be biologically interpreted in terms of its supporting genes.

> library(isis)
> help(isis)

Read the help text and find out about the parameters of the function isis. Now we use it on the 1000 top-variance genes:

> dat.isis <- dat[order(genes.var, decreasing = T)[1:1000], ]
> splits <- isis(dat.isis)

The matrix splits contains in each row a binary class separation of the samples and, in the last column, the corresponding score.

Exercise: Use table() to compare the found splits with the vector cl. You will see: one of the high-scoring splits perfectly recovers the difference between B- and T-cells.

Exercise: Find out what the parameter p.off does. Run isis again with p.off=10. How does the result change?

11 Summary

Scan through this paper once again and identify in each section the main message you have learned. Some of the most important points are:

- You learned about hierarchical clustering, k-means, partitioning around medoids, and silhouette scores.
- Selecting only highly differential genes leads to well-separated groups of samples even if these groups are defined by random assignments of labels without any biological meaning.

If you have any comments, criticisms or suggestions on our lectures, please contact us: rahnenfj@mpi-sb.mpg.de and florian.markowetz@molgen.mpg.de

12 References

Chiaretti S, Li X, Gentleman R, Vitale A, Vignetti M, Mandelli F, Ritz J, Foa R. Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood. 2004 Apr 1;103(7):2771-8.

Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer, 2001.

von Heydebreck A, Huber W, Poustka A, Vingron M. Identifying splits with clear separation: a new class discovery method for gene expression data. Bioinformatics. 2001;17 Suppl 1:S107-S114.
