Definition
- Clustering is a task which aims at the « finding of natural groups from a
data set, when little or nothing is known about the category
structure. » M.R. Anderberg, 1973
- « Cluster Analysis divides data into groups that are meaningful,
useful or both » P.N. Tan et al., 2005
[Figure: open questions: similarity? do the samples form a group? hidden structure? clusters? decision?]
1. Context & Issues
2. State of the Art
3. Fuzzy Set Theory a. Data Clustering b. DBSCAN c. FN-DBSCAN d. Current Challenges
[Figure: issues of imprecision, uncertainty and accuracy: how can clustering be performed while avoiding mistakes?]
Data Clustering: a powerful tool to identify the natural grouping in data.
Fuzzy Clustering
- Also called soft clustering.
- Groups objects by assigning each object a membership degree to each of the clusters, using a membership function.
- Each data element has a membership degree to each cluster, between 1 (full membership) and 0 (non-membership).
- Fuzzy clustering methods: a family of clustering algorithms in which data elements are not assigned to clusters in a crisp way, i.e. an element can belong to more than one cluster.
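To make "membership degree" concrete, here is a minimal sketch using the fuzzy c-means membership rule, one well-known fuzzy clustering method chosen here only for illustration (it is not AF-DBSCAN). It assumes Euclidean distance, fixed cluster centers, and a fuzzifier m > 1; the function name fcm_memberships is hypothetical. Each row of the result holds one point's degrees in [0, 1], and each row sums to 1.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return U where U[i][j] is the membership degree of points[i] in cluster j.

    Fuzzy c-means rule: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    with d_ij the Euclidean distance from point i to center j and m > 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # Distances to every center, clamped so a point lying on a center
        # does not cause a division by zero
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
        memberships.append(row)
    return memberships


# A point halfway between two centers belongs to both with degree 0.5
U = fcm_memberships([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)])
```

Larger values of m make the memberships softer (closer to uniform); as m approaches 1 the assignment becomes nearly crisp.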
[Figure: points p1 and p2 and their ε-neighborhoods]
Input parameters:
- Eps1: the minimal threshold of neighborhood membership degree.
- Eps2: the minimal set cardinality.
Together, the two parameters define the desired density characteristics of the generated clusters.
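The roles of Eps1 and Eps2 can be illustrated with a minimal FN-DBSCAN-style sketch. It assumes a linear fuzzy neighborhood membership mu_x(y) = max(0, 1 - d(x, y) / d_max), with d_max defaulting to the largest pairwise distance; this membership function, the default d_max, and the helper names fuzzy_neighborhood and fn_dbscan are assumptions for illustration, not the presentation's definitions. A point is treated as core when the sum of memberships of its neighbors with degree at least Eps1 (its fuzzy cardinality) reaches Eps2.

```python
import math


def fuzzy_neighborhood(points, i, eps1, d_max):
    """Indices j with membership mu_i(j) >= eps1, plus their fuzzy cardinality.

    Assumed linear membership: mu_i(j) = max(0, 1 - d(i, j) / d_max).
    """
    members, cardinality = [], 0.0
    for j, q in enumerate(points):
        mu = max(0.0, 1.0 - math.dist(points[i], q) / d_max)
        if mu >= eps1:
            members.append(j)
            cardinality += mu  # fuzzy cardinality = sum of membership degrees
    return members, cardinality


def fn_dbscan(points, eps1, eps2, d_max=None):
    """FN-DBSCAN-style sketch: one label per point, -1 meaning noise."""
    n = len(points)
    if d_max is None:
        # Assumption: scale memberships by the largest pairwise distance
        d_max = max(math.dist(p, q) for p in points for q in points) or 1.0
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neigh, card = fuzzy_neighborhood(points, i, eps1, d_max)
        if card < eps2:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neigh)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned: border points are not expanded
            labels[j] = cluster
            neigh_j, card_j = fuzzy_neighborhood(points, j, eps1, d_max)
            if card_j >= eps2:
                seeds.extend(neigh_j)  # j is a core point: expand through it
    return labels


pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
labels = fn_dbscan(pts, eps1=0.9, eps2=2.5)
```

Raising Eps1 shrinks each fuzzy neighborhood, while raising Eps2 demands denser neighborhoods before a point can seed a cluster.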
[Figure: open questions on setting Eps1 and Eps2: what about the clusters' density? the data set's scale?]
Method | Description | Limitations
A. Smiti and Z. Elouedi, Paper [SE12] | Combines the Gaussian-Means (GM) and DBSCAN algorithms. | GM yields circular cluster shapes; not robust against noise.
E. Nejad et al., Paper [EHY10] | Eps is estimated from the noise ratio of the data and minPts. | Highly dependent on the value of minPts.
M.N. Gaonkar and K. Sawant, Paper [GS13] | A k-neighbors plot is drawn for a value of k entered by the user; Eps is then taken at the knee of the plot. | Semi-supervised method.
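The k-neighbors plot approach in the last row can be sketched as follows: compute each point's distance to its k-th nearest neighbor, sort these k-distances in descending order, and take Eps at the knee of the curve. The knee rule used here (after scaling both axes to [0, 1], pick the point farthest from the chord joining the curve's end points) is an assumption; [GS13] may locate the knee differently. The helper names k_distances and knee_eps are hypothetical, and the brute-force neighbor search is quadratic in the number of points.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (itself excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def knee_eps(points, k=4):
    """Estimate Eps as the k-distance at the knee of the sorted k-dist curve.

    Assumed knee rule: after scaling both axes to [0, 1], pick the point of the
    descending curve farthest from the chord joining its two end points.
    """
    curve = sorted(k_distances(points, k), reverse=True)
    n = len(curve)
    y_max, y_min = curve[0], curve[-1]
    if n < 3 or y_max == y_min:
        return curve[-1]
    # The scaled chord runs from (0, 1) to (1, 0); the distance of (x, y)
    # to it is proportional to |x + y - 1|.
    scores = [abs(i / (n - 1) + (y - y_min) / (y_max - y_min) - 1.0)
              for i, y in enumerate(curve)]
    return curve[scores.index(max(scores))]


# A dense 5 x 5 grid (spacing 0.1) plus three distant outliers
pts = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
pts += [(5.0, 5.0), (9.0, 1.0), (1.0, 9.0)]
eps = knee_eps(pts, k=4)
```

On this toy data, the outliers form the steep head of the curve and the knee falls on the largest k-distance inside the grid, so Eps covers the dense region without reaching the outliers. The user still has to supply k, which is why the table lists this method as semi-supervised.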
1. Context & Issues
2. State of the Art
3. AF-DBSCAN a. General solution to the parameters' complexity problem b. Estimating Eps c. Estimating minPts
Method | Description
M. Ester et al., Paper [EKSX96] | minPts = 4
Some methods are more relevant than others with respect to clustering quality, run-time complexity, input-parameter complexity, etc.
It is preferable to determine the parameter values from the data set's characteristics rather than to use fixed values.
2. State of the Art
3. AF-DBSCAN
4. Experiment results a. Overview b. Data normalization c. Estimating input parameters d. Clustering Results
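- Min-Max method:
- Let D be a k-dimensional data set. Each value $x_{ij}$ of attribute $A_j$ ($j = 1, \dots, k$) is mapped to:
  $x'_{ij} = \frac{x_{ij} - \min_{A_j}}{\max_{A_j} - \min_{A_j}} \, (\mathit{new\_max}_{A_j} - \mathit{new\_min}_{A_j}) + \mathit{new\_min}_{A_j}$
- where $\min_{A_j}$ and $\max_{A_j}$ are the minimum and maximum values of attribute $A_j$ in D, and $[\mathit{new\_min}_{A_j}, \mathit{new\_max}_{A_j}]$ is the target range.
- In our case, the target range is [0,1], so the formula simplifies to:
  $x'_{ij} = \frac{x_{ij} - \min_{A_j}}{\max_{A_j} - \min_{A_j}}$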
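A minimal sketch of the Min-Max method, applied attribute by attribute (one column per attribute). Mapping a constant attribute to new_min is an assumption made to avoid dividing by zero; the function name min_max_normalize is hypothetical.

```python
def min_max_normalize(data, new_min=0.0, new_max=1.0):
    """Min-Max normalization of a list of rows, column by column."""
    if not data:
        return []
    k = len(data[0])
    mins = [min(row[j] for row in data) for j in range(k)]
    maxs = [max(row[j] for row in data) for j in range(k)]
    out = []
    for row in data:
        new_row = []
        for j, x in enumerate(row):
            span = maxs[j] - mins[j]
            # Assumption: a constant attribute is mapped to new_min
            scaled = 0.0 if span == 0 else (x - mins[j]) / span
            new_row.append(scaled * (new_max - new_min) + new_min)
        out.append(new_row)
    return out


normalized = min_max_normalize([[1, 10], [3, 20], [5, 30]])
# → [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

With the default range [0, 1], the last line of the function reduces to the simplified formula, since scaled * 1.0 + 0.0 equals scaled.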
- Calculation of Eps1:
Where
- Calculation of Eps1:
• Determine the k-nearest-neighbor distances of all points in the data set;
- Calculation of Eps2:
Where and
3. AF-DBSCAN
4. Experiment results
5. Conclusion a. Datasets b. Clustering accuracy c. Run time
o eT (error rate)
o Run time
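The error rate eT is not defined on this slide. One common way to compute it, assumed here, maps each cluster to the most frequent true class among its members and counts every other point as an error; points labeled as noise (-1 by default) are also counted as errors, which is a further assumption. The function name error_rate is hypothetical.

```python
from collections import Counter


def error_rate(true_labels, cluster_labels, noise_label=-1):
    """Fraction of points not explained by a majority-class mapping of clusters.

    Assumptions: each cluster is mapped to the most frequent true class among
    its members; points labeled noise_label are counted as errors.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        return 0.0
    members = {}
    for t, c in zip(true_labels, cluster_labels):
        members.setdefault(c, []).append(t)
    correct = 0
    for c, classes in members.items():
        if c == noise_label:
            continue
        # Size of the majority class inside this cluster
        correct += Counter(classes).most_common(1)[0][1]
    return 1.0 - correct / len(true_labels)


# Cluster 0 = {a, a}, cluster 1 = {a, b, b}, one noise point: 2 errors out of 6
eT = error_rate(["a", "a", "a", "b", "b", "b"], [0, 0, 1, 1, 1, -1])
```

Whether noise should count as an error depends on the evaluation protocol; passing a noise_label that never occurs treats noise as an ordinary cluster instead.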
« Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a language comprehensible to everyone. »
A. Einstein, L. Infeld, The Evolution of Physics (1938)
If you can't explain it simply, you don't understand it well enough.
Questions?
2. DBSCAN
3. Fuzzy Set Theory a. Definition b. Example
4. Fuzzy Clustering
U = [0, 100]
u: age
A: fuzzy subset of U labeled "old", defined by a membership function such as:
If u = 60, it returns:
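A membership function for "old" can be sketched in Python. As an assumption, and not necessarily the function used on this slide, the sketch uses the sigmoid-shaped form common in fuzzy-set textbooks: mu(u) = 0 for u <= 50 and mu(u) = 1 / (1 + ((u - 50) / 5) ** -2) for u > 50. The name mu_old is hypothetical.

```python
def mu_old(u):
    """Assumed membership degree of age u (in U = [0, 100]) in the fuzzy set "old"."""
    if not 0 <= u <= 100:
        raise ValueError("u must lie in the universe U = [0, 100]")
    if u <= 50:
        return 0.0  # ages up to 50 are not "old" at all under this assumption
    # Rises smoothly from 0 toward 1; equals 0.5 at u = 55
    return 1.0 / (1.0 + ((u - 50) / 5.0) ** -2)


degrees = {age: mu_old(age) for age in (40, 55, 60, 70)}
```

Unlike a crisp set with a cut-off age, every age above 50 belongs to "old" to some degree, and the degree grows with age.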