
Summary of Graph-Based Clustering Algorithms

In this chapter, two graph-based clustering approaches were presented.

• The best-known graph-theoretic divisive clustering algorithm deletes the longest edges of the minimal spanning tree (MST) of the data [30]. We introduced a new splitting criterion to improve the performance of clustering methods based on this principle. The proposed criterion is based on the hypervolume of the clusters. It follows from this criterion that the resulting clusters can be easily approximated by multivariate Gaussian functions. The result of cutting the MST according to the combined criteria (the classical one and the proposed one) can therefore be used effectively to initialize Gaussian mixture model-based clustering algorithms. The resulting Hybrid MST-GG clustering algorithm combines graph-theoretic and partitional model-based clustering. The approach was demonstrated on several tailored data sets and on the well-known Iris benchmark classification problem.

The results showed the advantages of hybridizing the hierarchical graph-theoretic and the partitional model-based clustering algorithms. It has been shown that:

(1) the chaining problem of the classical MST-based clustering has been solved;

(2) the initialization of the Gath-Geva clustering algorithm has been properly handled; and

(3) the resulting clusters are easily interpretable with the compact parametric description of the multivariate Gaussian clusters (fuzzy covariance matrices).
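The classical principle summarized above, cutting the longest MST edges, can be illustrated with a minimal Python sketch. It is not the Hybrid MST-GG algorithm: the hypervolume-based criterion and the Gath-Geva step are omitted. The function names `mst_edges` and `cut_longest_edges`, the Euclidean distance, and the toy points are assumptions made for this example only.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the point outside the tree that is cheapest to attach
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def cut_longest_edges(points, n_clusters):
    """Delete the n_clusters - 1 longest MST edges and label the components."""
    edges = sorted(mst_edges(points))
    keep = max(len(edges) - (n_clusters - 1), 0)
    root = list(range(len(points)))   # union-find forest over the kept edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, i, j in edges[:keep]:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

# Toy data (assumed): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = cut_longest_edges(points, n_clusters=2)
```

On this toy data the single long edge bridging the two groups is the one removed, so the two groups receive different labels. The rule looks only at edge lengths, which is why a chain of close points between two clusters can defeat it.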

• The similarity of objects can be calculated based on neighborhood relations. The fuzzy neighborhood similarity measure introduced in this chapter extends the similarity measure of the Jarvis-Patrick algorithm in two ways: (i) it partially takes far (indirect) neighbors into account, and (ii) it fuzzifies the crisp decision criterion of the Jarvis-Patrick algorithm.

The fuzzy neighborhood similarity measure is based on the common neighbors of the objects but, unlike the Jarvis-Patrick algorithm, it is not restricted to the direct neighbors. While the fuzzy neighborhood similarity measure describes the similarities of the objects, the fuzzy neighborhood distance measure characterizes the dissimilarities of the data. The fuzzy neighborhood distances are easily computed from the fuzzy neighborhood similarities. The fuzzy neighborhood similarity and distance measures have a wide range of possible applications.
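The difference between a crisp shared-neighbor decision and a graded similarity can be sketched in Python as follows. This is a simplified illustration, not the chapter's measure: it uses only direct k-nearest neighbors, so the contribution of far neighbors is not reproduced. The Jaccard-style ratio, the conversion `1 - similarity`, the choice to include each point in its own neighbor set, and all function names are assumptions made for this example.

```python
import math

def neighbor_sets(points, k):
    """For every point, the set holding the point itself and its k nearest neighbors."""
    n = len(points)
    sets = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        sets.append({i, *others[:k]})
    return sets

def jarvis_patrick_linked(sets, i, j, min_shared):
    """Crisp rule: mutual neighbors that share at least min_shared neighbors."""
    return i in sets[j] and j in sets[i] and len(sets[i] & sets[j]) >= min_shared

def graded_similarity(sets, i, j):
    """Assumed graded variant: shared neighbors relative to all neighbors of the pair."""
    return len(sets[i] & sets[j]) / len(sets[i] | sets[j])

def graded_distance(sets, i, j):
    """Assumed conversion from a similarity in [0, 1] to a dissimilarity."""
    return 1.0 - graded_similarity(sets, i, j)

# Toy data (assumed): four points on a line plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
sets = neighbor_sets(points, k=2)
```

With these points, objects 0 and 4 fail the crisp test because neither appears in the other's neighbor set, yet they still share one neighbor (object 2) and therefore receive a small nonzero graded similarity. A yes/no rule discards exactly this kind of information.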

All methods that work on distance or similarity measures can also be based on the fuzzy neighborhood similarity/distance measures. This chapter introduced the application possibilities of the fuzzy neighborhood similarity and distance measures in hierarchical clustering and in VAT representation. It was demonstrated through application examples that clustering methods based on the fuzzy neighborhood similarity/distance measure can discover clusters with arbitrary shapes, sizes, and densities. Furthermore, the fuzzy neighborhood similarity/distance measure is also able to identify outliers.
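The VAT representation mentioned above displays a dissimilarity matrix after reordering its rows and columns so that similar objects become adjacent. The following Python sketch assumes the commonly described VAT reordering scheme: start at an endpoint of the largest dissimilarity, then repeatedly append the unvisited object closest to any already ordered object. The names `vat_order` and `reorder` and the toy matrix are assumptions. Any square dissimilarity matrix could be passed in, for example one holding fuzzy neighborhood distances.

```python
def vat_order(D):
    """Return an ordering of the objects for a VAT-style display of matrix D."""
    n = len(D)
    # start from an object that takes part in the largest dissimilarity
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = [i for i in range(n) if i != start]
    while remaining:
        # next object: the unvisited one closest to any already ordered object
        nxt = min(remaining, key=lambda q: min(D[p][q] for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def reorder(D, order):
    """Apply the same permutation to rows and columns of D."""
    return [[D[i][j] for j in order] for i in order]

# Toy data (assumed): two groups on a line, stored in interleaved order.
xs = [0, 10, 1, 11]
D = [[abs(a - b) for b in xs] for a in xs]
order = vat_order(D)
R = reorder(D, order)
```

After reordering, small dissimilarities gather in blocks along the diagonal of `R`. When the matrix is rendered as a grayscale image, these blocks appear as dark squares.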

References

1. Anders, K.H.: A hierarchical graph-clustering approach to find groups of objects. In: Proceedings of the 5th ICA workshop on progress in automated map generalization, IGN, pp. 28–30 (2003)

2. Augustson, J.G., Minker, J.: An analysis of some graph theoretical clustering techniques. J. ACM 17, 571–588 (1970)

3. Backer, F.B., Hubert, L.J.: A graph-theoretic approach to goodness-of-fit in complete-link hierarchical clustering. J. Am. Stat. Assoc. 71, 870–878 (1976)

4. Barrow, J.D., Bhavsar, S.P., Sonoda, D.H.: Minimal spanning trees, filaments and galaxy clustering. Mon. Not. R. Astron. Soc. 216, 17–35 (1985)

5. Bezdek, J.C., Clarke, L.P., Silbiger, M.L., Arrington, J.A., Bensaid, A.M., Hall, L.O., Murtagh, R.F.: Validity-guided (re)clustering with applications to image segmentation. IEEE Trans. Fuzzy Syst. 4, 112–123 (1996)

6. Bezdek, J., Pal, N.: Some new indexes of cluster validity. IEEE Trans. Syst. Man Cybern. 28, 301–315 (1998)

7. Doman, T.N., Cibulskis, J.M., Cibulskis, M.J., McCray, P.D., Spangler, D.P.: Algorithm5: a technique for fuzzy similarity clustering of chemical inventories. J. Chem. Inf. Comput. Sci. 36, 1195–1204 (1996)

8. Dunn, C.: Well separated clusters and optimal fuzzy partitions. J. Cybern. 4, 95–104 (1974)

9. Eppstein, D., Paterson, M.S., Yao, F.F.: On nearest-neighbor graphs. Discrete Comput. Geom. 17, 263–282 (1997)

10. Forina, M., Oliveros, C., Concepción, M., Casolino, C., Casale, M.: Minimum spanning tree: ordering edges to identify clustering structure. Anal. Chim. Acta 515, 43–53 (2004)

11. Fortune, S.: Voronoi diagrams and Delaunay triangulations. In: Du, D.-Z., Hwang, F.K. (eds.) Computing in Euclidean Geometry, pp. 193–223. World Scientific, Singapore (1992)

12. Gabriel, K., Sokal, R.: A new statistical approach to geographic variation analysis. Syst. Zool. 18, 259–278 (1969)

13. Gath, I., Geva, A.B.: Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 11, 773–781 (1989)

14. Gonzáles-Barrios, J.M., Quiroz, A.J.: A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree. Stat. Probab. Lett. 62, 23–34 (2003)

15. Gower, J.C., Ross, G.J.S.: Minimal spanning trees and single linkage cluster analysis. Appl. Stat. 18, 54–64 (1969)

16. Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp. 512–521 (1999)

17. Huang, X., Lai, W.: Clustering graphs for visualization via node similarities. J. Vis. Lang. Comput. 17, 225–253 (2006)

18. Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502–1517 (1992)

19. Jarvis, R.A., Patrick, E.A.: Clustering using a similarity measure based on shared near neighbors. IEEE Trans. Comput. C22, 1025–1034 (1973)

20. Karypis, G., Han, E.-H., Kumar, V.: Chameleon: hierarchical clustering using dynamic modeling. IEEE Comput. 32(8), 68–75 (1999)

21. Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7(1), 48–50 (1956)

22. Päivinen, N.: Clustering with a minimum spanning tree of scale-free-like structure. Pattern Recogn. Lett. 26, 921–930 (2005)

23. Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36, 1389–1401 (1957)

24. Raghavan, V.V., Yu, C.T.: A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 3, 393–402 (1980)

25. Toussaint, G.T.: The relative neighborhood graph of a finite planar set. Pattern Recogn. 12, 261–268 (1980)

26. Varma, S., Simon, R.: Iterative class discovery and feature selection using Minimal Spanning Trees. BMC Bioinform. 5, 126–134 (2004)

27. Vathy-Fogarassy, A., Kiss, A., Abonyi, J.: Hybrid minimal spanning tree and mixture of Gaussians based clustering algorithm. In: Lecture Notes in Computer Science: Foundations of Information and Knowledge Systems, vol. 3861, pp. 313–330. Springer, Heidelberg (2006)

28. Vathy-Fogarassy, A., Kiss, A., Abonyi, J.: Improvement of Jarvis-Patrick clustering based on fuzzy similarity. In: Masulli, F., Mitra, S., Pasi, G. (eds.) Applications of Fuzzy Sets Theory, LNCS, vol. 4578, pp. 195–202. Springer, Heidelberg (2007)

29. Yao, A.: On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM J. Comput. 11, 721–736 (1982)

30. Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C20, 68–86 (1971)

Chapter 3

Graph-Based Visualisation of High Dimensional Data

Abstract In this chapter we give an overview of classical dimensionality reduction and graph-based visualisation methods that are able to uncover the hidden structure of high dimensional data and visualise it in a low-dimensional vector space.