Top PDF: Spectral redemption in clustering sparse networks

Spectral redemption in clustering sparse networks

Finally, we turn to real networks to illustrate the advantages of spectral clustering based on the non-backtracking matrix in practical applications. In Fig. 6 we show B's spectrum for several networks commonly used as benchmarks for community detection. In each case we plot a circle whose radius is the square root of the largest eigenvalue. Even though these networks were not generated by the stochastic block model, their spectra look qualitatively similar to the picture discussed above (Fig. 2). This leads to several very convenient properties. For each of these networks we observed that only the eigenvectors with real eigenvalues are correlated with the group assignment given by the ground truth. Moreover, the real eigenvalues that lie outside the circle are clearly identifiable. This is very unlike the situation for the operators used in standard spectral clustering algorithms, where one must decide which eigenvalues are in the bulk and which are outside.
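The counting rule described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the tolerance `tol`, and the toy graph are assumptions made here, and the dense construction of B is only practical for very small graphs.

```python
import numpy as np

def non_backtracking_matrix(edges):
    """Return B for an undirected simple graph given as a list of (u, v) pairs.

    Rows and columns are indexed by directed edges. The entry for the pair
    (u -> v), (w -> x) is 1 when v == w and x != u, and 0 otherwise.
    """
    directed = list(edges) + [(v, u) for u, v in edges]
    B = np.zeros((len(directed), len(directed)))
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            if v == w and x != u:
                B[i, j] = 1.0
    return B

def real_eigenvalues_outside_circle(B, tol=1e-8):
    """Return (real eigenvalues outside the circle, circle radius).

    The radius is the square root of the largest eigenvalue modulus,
    as in the excerpt. `tol` is an illustrative numerical tolerance.
    """
    eigvals = np.linalg.eigvals(B)
    radius = np.sqrt(np.max(np.abs(eigvals)))
    is_real = np.abs(eigvals.imag) < tol
    is_outside = np.abs(eigvals.real) > radius + tol
    outside = eigvals.real[is_real & is_outside]
    return sorted(outside.tolist(), reverse=True), float(radius)

# Toy graph (hypothetical): two 4-cliques joined by a single bridge edge.
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
outside, radius = real_eigenvalues_outside_circle(
    non_backtracking_matrix(clique_a + clique_b + [(3, 4)])
)
print("radius:", radius, "real eigenvalues outside the circle:", outside)
```

The number of eigenvalues returned is the quantity the excerpt describes as "clearly identifiable"; the corresponding eigenvectors would then be passed to a clustering step, which is not shown here.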

A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks

all networks). One can see that OCCAM is slightly more prone to overestimate the amount of overlap than SAAC. We then try SAAC on co-authorship networks built from DBLP in the following way. Nodes correspond to authors and we fix as ground-truth communities some conferences (or groups of conferences): an author belongs to a community if she/he has published at least one paper in the corresponding conference(s). We then build the network of authors by putting an edge between two authors if they have published a paper together in one of the considered conferences. We present results for some conferences with machine learning in their scope: ICML, NIPS, and two theory-oriented conferences that we group together, ALT and COLT. We compare the three spectral algorithms in terms of estimation error and false positive / false negative rates. Results are presented in Table 7, in which the estimated amount of overlap ĉ = Σ_{i,k} Ẑ_{i,k} / n is also reported. In this case, SAAC and OCCAM significantly outperform SC, although the error is relatively high. The amount of overlap is underestimated by both algorithms, but SAAC appears to recover slightly more overlapping nodes. The difficulty of recovering communities in this case may come from the fact that the constructed networks are very sparse.
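As a small worked illustration of the overlap statistic quoted above, the sketch below computes ĉ from a membership matrix. It assumes Ẑ is a binary n × K matrix with Ẑ[i][k] = 1 when node i is assigned to community k; the function name and the toy matrix are hypothetical.

```python
def estimated_overlap(Z_hat):
    """Return c_hat = sum_{i,k} Z_hat[i][k] / n for an n x K membership matrix."""
    n = len(Z_hat)
    if n == 0:
        raise ValueError("Z_hat must contain at least one node")
    return sum(sum(row) for row in Z_hat) / n

# Toy assignment (hypothetical): 4 nodes, 2 communities, 2 overlapping nodes.
Z_hat = [
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
]
print(estimated_overlap(Z_hat))  # 6 memberships over 4 nodes -> 1.5
```

Under this reading, a value of 1 means every node belongs to exactly one community, and larger values indicate more overlap.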

Clustering from sparse pairwise measurements

where ∂i denotes the set of neighbors of node i in the graph G, and w is defined in (8). A simple computation, analogous to [9], shows that (λ ≥ 1, v) is an eigenpair of B if and only if H(λ)v = 0. This property justifies the following picture [17]. For x large enough, H(x) is positive definite and has no negative eigenvalue. As we decrease x, H(x) gains a new negative eigenvalue whenever x becomes smaller than an eigenvalue of B. Finally, at x = 1, there is a one-to-one correspondence between the negative eigenvalues of H(x) and the real eigenvalues of B that are larger than 1. We call the matrix H(1) the Bethe Hessian, and propose the following spectral algorithm, by analogy with Sec. II-B. First, compute all the negative eigenvalues of H(1). Let r be their number. If r = 0, raise an error. Otherwise, denoting v_1, ..., v_r ∈ R^n the
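The first step of the algorithm (count the negative eigenvalues of H(1) and raise an error if there are none) can be sketched as follows. The definition of H(x) and of the weights w is not part of this excerpt, so the sketch takes an already-built symmetric matrix as input; the function name, the tolerance `tol`, and the toy matrix are assumptions made here.

```python
import numpy as np

def negative_eigenpairs(H, tol=1e-10):
    """Return (eigenvalues, eigenvectors) for the negative eigenvalues of H.

    H is assumed to be a real symmetric matrix (for instance H(1) in the
    excerpt, built elsewhere). Raises ValueError when r = 0, mirroring the
    "raise an error" step in the text.
    """
    H = np.asarray(H, dtype=float)
    if H.ndim != 2 or H.shape[0] != H.shape[1] or not np.allclose(H, H.T):
        raise ValueError("H must be a square symmetric matrix")
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    negative = eigvals < -tol
    r = int(negative.sum())
    if r == 0:
        raise ValueError("H has no negative eigenvalue (r = 0)")
    return eigvals[negative], eigvecs[:, negative]

# Toy symmetric matrix (hypothetical), not a real Bethe Hessian.
H_toy = np.diag([-2.0, -1.0, 3.0])
values, vectors = negative_eigenpairs(H_toy)
print("r =", len(values), "eigenvalues:", values)
```

The columns of `vectors` play the role of v_1, ..., v_r; what the algorithm does with them next is cut off in the excerpt and is not reproduced here.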

Spectral Detection on Sparse Hypergraphs

4 Institut de Physique Théorique, CEA Saclay, 91191 Gif-sur-Yvette, France

These problems have been well studied in the case of graphs with simple edges between pairs of vertices. However, many networks have a different structure, in which the relationships between vertex variables are established not in pairs but in k-tuples, with k > 2. Examples include networks of scientific collaborations, Skype conference calls, email exchanges, and recommendation systems in which a user is associated with a specific content and a rating. Translating the hypergraph into pairwise interactions would inevitably lead to a loss of information, and therefore some effort has been made to generalize spectral methods to multi-body interactions [12], [13], [14].

Robust spectral clustering using LASSO regularization

Keywords: Spectral clustering, community detection, eigenvectors basis, ℓ1-penalty.

1 Introduction

Graphs play a central role in complex systems as they can conveniently model interactions between the variables of a system. Finding sets of variables with similar attributes can then help in understanding the mechanisms underlying a complex system. Graphs are commonly used in a wide range of applications, from Mathematics (graph theory) to Physics [12], Social Networks [10], Informatics [27] and Biology [14, 23]. For instance, in genetics, groups of genes with high interactions are likely to be involved in the same function that drives a specific biological process.

Accelerating consensus by spectral clustering and polynomial filters

I. CONSENSUS ACCELERATION

Since their introduction in [1], (discrete-time) consensus algorithms have attracted almost as much attention as their dual, fast mixing Markov chains [2], [3]. Improving the convergence speed of this basic building block for, e.g., distributed computation [4], [5], Kalman filtering [6], [7] or control of distributed systems [8]–[10] has been a major focus, whose results cannot be comprehensively reviewed here. A few existing approaches are mentioned below, after introducing some basic definitions to facilitate an explicit discussion. For synchronized fixed networks, some particular acceleration methods include: optimizing the weights on the links [2], [11], adding local memory [12], or introducing time-varying filters [13], [14]. The purpose of the present paper is to study where and how the polynomial filter [13] can be helpful, in particular in the novel context of combining it with optimization of link weights. The analysis focuses on spectral properties, presumably to facilitate integration of the insights into more general linear dynamical networks, and to show explicit connections to spectral graph theory as treated in, e.g., [15]. Before detailing the related state of knowledge as well as our contributions, let us introduce the basic setting.

l1-spectral clustering algorithm: a robust spectral clustering using Lasso regularization

Graphs play a central role in complex systems as they can model interactions between variables of the system. They are commonly used in a wide range of applications, from social sciences (e.g. social networks (Handcock and Gile, 2010)) to technologies (e.g. telecommunications (Smith, 1997), wireless sensor networks (Akyildiz et al., 2002)) or biology (gene regulatory networks (Davidson and Levin, 2005), metabolic networks (Jeong et al., 2000)). One of the most relevant features when analyzing graphs is the identification of their underlying structures, such as cluster structures, generally defined as connected subsets of nodes that are more densely connected to each other than to the rest of the graph. These clusters can provide an invaluable help in understanding and visualizing the functional components of the whole graph (Girvan and Newman,

Efficient Eigen-updating for Spectral Graph Clustering

Abstract: Partitioning a graph into groups of vertices such that those within each group are more densely connected than vertices assigned to different groups, known as graph clustering, is often used to gain insight into the organisation of large scale networks and for visualisation purposes. Whereas a large number of dedicated techniques have been recently proposed for static graphs, the design of on-line graph clustering methods tailored for evolving networks is a challenging problem, and much less documented in the literature. Motivated by the broad variety of applications concerned, ranging from the study of biological networks to the analysis of networks of scientific references through the exploration of communications networks such as the World Wide Web, it is the main purpose of this paper to introduce a novel, computationally efficient, approach to graph clustering in the evolutionary context. Namely, the method promoted in this article can be viewed as an incremental eigenvalue solution for the spectral clustering method described by Ng et al. (2001). The incremental eigenvalue solution is a general technique for finding the approximate eigenvectors of a symmetric matrix given a change. As well as outlining the approach in detail, we present a theoretical bound on the quality of the approximate eigenvectors using perturbation theory. We then derive a novel spectral clustering algorithm called Incremental Approximate Spectral Clustering (IASC). The IASC algorithm is simple to implement and its efficacy is demonstrated on both synthetic and real datasets modelling the evolution of an HIV epidemic, a citation network and the purchase history graph of an e-commerce website.

Incremental Spectral Clustering with the Normalised Laplacian

Abstract: Partitioning a graph into groups of vertices such that those within each group are more densely connected than vertices assigned to different groups, known as graph clustering, is often used to gain insight into the organization of large scale networks and for visualization purposes. Whereas a large number of dedicated techniques have been recently proposed for static graphs, the design of on-line graph clustering methods tailored for evolving networks is a challenging problem, and much less documented in the literature. Motivated by the broad variety of applications concerned, ranging from the study of biological networks to graphs of scientific references through to the exploration of communications networks such as the World Wide Web, it is the main purpose of this paper to introduce a novel, computationally efficient, approach to graph clustering in the evolutionary context. Namely, the method promoted in this article is an incremental eigenvalue solution for the spectral clustering method described by Ng et al. (2001). Beyond a precise description of its practical implementation and an evaluation of its complexity, its performance is illustrated through numerical experiments, based on datasets modelling the evolution of an HIV epidemic and the purchase history graph of an e-commerce website.

Overlapping clustering methods for networks

Calculate betweenness scores for all edges; remove the edge with the highest score; until no edges remain.

Rather than building the complete dendrogram (with edge removals) and then choosing the optimal division using the modularity criterion, [45] suggested focusing directly on the optimization of the modularity. Thus, he proposed an algorithm which falls in the general category of agglomerative hierarchical clustering methods [24, 53]. Starting with a configuration in which each vertex is the sole member of one of N communities, the communities are iteratively joined together in pairs, choosing at each step the join that results in the greatest increase (or smallest decrease) in modularity (1.1). Again, this leads to a dendrogram for which the best cut is chosen by looking for the maximal value of the modularity. The computational cost of the entire algorithm is O((m + N)N), or O(N^3) for dense networks and O(N^2) for sparse networks.
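The agglomerative procedure described above can be illustrated with a deliberately naive sketch. It assumes the usual modularity of an unweighted, undirected graph, Q = Σ_c [e_c/m − (d_c/2m)²], since Eq. (1.1) is not shown in this excerpt. It recomputes Q from scratch for every candidate join, so it does not achieve the O((m + N)N) cost quoted in the text. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def modularity(edges, communities):
    """Q = sum over communities of e_c/m - (d_c/(2m))^2 (assumed definition)."""
    m = len(edges)
    label = {v: c for c, members in enumerate(communities) for v in members}
    intra = [0] * len(communities)    # edges with both ends in community c
    degree = [0] * len(communities)   # total degree of community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(intra, degree))

def greedy_agglomeration(edges):
    """Join communities pairwise, always taking the join with the best Q.

    Returns (best modularity seen, partition at which it was reached),
    i.e. the best cut of the resulting dendrogram.
    """
    nodes = sorted({v for edge in edges for v in edge})
    communities = [frozenset([v]) for v in nodes]
    best_q, best_partition = modularity(edges, communities), list(communities)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (a, b)]
            merged.append(communities[a] | communities[b])
            candidates.append((modularity(edges, merged), merged))
        q, communities = max(candidates, key=lambda item: item[0])
        if q > best_q:
            best_q, best_partition = q, list(communities)
    return best_q, best_partition

# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q, partition = greedy_agglomeration(edges)
print(q, [sorted(c) for c in partition])
```

On this toy graph the best cut separates the two triangles, with Q = 5/14.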

Community detection in sparse networks via Grothendieck's inequality

the (random) program (1.2), which we can rewrite as follows: maximize ⟨A, xx^T⟩ subject to x ∈ {−1, 1}^n. (2.1) Note that if we maximized ⟨A, xx^T⟩ over the Euclidean ball B(0, √n), then the problem would be simple: the solution x would be the eigenvector corresponding to the eigenvalue of A of largest magnitude. This simpler problem underlies the most basic algorithm for community detection, called spectral clustering, where the communities are recovered based on the signs of an eigenvector of the adjacency matrix (going back to [39, 14, 50], see [64]). The optimization problem (2.1) is harder and more subtle; the replacement of the Euclidean ball by the cube introduces a strong restriction on the coordinates of x. This restriction rules out localized solutions x where most of the mass of x is concentrated on a small fraction of coordinates. Since eigenvectors of sparse matrices tend to be localized (see [15]), basic spectral clustering is often unsuccessful for sparse networks.
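To make the contrast between the two feasible sets concrete, the sketch below compares the sign pattern of a leading eigenvector with a brute-force maximization over the cube {−1, 1}^n on a tiny graph. Several choices here are assumptions, not taken from the excerpt: program (1.2) is not shown, so the matrix used is the adjacency matrix minus its mean entry (a common centering); the eigenvector is the one with the largest algebraic eigenvalue, which is the maximizer of the quadratic form over the ball; and all function names are hypothetical. Brute force is only feasible for very small n.

```python
import itertools
import numpy as np

def centered_adjacency(edges, n):
    """Adjacency matrix minus its mean entry (an assumed centering)."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A - A.mean()

def spectral_signs(M):
    """Signs of the eigenvector of M with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(M)
    v = eigvecs[:, np.argmax(eigvals)]
    x = np.where(v >= 0, 1, -1)
    return x * x[0]  # fix the global sign so that x[0] = +1

def brute_force_cube(M):
    """Maximize x^T M x over x in {-1, 1}^n with x[0] = +1."""
    n = M.shape[0]
    best_x, best_val = None, -np.inf
    for tail in itertools.product([1, -1], repeat=n - 1):
        x = np.array((1,) + tail)
        val = float(x @ M @ x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
M = centered_adjacency(edges, 6)
print("spectral signs:", spectral_signs(M))
print("cube optimum:  ", brute_force_cube(M))
```

On this small, well-separated example the two approaches agree. The excerpt's point is that on sparse networks the eigenvector can localize on a few coordinates, in which case the sign pattern is no longer a reliable proxy for the cube solution.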

Modularity-based Sparse Soft Graph Clustering

parameter to optimize. In [27], Nicosia et al. use a genetic algorithm to optimize a more general relaxation of the modularity problem. In [12], Griechisch et al. do not directly study the optimization of the relaxation problem, but they rely on an external quadratic solver. Finally, in [14], Havens et al. use a spectral approach that does not directly solve the relaxation of the modularity problem, but where modularity is used as a selection criterion. The main limitation of these methods lies in the maximum number of clusters K that must be specified. We could get around this issue by taking large values for K, but none of the approaches cited above scales well to large K. Indeed, the solutions p ∈ R^{n×K} found by these methods are dense matrices. In other words, the number of parameters to store in memory and to optimize is in O(nK), which quickly becomes prohibitive for large values of K. For instance, in the approach of [4], the matrix p is the densest possible matrix, i.e. all its coefficients are positive. In the approach of [27], the genetic algorithm starts with dense random matrices of R^{n×K}, and its hybrida-

Power Spectral Clustering

Université Paris-Est, LIGM, Equipe A3SI, ESIEE, France. E-mail: laurent.najman@esiee.fr

1 Introduction

Spectral clustering has been widely popular due to its usage in image segmentation [32]. It plays an important role in globalizing local information in the recent state-of-the-art method for segmentation, multiscale combinatorial grouping [30]. Although convolutional neural networks form the current state of the art for image segmentation [26], this can be attributed to the availability of huge labelled datasets. There exist domains where data is not easy to obtain, such as hyperspectral image datasets, where unsupervised techniques can be very useful. In methods such as those described in [35], even after using convolutional neural nets, spectral clustering is used as the last step for segmentation.

Spectral inference methods on sparse graphs : theory and applications

1 MARKOV RANDOM FIELDS

Graphical models are a powerful paradigm for multivariate statistical modeling. They allow one to encode information about the conditional dependencies of a large number of interacting variables in a compact way, and provide a unified view of inference and learning problems in areas as diverse as statistical physics, computer vision, coding theory or machine learning (see e.g. [86, 151] for reviews of applications). In this first chapter, we motivate and introduce in section 1.1 the formalism of undirected graphical models, often called Markov random fields (MRFs). We then focus in section 1.3 on the particular case of pairwise MRFs, which will be particularly important in the following. The analyses of this dissertation will apply to models drawn from certain random graph ensembles, and section 1.4 is devoted to a review of some of their basic properties. Computing the marginals of pairwise MRFs is computationally hard, and we will

Word sense discrimination in information retrieval: a spectral clustering-based approach

Abstract: Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving the level of precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively). We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poorly performing queries.
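For readers unfamiliar with the P@k measures mentioned in the abstract, here is a minimal sketch of precision at rank k under its standard definition (relevant documents among the top k, divided by k). The function name, document identifiers, and relevance judgments are hypothetical and are not taken from the paper or from TREC.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k  # standard convention: divide by k, not by len(top_k)

# Toy ranking (hypothetical) before any reordering step.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
print(precision_at_k(ranking, relevant, 5))  # 2 relevant in the top 5 -> 0.4
```

A reordering method such as the one described would be judged by whether it moves relevant documents into the top 5, 10 or 30 positions and thereby raises these values.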

Clustering behaviors in networks of integrate-and-fire oscillators

To measure the level of clustering in the population, we computed the fraction of "traveling oscillators." In a population of identical oscillators, each oscillator is trapped in one of the N_g clusters and converges to a phase-locked configuration. In a phase-locked configuration, each oscillator returns to a fixed position every N_g firings. This clustering behavior is expected to persist in a population of nonidentical oscillators, at least if the frequency distribution is sufficiently narrow (or if the coupling strength is strong enough). The clustering configuration is clearly visible in Fig. 3, which nevertheless presents a situation of significant heterogeneity. Each cluster spreads over a finite range due to the oscillators' discrepancies. This "snapshot" clustering configuration at one time instant does not preclude oscillator exchanges between the clusters as time evolves.

Optimal Laplacian Regularization for Sparse Spectral Community Detection

Our findings notably shed light on the connection between the two benchmark approaches to community detection in sparse networks, provided for one by the statistics community and for the other by the physics community; these approaches have so far been treated independently. We strongly suggest that bridging both sets of results has the capability to improve state-of-the-art knowledge of machine learning algorithms in sparse conditions (for which a direct application of standard algorithms is often inappropriate). Similar outcomes could arise for instance in KNN-based kernel learning or for any algorithm involving numerous data which, for computational reasons, imposes a sparsification of the information matrices.

3D+t segmentation of PET images using spectral clustering

were obtained with AD-KSC compared to other methods, its use for clinical applications is still constrained to 2D+t PET data due to the computational complexity of the dominant eigenvectors. However, it is important to consider the entire 3D sequence and segment it into similar functional volumes. This improves the statistical robustness compared to 2D segmentation and yields a smooth representation of each region. To make the method applicable in 3D, a preprocessing step that reduces the size of the data to be clustered is applied to the PET data. Several authors have proposed approaches to deal with large datasets. Chaoji et al. [6] have proposed a method that handles full-dimensional, arbitrarily shaped clusters. Their SPARCL method first runs a carefully initialized version of the K-means algorithm to generate many small seed clusters, then iteratively merges the generated clusters. Guo et al. [7] presented a method that combines a pre-clustering process using histogram-based thresholding with a hierarchical cluster analysis. They have extended their method to adapt it to 3D PET data, integrating a slice-by-slice clustering with a hierarchical clustering technique. In this paper, we propose a method, AD-KSC-3D, inspired by Guo et al. [7], Chaoji et al. [6] and Zbib et al. [3], in which a preprocessing step using a principal component analysis and a clustering with the Global K-means approach is applied slice by slice on the initial TACs. As a result, many small seed clusters are generated. Then AD-KSC-3D is applied to the reduced data to obtain the final partition. To validate our approach, GATE Monte Carlo simulations of the Zubal head phantom were performed. AD-KSC-3D was evaluated on this simulated phantom and compared favorably to the K-means approach.

Multiple change points detection and clustering in dynamic networks

In this paper, we extend the work of Matias et al (2015) to simultaneously uncover clusters of nodes sharing connection profiles, and to look for adjacent time intervals on which the connectivity patterns between pairs of clusters are stationary. In practice, considering dynamic interactions over a continuous time interval, we assume the intensity functions of the NHPPP to depend on the hidden node clusters and to be piecewise constant. Moreover, they are assumed to share D − 1 common discontinuities whose location and number are unknown. These discontinuities induce a segmentation of the entire time interval over which the interactions are observed. In order to perform inference, a VEM algorithm is derived. We show that the V-M step can be tackled by relying on a multiple changepoint detection technique for univariate time series, the pruned exact linear time (PELT) method (Killick et al, 2012), which we adapted to our framework. Finally, the number of clusters of vertices is estimated using a Bayesian information criterion (BIC) involving variational approximations.

Networks clustering with bee colony

We use the modularity function to measure the strength of the community structure found by our method, which gives us an objective metric for choosing the number of communities (cluste[r]
