Organisation - Optimization of Perron eigenvectors and applications: from web ranking to chrono

[…] Floquet eigenvalue of the population dynamics, under the constraint that the growth rate of the healthy cell population remains above a given toxicity threshold. The goal is to find periodic chemotherapy schedules that are viable in the long term and effective against cancer. When we discretize the problem, the Floquet eigenvalues are approximated by the Perron eigenvalues of sparse nonnegative matrices. We developed a method of multipliers for the local optimization of the growth rates, which exploits a low-rank property of the gradient of the Perron eigenvalue. The eigenvalue optimization algorithm builds on the algorithm developed in Chapter 7. We computed the gradient of the objective function involved. We implemented the method of multipliers to solve the problem, with the inner unconstrained optimization problems solved by the algorithm that couples power and gradient iterations.

1.8 Organization

In Chapter 2, we review earlier results on PageRank optimization and several page-ranking algorithms that use the hyperlink structure of the web.

In Chapter 3, we give new results on the effective resolution of well-described Markov decision processes in which the number of actions may be exponential or the action spaces are convex. Theorem 3.1 was published in [FABG13].

In Chapter 4, we show that the PageRank optimization problem can be solved in polynomial time by reducing it to an infinite-horizon expected average cost problem on a well-described Markov decision process. We give a very efficient algorithm for the optimization problem: we show that optimizing the PageRank is not fundamentally harder than computing it. We then treat problems with constraints that couple the behavior of several pages. This chapter follows [FABG13].

In Chapter 5, we build on our PageRank optimization results to develop a new ranking algorithm, called MaxRank, designed to fight link spam.

In Chapter 6, we study the convergence of Tomlin’s HOTS algorithm. These results have been submitted for publication in [Fer12a].

In Chapter 7, we study optimization problems for the Perron eigenvalue and the Perron eigenvector. We give an efficient algorithm for computing the matrix of partial derivatives of the objective, which exploits the low-rank property of this matrix. We give a scalable algorithm that couples gradient and power iterations and returns a local minimum of the Perron vector optimization problem. We prove its convergence by viewing it as an approximate gradient method. We then apply these results to the optimization of Kleinberg’s HITS and Tomlin’s HOTS. These results have been submitted for publication in [Fer12b].

In Chapter 8, we present another application of Perron eigenvalue optimization, this time to chemotherapy. This work was published in [BCF+11a, BCF+11b,

Chapter 2. Web ranking and (nonlinear) Perron-Frobenius theory

2.1 Google’s PageRank

One of the main ranking methods relies on the PageRank introduced by Brin and Page [BP98]. It is defined as the invariant measure of a walk made by a random surfer on the web graph. When reading a given page, the surfer either selects a link from the current page (with uniform probability) and moves to the page pointed to by that link, or interrupts his current search and moves to an arbitrary page, selected according to given “zapping” probabilities. The rank of a page is defined as the frequency with which the random surfer visits it. It is interpreted as the “popularity” of the page.

The PageRank has motivated a number of works, dealing in particular with computational issues. Classically, the PageRank vector is computed by the power algorithm [BP98]. There has been considerable work on designing new, more efficient approaches for its computation [Ber05, LM06]: the Gauss-Seidel method [ANTT02], aggregation/disaggregation [LM06], or distributed randomized algorithms [NP09, IT10, ITB12]. Other active fields are the development of new ranking algorithms [BRR05] and the study of the web graph [BL04].

We recall here the basic elements of the Google PageRank computation. We call web graph the directed graph with one node per web page and an arc from page $i$ to page $j$ if page $i$ contains a hyperlink to page $j$. We identify the set of pages with $[n] := \{1, \dots, n\}$. Let $N_i$ denote the number of hyperlinks contained in page $i$. Assume first that $N_i \geq 1$ for all $i \in [n]$, meaning that every page has at least one outlink. Then, we construct the $n \times n$ stochastic matrix $S$, which is such that

\[
S_{i,j} =
\begin{cases}
N_i^{-1} & \text{if page } j \text{ is pointed to from page } i \\
0 & \text{otherwise}
\end{cases}
\tag{2.1}
\]

This is the transition matrix of a Markov chain modeling the behavior of a surfer who chooses a link at random, uniformly among the ones included in the current page, and moves to the page pointed to by this link. The matrix $S$ depends only on the web graph.

We also fix a row vector $z \in \mathbb{R}^n_+$, the zapping or teleportation vector, which must be stochastic (so, $\sum_{j \in [n]} z_j = 1$), together with a damping factor $\alpha \in [0, 1]$, and define the new stochastic matrix

\[
P = \alpha S + (1 - \alpha)\, e z
\tag{2.2}
\]

where $e$ is the (column) vector in $\mathbb{R}^n$ with all entries equal to 1.

Consider now a Markov chain $(X_t)_{t \geq 0}$ with transition matrix $P$, so that for all $i, j \in [n]$, $\mathbb{P}(X_{t+1} = j \mid X_t = i) = P_{i,j}$. Then, $X_t$ represents the position of a websurfer at time $t$: when at page $i$, the websurfer continues his current exploration of the web with probability $\alpha$ and moves to the next page by following the links included in page $i$, as above, or, with probability $1 - \alpha$, stops his current exploration and then teleports to page $j$ with probability $z_j$.
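The surfer model above can be illustrated by a direct simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_surfer`, the three-page graph, and the step count are hypothetical, every page is assumed to have at least one outlink, and a link repeated in a page would be counted with its multiplicity.

```python
import random

def simulate_surfer(outlinks, z, alpha=0.85, steps=100_000, seed=0):
    """Estimate visit frequencies of the random surfer by simulation.

    outlinks[i] lists the pages linked from page i (0-based, assumed non-empty);
    z is the teleportation vector (nonnegative entries summing to 1).
    """
    rng = random.Random(seed)          # fixed seed so runs are reproducible
    pages = list(range(len(outlinks)))
    visits = [0] * len(outlinks)
    page = rng.choices(pages, weights=z)[0]
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < alpha:
            # Continue the exploration: follow a link chosen uniformly.
            page = rng.choice(outlinks[page])
        else:
            # Interrupt the search: teleport to page j with probability z[j].
            page = rng.choices(pages, weights=z)[0]
    return [v / steps for v in visits]

# Hypothetical toy graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, uniform teleportation.
freq = simulate_surfer([[1, 2], [2], [0]], [1 / 3, 1 / 3, 1 / 3])
```

The returned frequencies are Monte Carlo estimates of the visit frequencies, so they fluctuate with the seed and the number of steps.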

When some page $i$ has no outlink, $N_i = 0$, and so the entries of the $i$th row of the matrix $S$ cannot be defined according to (2.1). Then, we set $S_{i,j} := z_j$. In other words, when visiting a page without any outlink, the websurfer interrupts his current exploration and teleports to page $j$, again with probability $z_j$. It is also possible to define another probability vector $Z$ (different from $z$) for the teleportation from these “dangling nodes”.
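The construction of $S$ and $P$, including the dangling-node convention, can be sketched as follows. This is a minimal dense-matrix sketch, not the implementation used for the experiments: the name `build_google_matrix` and the four-page graph are hypothetical, repeated links to the same page are counted once (an assumption), and the same vector $z$ is used for dangling nodes.

```python
def build_google_matrix(outlinks, z, alpha=0.85):
    """Return P = alpha*S + (1-alpha)*e*z as a list of rows.

    outlinks[i] lists the pages linked from page i (0-based indices);
    z is the teleportation vector (nonnegative entries summing to 1).
    """
    n = len(outlinks)
    P = []
    for i in range(n):
        targets = set(outlinks[i])     # assumption: repeated links count once
        if targets:
            # Row i of S is uniform over the pages linked from page i, as in (2.1).
            s_row = [1.0 / len(targets) if j in targets else 0.0 for j in range(n)]
        else:
            # Dangling node: S[i][j] := z[j].
            s_row = list(z)
        P.append([alpha * s_row[j] + (1.0 - alpha) * z[j] for j in range(n)])
    return P

# Hypothetical toy graph; page 3 is a dangling node.
P = build_google_matrix([[1, 2], [2], [0], []], [0.25, 0.25, 0.25, 0.25])
```

Each row of the result sums to 1, and the row of a dangling node equals $z$ itself, since $\alpha z + (1 - \alpha) z = z$. A real web graph would call for a sparse representation; the dense rows here only keep the example short.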

The PageRank $\pi$ is defined as the invariant measure of the Markov chain $(X_t)_{t \geq 0}$ representing the behavior of the websurfer. This invariant measure is unique if $\alpha < 1$, or if $P$ is irreducible.
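To see why $\alpha < 1$ is enough for uniqueness, the following short derivation may help. It uses only (2.2) and the normalization $\pi e = 1$ of a probability row vector $\pi$; the series form is a standard consequence and is not taken from the text above.

\[
\pi = \pi P = \alpha\, \pi S + (1 - \alpha)(\pi e)\, z = \alpha\, \pi S + (1 - \alpha)\, z
\quad \Longrightarrow \quad
\pi (I - \alpha S) = (1 - \alpha)\, z .
\]

Since $S$ is stochastic, its spectral radius is 1, so $I - \alpha S$ is invertible when $\alpha < 1$, and

\[
\pi = (1 - \alpha)\, z\, (I - \alpha S)^{-1} = (1 - \alpha) \sum_{k \geq 0} \alpha^k\, z S^k .
\]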

Typically, one takes $\alpha = 0.85$, meaning that at each step, a websurfer interrupts his current search with probability $0.15 \simeq 1/7$. The advantages of introducing the damping factor and the teleportation vector are well known. First, they guarantee that the power algorithm converges to the PageRank with a geometric rate $\alpha$ independent of the size (and other characteristics) of the web graph. In addition, the teleportation vector may be used to “tune” the PageRank if necessary. By default, $z = e^T/n$ is the uniform stochastic vector. We will assume in the sequel that $\alpha < 1$ and $z_j > 0$ for all $j \in [n]$, so that $P$ is irreducible.
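The power algorithm mentioned above repeatedly applies $\pi \leftarrow \pi P$ until the iterates stop changing. The sketch below is a minimal dense version under stated assumptions: the name `pagerank_power`, the tolerance, the iteration cap, and the three-page matrix $S$ are all hypothetical choices made for illustration.

```python
def pagerank_power(P, tol=1e-12, max_iter=1000):
    """Approximate the invariant measure pi = pi * P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n                 # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the row-vector product pi * P.
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if change < tol:               # stop once the l1 change is below tol
            break
    return pi

# Hypothetical toy graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, uniform teleportation.
alpha = 0.85
S = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
z = [1.0 / 3] * 3
P = [[alpha * S[i][j] + (1 - alpha) * z[j] for j in range(3)] for i in range(3)]
pi = pagerank_power(P)
```

With $\alpha = 0.85$, the change between successive iterates shrinks by a factor of at most $\alpha$ per step, so a tolerance of $10^{-12}$ is reached within a couple of hundred iterations regardless of the graph size.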

The graph in Figure 2.1 represents a fragment of the web graph. We obtained it by performing a crawl of our laboratory’s site, covering 1500 pages. We set the teleportation vector in such a way that the 5 surrounding institutional pages are dominant. The teleportation probabilities to these pages were taken to be proportional to their PageRank (we used the Google Toolbar, which gives a rough indication of the PageRank, on a logarithmic scale). After running the PageRank algorithm on this graph, we found that within the controlled site, the main page of this author has the highest PageRank (consistent with the results provided by Google search).

From the document Optimization of Perron eigenvectors and applications: from web ranking to chronotherapeutics (pages 32-35)