Introduction Green measures Methods Compared Experiment on Wikipedia Conclusion

Finding Related Pages Using Green Measures: The Example of Wikipedia

Yann Ollivier Pierre Senellart

AAAI July 24th, 2007

Ollivier, Senellart (CNRS & INRIA) Related Pages & Green Measures AAAI, 2007/07/24 1 / 34

Related nodes in a graph

Given a hyperlinked environment (= a graph)...

[Figure: example directed graph with ten nodes, numbered 1 to 10]

Problem

Finding nodes semantically related to some given node.

Example of related nodes

Example (World Wide Web)

Nodes: Web pages

Edges: hyperlinks

Related nodes: similar or related pages (cf. Google)

Example (Wikipedia)

Nodes: articles

Edges: hyperlinks

Related nodes: related articles (= articles on semantically related topics)

Classical approaches

Classical approaches for finding related nodes (e.g. on the World Wide Web):

Based on the use of variants of PageRank on local subgraphs.

Text-mining techniques: cocitations, vector-space model, ...

Our approach

Use of a classical Markov chain tool: Green measures.

Contributions

Our contributions:

1. A novel use of Green measures for extracting semantic information in a graph.

2. An extensive comparative study with classical approaches, on the English version of Wikipedia.

Remark

Purely mathematical methods only; no Wikipedia-specific tricks are used.

Outline

1 Introduction

2 Green measures

    Graphs as Markov chains

    Green measures

3 Methods Compared

4 Experiment on Wikipedia

5 Conclusion

Graph = Markov chain

Definition (Markov chain)

Probabilistic process on a state space, defined by transition probabilities $p_{ij}$ from each state $i$ to each state $j$.

For a directed graph:

State space: set of nodes

Transition probabilities: based on existence (and weight) of edges

Remark

All graphs are assumed to be strongly connected and aperiodic (the gcd of the lengths of all cycles equals 1).
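As a minimal sketch of this correspondence, the snippet below builds transition probabilities for a small directed graph, assuming a random walker follows each outgoing link of a node with equal probability (the slides also allow weighted edges, which this sketch ignores). The four-node graph and the name `transition_matrix` are hypothetical, chosen only for illustration. The graph is strongly connected and contains cycles of lengths 2 and 3, so the gcd condition holds.

```python
def transition_matrix(out_links, n):
    """Return p as a list of rows; p[i][j] is the probability of moving from i to j.

    Assumption: every outgoing link of a node is followed with equal probability.
    """
    p = [[0.0] * n for _ in range(n)]
    for i, targets in out_links.items():
        for j in targets:
            p[i][j] += 1.0 / len(targets)
    return p


# Hypothetical example graph: 0->1, 0->2, 1->2, 2->0, 2->3, 3->0.
EXAMPLE_LINKS = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}
P = transition_matrix(EXAMPLE_LINKS, 4)

for row in P:
    print(row)
```

Each row of `P` sums to 1, which is what makes it a valid set of transition probabilities.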

Equilibrium measure

Definition (Measure)

An assignment of a real number to each state.

Definition (Propagation operator)

Operator that maps a measure $\mu$ to the measure $\mu'$ computed as:

$$\mu'_j = \sum_i \mu_i \, p_{ij}$$

Result

If we iterate the propagation operator starting from any measure summing to 1, we converge to a unique equilibrium measure (PageRank with no random jumps).
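The convergence result can be illustrated with a short sketch that repeatedly applies the propagation operator until the measure stops changing. The transition matrix `P` is that of a hypothetical four-node graph (links 0->1, 0->2, 1->2, 2->0, 2->3, 3->0), and the function names, tolerance and iteration cap are assumptions made for this example rather than anything taken from the talk.

```python
def propagate(mu, p):
    """One application of the propagation operator: mu'_j = sum_i mu_i * p[i][j]."""
    n = len(mu)
    return [sum(mu[i] * p[i][j] for i in range(n)) for j in range(n)]


def equilibrium(p, tol=1e-12, max_iter=10_000):
    """Iterate from the uniform measure until successive measures differ by less than tol."""
    n = len(p)
    mu = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = propagate(mu, p)
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu


# Hypothetical strongly connected, aperiodic example graph.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]
NU = equilibrium(P)
print(NU)
```

For this graph the iteration approaches (1/3, 1/6, 1/3, 1/6), which can be checked by hand against the fixed-point equation.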

PageRank: iterations 1 to 14

[Figure: animation of the propagation operator on the example graph. Every node starts at 0.100 (iteration 1); by iteration 14 the ten node values have stabilized at 0.050, 0.234, 0.091, 0.149, 0.058, 0.065, 0.095, 0.142, 0.097 and 0.019, listed in the order extracted from the figure.]

Background on Green measures

Green functions

Come from electrostatic theory (potential created by a charge distribution).

Analogy between electrostatic potential theory and Markov chains.

Green measures: discrete analogues of Green functions.

Definition of Green measures

Definition (Green measure centered at node $i$)

The unique fixed point of the operator on measures defined by:

$$\mu_j \mapsto \sum_k \mu_k \, p_{kj} + (\delta_{ij} - \nu_j), \qquad \text{where } \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

and $\nu$ denotes the equilibrium measure.

Interpretations

PageRank with source at $i$: a standard PageRank computation in which, at each iteration, 1 is added to the measure of $i$ and $\nu_j$ is subtracted from the measure of every node $j$.

Time spent at a node, given that the initial node is $i$.
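The "PageRank with source" interpretation translates almost directly into code. The sketch below is an illustration under stated assumptions, not the authors' implementation: it starts from the zero measure, runs a fixed number of iterations (an arbitrary choice), and uses a hypothetical four-node graph whose equilibrium measure `NU` is hard-coded as (1/3, 1/6, 1/3, 1/6).

```python
def propagate(mu, p):
    """Propagation operator: mu'_j = sum_k mu_k * p[k][j]."""
    n = len(mu)
    return [sum(mu[k] * p[k][j] for k in range(n)) for j in range(n)]


def green_measure(p, nu, source, n_iter=200):
    """Iterate g <- propagate(g) + (delta_source - nu), starting from the zero measure."""
    g = [0.0] * len(p)
    for _ in range(n_iter):
        g = propagate(g, p)
        g[source] += 1.0                            # add 1 at the source node
        g = [gj - nj for gj, nj in zip(g, nu)]      # subtract nu_j at every node j
    return g


# Hypothetical example graph: 0->1, 0->2, 1->2, 2->0, 2->3, 3->0.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]
NU = [1 / 3, 1 / 6, 1 / 3, 1 / 6]   # equilibrium measure of P (it satisfies NU P = NU)

GREEN_0 = green_measure(P, NU, source=0)
print(GREEN_0)
```

With source node 0 the iteration settles near (13/36, 1/72, -5/36, -17/72): positive at the source and at node 1, negative elsewhere, and summing to zero because each step adds a total mass of 1 and removes a total mass of 1.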

Outline

1 Introduction

2 Green measures

3 Methods Compared

    Green and SymGreen

    PageRankOfLinks

    Cosine

    Cocitations

4 Experiment on Wikipedia

5 Conclusion

Purpose

Finding nodes in the graph related to $i$.

Each method outputs an ordered list of nodes related to $i$, ranked by the similarity score that the method assigns to each node with respect to $i$.

Green: method description

Direct application of the theory of Green measures.

Improvement: multiplication by a term favoring uncommon nodes, $\log(1/\nu_j)$ (quantity of information), where $\nu$ is the equilibrium measure.

Iteration until reasonable convergence on the top results.
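The weighting step can be sketched as follows: each Green value is multiplied by the logarithm of the inverse equilibrium value of its node, and nodes are then sorted by decreasing score. The function names are hypothetical, and the input numbers are the exact equilibrium and Green measure (centered at node 0) of a hypothetical four-node graph with links 0->1, 0->2, 1->2, 2->0, 2->3, 3->0, used only for illustration.

```python
import math


def green_scores(green, nu):
    """Weight each Green value by log(1 / nu_j), which favors uncommon nodes."""
    return [g * math.log(1.0 / v) for g, v in zip(green, nu)]


def rank_related(green, nu):
    """Return node indices ordered from most to least related."""
    scores = green_scores(green, nu)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Values for the hypothetical four-node example graph.
NU = [1 / 3, 1 / 6, 1 / 3, 1 / 6]                 # equilibrium measure
GREEN_0 = [13 / 36, 1 / 72, -5 / 36, -17 / 72]    # Green measure centered at node 0

RANKING = rank_related(GREEN_0, NU)
print(RANKING)
```

On a real graph one would keep only the top few entries of the ranking, which is why convergence only needs to be reasonable on the top results.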

Green: iterations 1 to 20

[Figure: animation of the Green measure iteration on the example graph. Iteration 1 shows -0.050, -0.233, -0.091, -0.149, -0.058, 0.935, -0.095, -0.143, -0.097 and -0.019; by iteration 20 the values have stabilized at -0.156, -0.096, 0.332, -0.316, -0.085, 0.893, -0.034, -0.237, -0.254 and -0.046, listed in the order extracted from the figure.]

SymGreen: method description

Green only follows links forward, which may be a limitation.

Symmetrize the graph, in a canonical way with respect to the equilibrium measure $\nu$:

$$\tilde{p}_{ij} = \frac{1}{2} \left( p_{ij} + p_{ji} \, \frac{\nu_j}{\nu_i} \right)$$

The resulting graph has the same equilibrium measure.

Same as Green on this symmetrized graph.
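A small sketch can make the symmetrization concrete and check the claim about the equilibrium measure numerically. It assumes the formula reads as half the forward probability plus half the backward probability rescaled by the ratio of equilibrium values; the function name `symmetrize` and the four-node graph (links 0->1, 0->2, 1->2, 2->0, 2->3, 3->0, with equilibrium (1/3, 1/6, 1/3, 1/6)) are hypothetical.

```python
def symmetrize(p, nu):
    """Symmetrized transition probabilities: 0.5 * (p[i][j] + p[j][i] * nu[j] / nu[i])."""
    n = len(p)
    return [
        [0.5 * (p[i][j] + p[j][i] * nu[j] / nu[i]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example graph and its equilibrium measure.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]
NU = [1 / 3, 1 / 6, 1 / 3, 1 / 6]

P_SYM = symmetrize(P, NU)

# Propagating NU through the symmetrized chain should give NU back.
NU_AFTER = [sum(NU[i] * P_SYM[i][j] for i in range(4)) for j in range(4)]
print(P_SYM[0])
print(NU_AFTER)
```

The rows of `P_SYM` still sum to 1, and the flow from i to j under the equilibrium measure now equals the flow from j to i, so the walk can move backward along links as easily as forward.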

PageRankOfLinks

[Figure: three-slide illustration of PageRankOfLinks on the example graph, with nodes labeled by their equilibrium values: 0.050, 0.233, 0.091, 0.149, 0.058, 0.065, 0.095, 0.143, 0.097 and 0.019, listed in the order extracted from the figure.]

Cosine

[Figure: three-slide illustration of the Cosine method on the example graph]

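Documents as vectors over the dimensions 1, 2, 3, 4, 6, 7, 9, 10 (each X marks a dimension present in the document; the column positions of the marks are not recoverable here):

Document | Marks   | Cosine with 9
1        | X       | 0.40
2        | X X X X | 0.43
4        | X       | 0.40
6        | X X X   | 0.09
8        | X X X   | 0.13
9        | X X     | 1.00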
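To show what a cosine score between two documents means in this vector-space setting, here is a minimal sketch that treats each document as a binary vector over the dimensions it touches. Binary weights are an assumption made for simplicity; the values in the table appear to come from weighted vectors, so this sketch is not expected to reproduce them. The function name and the sample dimension sets are hypothetical.

```python
import math


def cosine(dims_a, dims_b):
    """Cosine similarity between two binary vectors, given as collections of dimensions."""
    a, b = set(dims_a), set(dims_b)
    if not a or not b:
        return 0.0
    # Dot product = number of shared dimensions; norms = square roots of the set sizes.
    return len(a & b) / math.sqrt(len(a) * len(b))


SAME = cosine([1, 2], [1, 2])          # identical documents
DISJOINT = cosine([1], [2])            # nothing in common
PARTIAL = cosine([1, 2, 3, 4], [1])    # one shared dimension out of four
print(SAME, DISJOINT, PARTIAL)
```

A document always has cosine 1 with itself, which matches the 1.00 entry for document 9 in the table.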

Cocitations

[Figure: three-slide illustration of the Cocitations method on the example graph]

Outline

1 Introduction

2 Green measures

3 Methods Compared

4 Experiment on Wikipedia

    Wikipedia graph

    Evaluation

    Results

5 Conclusion

The graph of Wikipedia

Statistics

1,606,896 nodes (as of September 25th, 2006).

38,896,462 edges.

95% of the nodes belong to the largest strongly connected component.

Evaluation methodology

Blind evaluation of the methods.

Articles selected for their diversity: Clique (graph theory), Germany, Hungarian language, Pierre de Fermat, Star Wars, Theory of relativity, 1989.

66 evaluators were asked to give a mark to each list of words.

Output on Germany

Top 10 results for each method:

Green: 1. Germany, 2. Berlin, 3. German language, 4. Christian Democratic Union (Germany), 5. Austria, 6. Hamburg, 7. German reunification, 8. Social Democratic Party of Germany, 9. German Empire, 10. German Democratic Republic

SymGreen: 1. Germany, 2. Berlin, 3. France, 4. Austria, 5. German language, 6. Bavaria, 7. World War II, 8. German Democratic Republic, 9. European Union, 10. Hamburg

PageRankOfLinks: 1. United States, 2. United Kingdom, 3. France, 4. 2005, 5. Germany, 6. World War II, 7. Canada, 8. English language, 9. Japan, 10. Italy

Cosine: 1. Germany, 2. History of Germany since 1945, 3. History of Germany, 4. Timeline of German history, 5. States of Germany, 6. Politics of Germany, 7. List of Germany-related topics, 8. Hildesheimer Rabbinical Seminary, 9. Pleasure Victim, 10. German Unity Day

Cocitations: 1. Germany, 2. United States, 3. France, 4. United Kingdom, 5. World War II, 6. Italy, 7. Netherlands, 8. Japan, 9. 2005, 10. Category:Living people


Results


Outline

1 Introduction

2 Green measures

3 Methods Compared

4 Experiment on Wikipedia

5 Conclusion (Summary, Perspectives)


Summary

Green measures: a tool for extracting semantic information in a graph.

In comparison to other methods, in the case of Wikipedia:

Better overall performance.

Robustness.

Discovery of relevant semantic relations.
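For readers who want a concrete picture of a Green measure, here is a small sketch. It assumes the usual Markov chain definition, G_i(j) = sum over t >= 0 of ((P^t)_ij - mu_j), where P is the transition matrix of the random walk and mu its stationary distribution, and it simply truncates the sum. The 3-state matrix `P` is hypothetical and the function names are illustrative; this is not the computation scheme used for Wikipedia.

```python
def stationary_distribution(P, iterations=2000):
    """Approximate the stationary distribution mu (mu = mu P) by power iteration."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu


def green_measure(P, source, steps=2000):
    """Truncated sum over t < steps of (row `source` of P^t) minus mu."""
    n = len(P)
    mu = stationary_distribution(P)
    # Row `source` of P^0 is the Dirac mass at `source`.
    row = [1.0 if j == source else 0.0 for j in range(n)]
    green = [0.0] * n
    for _ in range(steps):
        green = [g + r - m for g, r, m in zip(green, row, mu)]
        # Advance one step of the walk: row of P^(t+1).
        row = [sum(row[i] * P[i][j] for i in range(n)) for j in range(n)]
    return green


# Hypothetical irreducible, aperiodic chain on 3 states (each row sums to 1).
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

green = green_measure(P, source=0)
# Nodes with the largest values are read as the most related to the source.
ranking = sorted(range(len(P)), key=lambda j: -green[j])
print(ranking)
```

The truncation only makes sense when the chain converges to its stationary distribution, which is why the toy chain is chosen irreducible and aperiodic.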


Perspectives

Application to the Web graph.

Interpolation between Green and SymGreen.

Clustering using Green measures:

impractical for now because of computation times.

Use of Green measures on other Markov chains, e.g. for computing authority scores.
