Introduction Green measures Methods Compared Experiment on Wikipedia Conclusion
Finding Related Pages Using Green Measures:
The Example of Wikipedia
Yann Ollivier Pierre Senellart
AAAI July 24th, 2007
Ollivier, Senellart (CNRS & INRIA) Related Pages & Green Measures AAAI, 2007/07/24 1 / 34
Related nodes in a graph
Given a hyperlinked environment (= a graph)...
[Figure: example graph with 10 nodes, labeled 1 to 10]
Problem
Finding nodes semantically related to some given node.
Example of related nodes
Example (World Wide Web)
Nodes: Web pages
Edges: hyperlinks
Related nodes: similar/related pages (cf. Google)
Example (Wikipedia)
Nodes: articles
Edges: hyperlinks
Related nodes: related articles (= articles on semantically related topics)
Classical approaches
Classical approaches for finding related nodes (e.g. on the World Wide Web):
Based on the use of variants of PageRank on local subgraphs.
Text-mining techniques: cocitations, vector-space model, ...
Our approach
Use of a classical Markov chain tool: Green measures.
Contributions
Our contributions:
1. A novel use of Green measures for extracting semantic information in a graph.
2. An extensive comparative study with classical approaches, on the English version of Wikipedia.
Remark
Only pure mathematical methods, no Wikipedia-specific tricks included.
Outline
1 Introduction
2 Green measures
    Graphs as Markov chains
    Green measures
3 Methods Compared
4 Experiment on Wikipedia
5 Conclusion
Graph = Markov chain
Definition (Markov chain)
Probabilistic process on a state space, defined by transition probabilities $p_{ij}$ from each state $i$ to each state $j$.
For a directed graph:
State space: set of nodes
Transition probabilities: based on existence (and weight) of edges
Remark
All graphs are assumed to be strongly connected and aperiodic (the gcd of the lengths of all cycles is equal to 1).
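As a minimal sketch of the graph-to-chain construction, the following turns weighted directed edges into transition probabilities. Normalizing each node's outgoing weights to sum to 1 is an assumption here (the slide only says the probabilities are based on the existence and weight of edges), and the function name and the 3-node graph are hypothetical.

```python
def transition_probabilities(edges):
    """Turn weighted directed edges {i: {j: weight}} into p[i][j].

    Assumption: p_ij is the weight of edge i -> j divided by the total
    outgoing weight of i, so that each row sums to 1.
    """
    p = {}
    for i, out in edges.items():
        total = sum(out.values())
        if total <= 0:
            raise ValueError(f"node {i!r} has no outgoing edge")
        p[i] = {j: w / total for j, w in out.items()}
    return p


# Hypothetical 3-node graph: a -> b, a -> c, b -> c, c -> a (all weights 1).
# It is strongly connected, with cycles of length 2 and 3 (gcd 1).
edges = {"a": {"b": 1, "c": 1}, "b": {"c": 1}, "c": {"a": 1}}
p = transition_probabilities(edges)
print(p["a"])  # {'b': 0.5, 'c': 0.5}
```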
Equilibrium measure
Definition (Measure)
An assignment of a real number $\mu_i$ to each state $i$.
Definition (Propagation operator)
Operator which maps a measure $\mu$ to the measure $\mu'$ computed as:
$\mu'_j = \sum_i \mu_i \, p_{ij}$
Result
If we iterate the propagation operator starting from any measure summing to 1, we converge to a unique equilibrium measure $\nu$ (PageRank with no random jumps).
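The convergence result can be illustrated with a short power iteration. This is only a sketch on a hypothetical 3-node chain; the function names, the tolerance and the stopping rule are assumptions, not taken from the talk.

```python
def propagate(mu, p):
    """One step of the propagation operator: mu'_j = sum_i mu_i * p_ij."""
    new = {j: 0.0 for j in mu}
    for i, out in p.items():
        for j, pij in out.items():
            new[j] += mu[i] * pij
    return new


def equilibrium(p, tol=1e-12, max_iter=10_000):
    """Iterate from the uniform measure until successive iterates agree."""
    mu = {i: 1.0 / len(p) for i in p}
    for _ in range(max_iter):
        new = propagate(mu, p)
        if max(abs(new[j] - mu[j]) for j in mu) < tol:
            return new
        mu = new
    raise RuntimeError("propagation did not converge")


# Hypothetical chain: a -> b or c (probability 1/2 each), b -> c, c -> a.
p = {"a": {"b": 0.5, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
nu = equilibrium(p)
print({j: round(x, 3) for j, x in nu.items()})  # {'a': 0.4, 'b': 0.2, 'c': 0.4}
```

Starting from any other measure summing to 1 should give the same limit, since the chain is strongly connected and aperiodic.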
PageRank: iterations 1 to 14
[Animation on the example graph: the measure is propagated 14 times. Values in figure reading order:
iteration 1: 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100
iteration 14: 0.050, 0.234, 0.091, 0.149, 0.058, 0.065, 0.095, 0.142, 0.097, 0.019
By iteration 14, no value changes by more than 0.001 between iterations.]
Background on Green measures
Green functions
Come from electrostatic theory (potential created by a charge distribution).
Analogy between electrostatic potential theory and Markov chains.
Green measures: discrete analogues of Green functions.
Definition of Green measures
Definition (Green measure centered at node $i$)
Unique fixed point, among measures summing to 0, of the operator on measures defined by:
$\mu_j \mapsto \sum_k \mu_k \, p_{kj} + (\delta_{ij} - \nu_j)$, where $\nu$ is the equilibrium measure and $\delta_{ij} = 1$ if $i = j$, $0$ otherwise.
Interpretations
PageRank with source at $i$: standard PageRank computation while, at each iteration, adding 1 to the measure of $i$ and subtracting $\nu_j$ from the measure of every node $j$.
Time spent at a node, knowing that the initial node is $i$.
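The "PageRank with source" interpretation can be sketched as follows: propagate the current measure, then add 1 at the center and subtract the equilibrium measure everywhere. The function names, the starting point (the source term itself) and the stopping rule are assumptions for illustration, and the 3-node chain is hypothetical.

```python
def propagate(mu, p):
    """mu'_j = sum_i mu_i * p_ij."""
    new = {j: 0.0 for j in p}
    for i, out in p.items():
        for j, pij in out.items():
            new[j] += mu[i] * pij
    return new


def equilibrium(p, steps=200):
    """Equilibrium measure by plain power iteration (fixed step count)."""
    mu = {i: 1.0 / len(p) for i in p}
    for _ in range(steps):
        mu = propagate(mu, p)
    return mu


def green_measure(p, center, nu, tol=1e-12, max_iter=10_000):
    """Iterate mu -> mu P + (delta_center - nu) until it stabilizes."""
    source = {j: (1.0 if j == center else 0.0) - nu[j] for j in p}
    g = dict(source)  # first iterate: delta_center - nu, which sums to 0
    for _ in range(max_iter):
        pushed = propagate(g, p)
        new = {j: pushed[j] + source[j] for j in p}
        if max(abs(new[j] - g[j]) for j in p) < tol:
            return new
        g = new
    raise RuntimeError("Green iteration did not converge")


# Hypothetical chain: a -> b or c (probability 1/2 each), b -> c, c -> a.
p = {"a": {"b": 0.5, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
nu = equilibrium(p)
g = green_measure(p, "a", nu)
print({j: round(x, 3) for j, x in g.items()})  # {'a': 0.32, 'b': -0.04, 'c': -0.28}
```

Every iterate sums to 0, so the limit is the fixed point with total mass 0.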
Outline
1 Introduction
2 Green measures
3 Methods Compared
    Green and SymGreen
    PageRankOfLinks
    Cosine
    Cocitations
4 Experiment on Wikipedia
5 Conclusion
Purpose
Finding nodes in the graph related to i .
For each method, output an ordered list of nodes related to i . Each method provides a similarity score to i .
Green — Method Description
Method Description
Direct application of the theory of Green measures.
Improvement: multiplication by a term favoring uncommon nodes, $\log(1/\nu_j)$ (quantity of information), where $\nu$ is the equilibrium measure.
Iteration until reasonable convergence on the top results.
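The weighting step might look like the sketch below: each Green value is multiplied by $\log(1/\nu_j)$ and nodes are sorted by decreasing score. The function names are hypothetical, and the toy values are the equilibrium and Green measure (centered at a) of a hypothetical 3-node chain a -> b, a -> c, b -> c, c -> a.

```python
import math


def green_scores(green, nu):
    """Weight each Green value by log(1 / nu_j), the information content of node j."""
    return {j: green[j] * math.log(1.0 / nu[j]) for j in green}


def ranked(scores):
    """Nodes sorted by decreasing score."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy values for the hypothetical 3-node chain.
nu = {"a": 0.4, "b": 0.2, "c": 0.4}
green_a = {"a": 0.32, "b": -0.04, "c": -0.28}

order = ranked(green_scores(green_a, nu))
print(order)  # ['a', 'b', 'c']
```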
Green: iterations 1 to 20
[Animation on the example graph: successive iterates of the Green measure computation. Values in figure reading order:
iteration 1: -0.050, -0.233, -0.091, -0.149, -0.058, 0.935, -0.095, -0.143, -0.097, -0.019
iteration 20: -0.156, -0.096, 0.332, -0.316, -0.085, 0.893, -0.034, -0.237, -0.254, -0.046
By iteration 20, no value changes by more than 0.001 between iterations.]
SymGreen — Method Description
Method Description
Green only follows links forward, which may be a limitation.
Symmetrize the graph, in a canonical sense in relation to the equilibrium measure $\nu$:
$\tilde{p}_{ij} = \frac{1}{2} \left( p_{ij} + p_{ji} \, \frac{\nu_j}{\nu_i} \right)$
The resulting graph has the same equilibrium measure.
Same as Green on this symmetrized graph.
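A minimal sketch of the symmetrization, assuming the transition probabilities and the equilibrium measure are already available as dictionaries. The function name is hypothetical, the double loop over all node pairs is only suitable for toy graphs, and the example chain and its equilibrium values are illustrative.

```python
def symmetrize(p, nu):
    """Return p~_ij = (p_ij + p_ji * nu_j / nu_i) / 2 as a sparse dict of rows."""
    nodes = list(p)
    sym = {}
    for i in nodes:
        row = {}
        for j in nodes:
            forward = p[i].get(j, 0.0)
            backward = p[j].get(i, 0.0) * nu[j] / nu[i]  # time-reversed chain
            value = 0.5 * (forward + backward)
            if value > 0.0:
                row[j] = value
        sym[i] = row
    return sym


# Hypothetical chain a -> b or c (1/2 each), b -> c, c -> a,
# whose equilibrium measure is (0.4, 0.2, 0.4).
p = {"a": {"b": 0.5, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
nu = {"a": 0.4, "b": 0.2, "c": 0.4}
sym = symmetrize(p, nu)

# The equilibrium measure should be unchanged: sum_i nu_i * p~_ij = nu_j.
check = {j: sum(nu[i] * sym[i].get(j, 0.0) for i in sym) for j in sym}
print({j: round(x, 3) for j, x in check.items()})  # {'a': 0.4, 'b': 0.2, 'c': 0.4}
```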
PageRankOfLinks
[Figure, three frames: the example graph annotated with PageRank values. Values in figure reading order: 0.050, 0.233, 0.091, 0.149, 0.058, 0.065, 0.095, 0.143, 0.097, 0.019]
Cosine
[Figure, three frames: the example graph with 10 nodes, labeled 1 to 10]
Cosine
Dimensions: 1, 2, 3, 4, 6, 7, 9, 10

Document   Nonzero coordinates   Cosine with 9
1          X                     0.40
2          X X X X               0.43
4          X                     0.40
6          X X X                 0.09
8          X X X                 0.13
9          X X                   1.00
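For reference, the cosine between two sparse vectors can be computed as in the sketch below. The coordinate weights behind the values on the slide are not given here, so the example uses hypothetical 0/1 vectors and does not reproduce the table; the function and variable names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as {dimension: weight}."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for empty vectors
    return dot / (norm_u * norm_v)


# Hypothetical documents with 0/1 coordinates.
doc_x = {1: 1.0, 3: 1.0}
doc_y = {3: 1.0}
print(round(cosine(doc_x, doc_y), 4))  # 0.7071
```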
Cocitations
[Figure, three frames: the example graph with 10 nodes, labeled 1 to 10]
Outline
1 Introduction
2 Green measures
3 Methods Compared
4 Experiment on Wikipedia
    Wikipedia graph
    Evaluation
    Results
5 Conclusion
The graph of Wikipedia
Statistics
1,606,896 nodes (as of September 25th, 2006).
38,896,462 edges.
95% of the nodes belong to the largest strongly connected component.
Evaluation methodology
Blind evaluation of the methods.
Articles selected for their diversity:
Clique (graph theory)
Germany
Hungarian language
Pierre de Fermat
Star Wars
Theory of relativity
1989
66 evaluators were asked to give a mark to each list of words.
Output on Germany
Green: 1. Germany; 2. Berlin; 3. German language; 4. Christian Democratic Union (Germany); 5. Austria; 6. Hamburg; 7. German reunification; 8. Social Democratic Party of Germany; 9. German Empire; 10. German Democratic Republic
SymGreen: 1. Germany; 2. Berlin; 3. France; 4. Austria; 5. German language; 6. Bavaria; 7. World War II; 8. German Democratic Republic; 9. European Union; 10. Hamburg
PageRankOfLinks: 1. United States; 2. United Kingdom; 3. France; 4. 2005; 5. Germany; 6. World War II; 7. Canada; 8. English language; 9. Japan; 10. Italy
Cosine: 1. Germany; 2. History of Germany since 1945; 3. History of Germany; 4. Timeline of German history; 5. States of Germany; 6. Politics of Germany; 7. List of Germany-related topics; 8. Hildesheimer Rabbinical Seminary; 9. Pleasure Victim; 10. German Unity Day
Cocitations: 1. Germany; 2. United States; 3. France; 4. United Kingdom; 5. World War II; 6. Italy; 7. Netherlands; 8. Japan; 9. 2005; 10. Category:Living people
Results
Outline
1 Introduction
2 Green measures
3 Methods Compared
4 Experiment on Wikipedia
5 Conclusion
    Summary
    Perspectives
Summary
Green measures: a tool for extracting semantic information in a graph.
In comparison to other methods, in the case of Wikipedia:
Better overall performance.
Robustness.
Discovery of relevant semantic relations.
Perspectives
Application to the Web graph.
Interpolation between Green and SymGreen.
Clustering using Green measures: impractical for now because of computation times.
Use of Green measures on other Markov chains, e.g. for computing authority scores.