• Aucun résultat trouvé

Data and Algorithms of the Web

N/A
N/A
Protected

Academic year: 2022

Partager "Data and Algorithms of the Web"

Copied!
121
0
0

Texte intégral

(1)

Data and Algorithms of the Web

Link Analysis Algorithms Page Rank

some slides from:

Anand Rajaraman, Jeffrey D. Ullman

(2)

Link Analysis Algorithms

 Page Rank

 Hubs and Authorities

 Topic-Specific Page Rank

 Spam Detection Algorithms

 Other interesting topics we won’t cover

 Detecting duplicates and mirrors

 Mining for communities

(3)

Ranking web pages

(4)

Ranking web pages

 Web pages are not equally “important”

(5)

Ranking web pages

 Web pages are not equally “important”

www.bernard.com

and

www.stanford.edu both contain both the term “stanford” but:

www.stanford.edu has 23,400 webpages linking to it

www.bernard.com has 10 webpages linking to it

(6)

Ranking web pages

 Web pages are not equally “important”

www.bernard.com

and

www.stanford.edu both contain both the term “stanford” but:

www.stanford.edu has 23,400 webpages linking to it

www.bernard.com has 10 webpages linking to it

 Are all webpages linking to www.stanford.edu

equally important?

 The webpage of MIT is more “important”

than the webpage of a friend of bernard

(7)

Ranking web pages

-> Recursive definition of importance

 Web pages are not equally “important”

www.bernard.com

and

www.stanford.edu both contain both the term “stanford” but:

www.stanford.edu has 23,400 webpages linking to it

www.bernard.com has 10 webpages linking to it

 Are all webpages linking to www.stanford.edu

equally important?

 The webpage of MIT is more “important”

than the webpage of a friend of bernard

(8)

Simple recursive formulation

(9)

Simple recursive formulation

 The importance of a page P is

proportional to the importance of pages

Q where Q -> P (predecessors).

(10)

Simple recursive formulation

 The importance of a page P is

proportional to the importance of pages Q where Q -> P (predecessors).

 Each page Q votes for its successors. If page Q with importance x has n

successors, each succ. P gets x/n votes

(11)

Simple recursive formulation

 The importance of a page P is

proportional to the importance of pages Q where Q -> P (predecessors).

 Each page Q votes for its successors. If page Q with importance x has n

successors, each succ. P gets x/n votes

 Page P’s own importance is the sum of

the votes of its predecessors Q.

(12)

Simple “flow” model

Yahoo

M’soft Amazon

y

a m

(13)

Simple “flow” model

Yahoo

M’soft Amazon

y

a m

a/2

a/2

(14)

Simple “flow” model

Yahoo

M’soft Amazon

y

a m

y/2

y/2

a/2

a/2

(15)

Simple “flow” model

Yahoo

M’soft Amazon

y

a m

y/2

y/2

a/2

a/2 m

(16)

Simple “flow” model

Yahoo

M’soft Amazon

y

a m

y/2

y/2

a/2

a/2 m

y = y /2 + a /2

a = y /2 + m

m = a /2

(17)

Solving the flow equations

(18)

Solving the flow equations

 3 equations, 3 unknowns, no constants

 No unique solution

 All solutions equivalent modulo scale factor

(19)

Solving the flow equations

 3 equations, 3 unknowns, no constants

 No unique solution

 All solutions equivalent modulo scale factor

 Additional constraint forces uniqueness

 y+a+m = 1

 y = 2/5, a = 2/5, m = 1/5

(20)

Solving the flow equations

 3 equations, 3 unknowns, no constants

 No unique solution

 All solutions equivalent modulo scale factor

 Additional constraint forces uniqueness

 y+a+m = 1

 y = 2/5, a = 2/5, m = 1/5

 Gaussian elimination method works for

small examples, but we need a better

method for large graphs

(21)

Matrix formulation

(22)

Matrix formulation

 Matrix M has one row and one column for each web page (n x n, where n is the num of pages)

(23)

Matrix formulation

 Matrix M has one row and one column for each web page (n x n, where n is the num of pages)

 Suppose page j has k successors

If j -> i, then Mij=1/k

Else Mij=0

(24)

Matrix formulation

 Matrix M has one row and one column for each web page (n x n, where n is the num of pages)

 Suppose page j has k successors

If j -> i, then Mij=1/k

Else Mij=0

M is a column stochastic matrix

Columns sum to 1

(25)

Matrix formulation

 Matrix M has one row and one column for each web page (n x n, where n is the num of pages)

 Suppose page j has k successors

If j -> i, then Mij=1/k

Else Mij=0

M is a column stochastic matrix

Columns sum to 1

 Let r be the rank vector where:

ri is the importance score of page i

|r| = 1

(26)

Example

Suppose page j links to 3 pages, including i

i

j

M

X

r_i (contribution from predecessors) is obtained by multiplying ith row of M with r

(27)

Example

Suppose page j links to 3 pages, including i

i

j

M

X

1/3

r_i (contribution from predecessors) is obtained by multiplying ith row of M with r

(28)

Example

i

j

M

X

r_i (contribution from predecessors) is obtained by multiplying ith row of M with r

(29)

Example

i

j

M r

X

r_i (contribution from predecessors) is obtained by multiplying ith row of M with r

(30)

Example

i

j

M r

=

i

X

r_i (contribution from predecessors) is obtained by multiplying ith row of M with r

(31)

Example

i

j

M r r

=

i

X

r_i (contribution from predecessors) is obtained by multiplying ith row of M with r

(32)

Eigenvector formulation

 The system of linear eq. can be written r = Mr

 So the rank vector is an eigenvector of the stochastic web matrix

 In fact, its first or principal eigenvector, with corresponding eigenvalue...

Definition

. The vector x is an eigenvector of the matrix A with eigenvalue λ (lambda) if the following equation

holds: Ax = λx.

(33)

Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

(34)

Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

r = Mr

y 1/2 1/2 0 y a = 1/2 0 1 a m 0 1/2 0 m

(35)

Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y = y /2 + a /2 a = y /2 + m m = a /2

r = Mr

y 1/2 1/2 0 y a = 1/2 0 1 a m 0 1/2 0 m

(36)

Power Iteration method

(37)

Power Iteration method

 Simple iterative scheme (aka relaxation)

(38)

Power Iteration method

 Simple iterative scheme (aka relaxation)

 Suppose there are N web pages

(39)

Power Iteration method

 Simple iterative scheme (aka relaxation)

 Suppose there are N web pages

 Initialize: r

0

= [1/N,….,1/N]

T

(40)

Power Iteration method

 Simple iterative scheme (aka relaxation)

 Suppose there are N web pages

 Initialize: r

0

= [1/N,….,1/N]

T

 Iterate: r

k+1

= Mr

k

(41)

Power Iteration method

 Simple iterative scheme (aka relaxation)

 Suppose there are N web pages

 Initialize: r

0

= [1/N,….,1/N]

T

 Iterate: r

k+1

= Mr

k

 Stop when |r

k+1

- r

k

|

1

< ε

 |x|1 = ∑1≤i≤N|xi| is the L1 norm

 Can use any other vector norm e.g., Euclidean

(42)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

(43)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

(44)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

1/3 1/3 1/3

(45)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

1/3 1/3 1/3

1/3 1/2 1/6

(46)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

1/3 1/3 1/3

1/3 1/2 1/6

5/12 1/3 1/4

(47)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

1/3 1/3 1/3

1/3 1/2 1/6

5/12 1/3 1/4

3/8 11/24 1/6

(48)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

1/3 1/3 1/3

1/3 1/2 1/6

5/12 1/3 1/4

3/8 11/24

1/6 . . .

(49)

Power Iteration Example

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0

y a m

y

a = m

1/3 1/3 1/3

1/3 1/2 1/6

5/12 1/3 1/4

3/8 11/24 1/6

2/5 2/5 1/5 . . .

(50)

Random Walk Interpretation

(51)

Random Walk Interpretation

 Imagine a random web surfer

 At any time t, surfer is on some page P

 At time t+1, the surfer follows an outlink from P uniformly at random

 Ends up on some page Q linked from P

 Process repeats indefinitely

(52)

Random Walk Interpretation

 Imagine a random web surfer

 At any time t, surfer is on some page P

 At time t+1, the surfer follows an outlink from P uniformly at random

 Ends up on some page Q linked from P

 Process repeats indefinitely

 Let p(t) be a vector whose i

th

component is the probability that the surfer is at page i at time t

p(t) is a probability distribution on pages

(53)

The stationary distribution

(54)

The stationary distribution

 Where is the surfer at time t+1?

 Follows a link uniformly at random

p(t+1) = Mp(t)

(55)

The stationary distribution

 Where is the surfer at time t+1?

 Follows a link uniformly at random

p(t+1) = Mp(t)

 Suppose the random walk reaches a state such that p(t+1) = Mp(t) = p(t)

 Then p(t) is called a stationary distribution for the random walk

(56)

The stationary distribution

 Where is the surfer at time t+1?

 Follows a link uniformly at random

p(t+1) = Mp(t)

 Suppose the random walk reaches a state such that p(t+1) = Mp(t) = p(t)

 Then p(t) is called a stationary distribution for the random walk

 Our rank vector r satisfies r = Mr

 So it is a stationary distribution for the random surfer

(57)

Existence and Uniqueness

A central result from the theory of random walks (aka Markov processes):

For graphs that satisfy certain

conditions, the stationary distribution is unique and eventually will be reached no matter what the initial probability

distribution at time t = 0.

(58)

Spider traps

(59)

Spider traps

 A group of pages is a spider trap if there are no links from within the group to

outside the group

 Random surfer gets trapped

(60)

Spider traps

 A group of pages is a spider trap if there are no links from within the group to

outside the group

 Random surfer gets trapped

 Spider traps violate the conditions

needed for the random walk theorem

(61)

Microsoft becomes a spider trap

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 0 m 0 1/2 1

y a m

y

a = m

1/3 1/3 1/3

(62)

Microsoft becomes a spider trap

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 0 m 0 1/2 1

y a m

y

a = m

1/3 1/3 1/3

1/3 1/6 1/2

(63)

Microsoft becomes a spider trap

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 0 m 0 1/2 1

y a m

y

a = m

1/3 1/3 1/3

1/3 1/6 1/2

1/4 1/6 7/12

(64)

Microsoft becomes a spider trap

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 0 m 0 1/2 1

y a m

y

a = m

1/3 1/3 1/3

1/3 1/6 1/2

1/4 1/6 7/12

5/24 1/8 2/3

(65)

Microsoft becomes a spider trap

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 0 m 0 1/2 1

y a m

y

a = m

1/3 1/3 1/3

1/3 1/6 1/2

1/4 1/6 7/12

5/24 1/8

2/3 . . .

(66)

Microsoft becomes a spider trap

Yahoo

M’soft Amazon

y 1/2 1/2 0 a 1/2 0 0 m 0 1/2 1

y a m

y

a = m

1/3 1/3 1/3

1/3 1/6 1/2

1/4 1/6 7/12

5/24 1/8 2/3

0 0 1 . . .

(67)

Random teleports

(68)

Random teleports

 The Google solution for spider traps

(69)

Random teleports

 The Google solution for spider traps

 At each time step, the random surfer has two options:

 With probability β, follow a link at random

 With probability 1-β, jump to some page uniformly at random

 Common values for β are in the range 0.8 to 0.9

(70)

Random teleports

 The Google solution for spider traps

 At each time step, the random surfer has two options:

 With probability β, follow a link at random

 With probability 1-β, jump to some page uniformly at random

 Common values for β are in the range 0.8 to 0.9

 Surfer will teleport out of spider trap

within a few time steps

(71)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

(72)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

(73)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

(74)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2

1/2

(75)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

0.8*1/2

0.8*1/2

(76)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

0.8*1/2

0.8*1/2

0.2*1/3

0.2*1/3 0.2*1/3

(77)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

0.8*1/2

0.8*1/2

0.2*1/3

0.2*1/3 0.2*1/3

y 1/2 a 1/2 m 0

y

1/2 1/2 0

y 0.8*

1/3 1/3 1/3 y + 0.2*

(78)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

0.8*1/2

0.8*1/2

0.2*1/3

0.2*1/3 0.2*1/3

y 1/2 a 1/2 m 0

y

1/2 1/2 0

y 0.8*

1/3 1/3 1/3 y + 0.2*

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3

0.8 + 0.2

(79)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

0.8*1/2

0.8*1/2

0.2*1/3

0.2*1/3 0.2*1/3

y 1/2 a 1/2 m 0

y

1/2 1/2 0

y 0.8*

1/3 1/3 1/3 y + 0.2*

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

(80)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

y

a = m

1/3 1/3 1/3

(81)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

y

a = m

1/3 1/3 1/3

1/3 0.20 0.47

(82)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

y

a = m

1/3 1/3 1/3

1/3 0.20 0.47

0.27 0.20 0.52

(83)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

y

a = m

1/3 1/3 1/3

1/3 0.20 0.47

0.27 0.20 0.52

0.258 0.178 0.562

(84)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

y

a = m

1/3 1/3 1/3

1/3 0.20 0.47

0.27 0.20 0.52

0.258 0.178

0.562 . . .

(85)

Random teleports ( β = 0.8 )

Yahoo

M’soft Amazon

1/2 1/2 0 1/2 0 0 0 1/2 1

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 13/15

0.8 + 0.2

y

a = m

1/3 1/3 1/3

1/3 0.20 0.47

0.27 0.20 0.52

0.258 0.178 0.562

7/33 5/33 21/33 . . .

(86)

Page Rank

(87)

Page Rank

 Construct the NxN matrix A as follows

 Aij = βMij + (1-β)/N

(88)

Page Rank

 Construct the NxN matrix A as follows

 Aij = βMij + (1-β)/N

 Verify that A is a stochastic matrix

(89)

Page Rank

 Construct the NxN matrix A as follows

 Aij = βMij + (1-β)/N

 Verify that A is a stochastic matrix

 The page rank vector r is the principal eigenvector of this matrix

 satisfying r = Ar

(90)

Page Rank

 Construct the NxN matrix A as follows

 Aij = βMij + (1-β)/N

 Verify that A is a stochastic matrix

 The page rank vector r is the principal eigenvector of this matrix

 satisfying r = Ar

 Equivalently, r is the stationary

distribution of the random walk with

teleports

(91)

Dead ends

 The description of the PageRank

algorithm is essentially complete. Minor problem with “dead ends”.

 Pages with no outlinks are “dead ends”

for the random surfer -> Nowhere to go in the next step.

 Our algorithm so far is not well-defined

when the number of successors k=0 (we

would have 1/0!).

(92)

Microsoft becomes a dead end

Yahoo

M’soft Amazon

y

a = m

1/3 1/3 1/3

1/2 1/2 0 1/2 0 0 0 1/2 0

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 1/15

0.8 + 0.2

(93)

Microsoft becomes a dead end

Yahoo

M’soft Amazon

y

a = m

1/3 1/3 1/3

1/2 1/2 0 1/2 0 0 0 1/2 0

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 1/15

0.8 + 0.2

Non-

stochastic!

(94)

Microsoft becomes a dead end

Yahoo

M’soft Amazon

y

a = m

1/3 1/3

1/3 . . .

1/2 1/2 0 1/2 0 0 0 1/2 0

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 1/15

0.8 + 0.2

Non-

stochastic!

(95)

Microsoft becomes a dead end

Yahoo

M’soft Amazon

y

a = m

1/3 1/3 1/3

0 0 0 . . .

1/2 1/2 0 1/2 0 0 0 1/2 0

1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 y 7/15 7/15 1/15

a 7/15 1/15 1/15 m 1/15 7/15 1/15

0.8 + 0.2

Non-

stochastic!

(96)

Dealing with dead-ends

(97)

Dealing with dead-ends

 Teleport

 Follow random teleport links with probability 1.0 from dead-ends

 Adjust matrix accordingly

(98)

Dealing with dead-ends

 Teleport

 Follow random teleport links with probability 1.0 from dead-ends

 Adjust matrix accordingly

 More efficient: prune and propagate

 Preprocess the graph to eliminate dead-ends

 Might require multiple passes

 Compute page rank on reduced graph

 Approximate values for deadends by propagating values from reduced graph

(99)

Efficiency issues

(100)

Efficiency issues

 Key step is matrix-vector multiplication

rnew = Arold

(101)

Efficiency issues

 Key step is matrix-vector multiplication

rnew = Arold

 Easy if we have enough main memory to

hold A, r

old

, r

new

(102)

Efficiency issues

 Key step is matrix-vector multiplication

rnew = Arold

 Easy if we have enough main memory to hold A, r

old

, r

new

 Say N = 1 billion pages

 Matrix A has N2 entries

1018 is a large number!

(103)

Rearranging the equation

(104)

Rearranging the equation

r = Ar, where

(105)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N

(106)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N

r

i

= ∑

1≤j≤N

A

ij

r

j

(107)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N r

i

= ∑

1≤j≤N

A

ij

r

j

r

i

= ∑

1≤j≤N

[ β M

ij

+ (1- β )/N] r

j

(108)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N r

i

= ∑

1≤j≤N

A

ij

r

j

r

i

= ∑

1≤j≤N

[ β M

ij

+ (1- β )/N] r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N ∑

1≤j≤N

r

j

(109)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N r

i

= ∑

1≤j≤N

A

ij

r

j

r

i

= ∑

1≤j≤N

[ β M

ij

+ (1- β )/N] r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N ∑

1≤j≤N

r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N, since |r| = 1

(110)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N r

i

= ∑

1≤j≤N

A

ij

r

j

r

i

= ∑

1≤j≤N

[ β M

ij

+ (1- β )/N] r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N ∑

1≤j≤N

r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N, since |r| = 1

r = β Mr + [(1- β )/N]

N

(111)

Rearranging the equation

r = Ar, where

A

ij

= β M

ij

+ (1- β )/N r

i

= ∑

1≤j≤N

A

ij

r

j

r

i

= ∑

1≤j≤N

[ β M

ij

+ (1- β )/N] r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N ∑

1≤j≤N

r

j

= β ∑

1≤j≤N

M

ij

r

j

+ (1- β )/N, since |r| = 1 r = β Mr + [(1- β )/N]

N

where [x]N is a vector with N entries equal to x

(112)

Sparse matrix formulation

(113)

Sparse matrix formulation

 We can rearrange the page rank equation:

r = βMr + [(1-β)/N]N

[(1-β)/N]N is an N-vector with all entries (1-β)/N

(114)

Sparse matrix formulation

 We can rearrange the page rank equation:

r = βMr + [(1-β)/N]N

[(1-β)/N]N is an N-vector with all entries (1-β)/N

M is a sparse matrix!

10 links per node, approx 10N entries

(115)

Sparse matrix formulation

 We can rearrange the page rank equation:

r = βMr + [(1-β)/N]N

[(1-β)/N]N is an N-vector with all entries (1-β)/N

M is a sparse matrix!

10 links per node, approx 10N entries

 So in each iteration, we need to:

Compute rnew = βMrold

Add a constant value (1-β)/N to each entry in rnew

(116)

Sparse matrix encoding

 Encode sparse matrix using only nonzero entries

 Space proportional roughly to number of links

 say 10N, or 4*10*1 billion = 40GB

 still won’t fit in memory, but will fit on disk

0 1

0 5

2 17

source node destination node

(117)

PageRank: summary

(118)

PageRank: summary

 Remove iteratively dead ends from G

(119)

PageRank: summary

 Remove iteratively dead ends from G

 Build stochastic matrix M

G

(M for short)

(120)

PageRank: summary

 Remove iteratively dead ends from G

 Build stochastic matrix M

G

(M for short)

 Initialize: r

0

= [1/N,….,1/N]

T

(121)

PageRank: summary

 Remove iteratively dead ends from G

 Build stochastic matrix M

G

(M for short)

 Initialize: r

0

= [1/N,….,1/N]

T

 Iterate:

r

k+1

= β Mr

k

+ [(1- β )/N]

N

 Stop when |rk+1 - rk|1 < ε

Références

Documents relatifs

In this paper, we present a metaquery language based on second-order Description Logics but implementable with standard tech- nologies underlying the Web of Data, and briefly

The application combines audio recordings with data from several Semantic Web resources, including Live Music Archive Linked Data 7 and DB- pedia 8 , for information about

Like Rhapsody and Amazon, web services and web databases with structured I/O channels are called by web components?. 3 User Interface Component and

In Section 5, we perform step (B) of the proof by proving the anisotropic local law in Theorem 2.18 using the entrywise local law proved in Section 4.. Finally in Section 6 we

Opportunities for Market Insights.. 70 / 74 Télécom ParisTech Pierre Senellart.

We describe the space of ends of a unimodular random manifold, a random object that generalizes finite volume manifolds, random leaves of measured foliations and invariant

We already established that it is not possible to build a prob- ability measure on the sample space such that both random variables X and W have a uniform distribution?. Is it

Experimental analysis on ten local similarity indices and four disparate real networks reveals a surprising result that the Leicht–Holme–Newman (LHN) index [15] performs the