• Aucun résultat trouvé

Multi-cultural Wikipedia mining of geopolitics interactions leveraging reduced Google matrix analysis

N/A
N/A
Protected

Academic year: 2021

Partager "Multi-cultural Wikipedia mining of geopolitics interactions leveraging reduced Google matrix analysis"

Copied!
10
0
0

Texte intégral

(1)

OATAO is an open access repository that collects the work of Toulouse

researchers and makes it freely available over the web where possible

Any correspondence concerning this service should be sent

to the repository administrator:

tech-oatao@listes-diff.inp-toulouse.fr

This is an author’s version published in:

http://oatao.univ-toulouse.fr/2

2156

To cite this version:

Frahm, Klaus and El Zant, Samer

and Jaffrès-Runser, Katia

and

Shepelyansky, Dima Multi-cultural Wikipedia mining of geopolitics interactions

leveraging reduced Google matrix analysis.

(2017) Physics Letters A, 381 (33).

2677-2685. ISSN 0375-9601.

Official URL:

https://doi.org/10.1016/j.physleta.2017.06.021

(2)

Multi-cultural

Wikipedia

mining

of

geopolitics

interactions

leveraging

reduced

Google

matrix

analysis

Klaus

M. Frahm

a

,

Samer El

Zant

b

,

Katia Jaffrès-Runser

b

,

Dima

L. Shepelyansky

a

,

aLaboratoire de Physique Théorique du CNRS, IRSAMC, Université de Toulouse, UPS, F-31062 Toulouse, France bInstitut de Recherche en Informatique de Toulouse, Université de Toulouse, INPT, 31061 Toulouse, France

a

b

s

t

r

a

c

t

Keywords: Markovchains Googlematrix Wikipedianetwork

Geopoliticalinteractionsofcountries

Geopoliticsfocusesonpoliticalpowerinrelationtogeographicspace.Interactionsamongworldcountries havebeenwidelystudiedatvariousscales,observingeconomicexchanges,worldhistoryorinternational politics amongothers.Thiswork exhibits the potential ofWikipediamining for suchstudies.Indeed, Wikipediastoresvaluablefine-graineddependenciesamongcountriesbylinkingwebpagestogetherfor diversetypesofinteractions(notonlyrelatedtoeconomical,politicalorhistoricalfacts).Wemineherein the Wikipedianetworks of severallanguage editions usingthe recently proposed methodofreduced Googlematrixanalysis.Thisapproachallowstoestablishdirectand hiddenlinksbetweenasubsetof nodes that belongtoamuch largerdirected network.Ourstudy concentrateson40 major countries chosenworldwide. Ouraimisto offeramulticultural perspectiveontheir interactionsby comparing networks extracted fromfivedifferent Wikipedialanguage editions,emphasizing English,Russianand Arabicones.We demonstratethatthisapproachallowstorecovermeaningfuldirectand hiddenlinks amongthe40countriesofinterest.

1. Introduction

Political and economic interactions between regions of the worldhavealwaysbeenofutmostinteresttomeasureandpredict theirrelativeinfluence.Suchstudiesbelongtothefieldof geopoli-ticsthatfocusesonpoliticalpowerinrelationtogeographicspace. Interactionsamong world countries havebeen widely studied at variousscales(worldwide,continentalorregional)usingdifferent types of information. Studies are driven by observing economic exchanges,socialchanges,history,internationalpoliticsand diplo-macy among others [1,2]. The major finding of this paper is to showthatmeaningfulworldwideinteractionscanbeautomatically extractedfromtheglobalandfreeonlineEncyclopaediaWikipedia

[3]for a given set of countries. All information gatheredin this collaborativeknowledgebasecanbeleveragedtoprovideapicture ofcountriesrelationships,fosteringanewframeworkforthorough geopoliticsstudies.

Wikipedia has become the largest open source of knowledge beingclosetoEncyclopaediaBritannica [4]by theaccuracy ofits

*

Correspondingauthor.

E-mail addresses:frahm@irsamc.ups-tlse.fr(K.M. Frahm),

scientific entries [5] andovercoming the latter by the enormous quantityofavailableinformation.Adetailedanalysisofstrongand weak features of Wikipedia is given at [6,7]. Wikipedia articles make citations to each other,providing a direct relationship be-tweenwebpagesandtopics.Assuch,Wikipediageneratesalarger directednetwork ofarticletitleswitharatherclearmeaning.For thesereasons, it is interesting to apply algorithms developed for searchenginesofWorldWideWeb(WWW)suchasthePageRank algorithm[8](seealso[9]), toanalyzetherankingpropertiesand relationsbetweenWikipediaarticles.Forvariouslanguageeditions of Wikipediait was shownthat the PageRank vector produces a reliable ranking of historical figures over 35 centuries of human history[10–14]andasolidWikipediarankingofworlduniversities (WRWU)[10,15].IthasbeenshownthattheWikipediarankingof historicalfiguresisinagoodagreementwiththewell-knownHart ranking [16], while the WRWUis in a goodagreement with the ShanghaiAcademicrankingofworlduniversities[17].

Atpresentdirected networksofrealsystemscanbeverylarge (about 4

.

2 million articles for the English Wikipedia edition in 2013[13] or3

.

5 billionweb pages(calledalso nodes)fora pub-liclyaccessiblewebcrawlthatwasgatheredbytheCommonCrawl Foundation in2012 [18]). Forsome studies, one might be inter-estedonlyintheparticularinteractionsbetweenaverysmall sub-set of nodes compared to the full network size. For instance, in

samer.elzant@enseeiht.fr (S. ElZant),kjr@enseeiht.fr (K. Jaffrès-Runser),

(3)

Fig. 1. Geographical distributionofthe 40selected countries.Color codegroups countriesinto7sets:orange(OC)forEnglishspeakingcountries,blue(BC)for for-merSovietunionones,red(RC)forEuropeanones,green(GC)forLatinAmerican ones,yellow(YC)forMiddleEasternones,purple(PUC)forNorth-EastAsianones andfinallypink(PIC)forSouth-Easterncountries(seecolorsandcountrynamesin

Table 1;othercountriesareshowninblack).(Forinterpretationofthereferencesto colorinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)

thispaper, we are interested in capturingthe interactions ofthe 40 countries represented in Fig. 1 using the networks extracted fromfiveWikipedia language editions covering afew millions of articleseach.However,letusassumethatthereisarather impor-tantperson(having hisownWikipediaarticlecorresponding toa nodeC )whowasbornincountry A andworkedthemainpartof hislife incountry B;therefore A and B mayhave linkstoC (in

eitherdirection)andthus theremaybe an indirectlink between thetwonodesA and B viathenode C (orothernodes).In previ-ousworks, a solutionto thisgeneralproblemhas beenproposed in[19,20]by definingthereducedGooglematrixtheory.Main el-ementsofReducedGooglematrix GR willbepresentednext, but

inafewwords,itcapturesina40-by-40Perron–Frobeniusmatrix thefullcontribution ofdirect andindirectinteractions happening in the full Google matrix between the 40 nodes of interest (we tooktop 40countriesofPageRankvectorofEnWiki). Elementsof reducedmatrixGR

(

i

,

j

)

canbeinterpretedastheprobabilityfora

randomsurferstarting atwebpage j toarriveinwebpage i using

directandindirectinteractions.Indirectinteractionsrefertopaths composed inpart ofwebpages differentfromthe 40 onesof in-terest.EvenmoreinterestinganduniquetoreducedGooglematrix theory,we showherethat intermediate computation stepsof GR

offeradecompositionof GRintomatricesthat clearlydistinguish

directfromindirectinteractions. As such,it ispossible toextract the probability for an indirect interaction betweentwo nodes to happen.

ReducedGooglematrixtheoryisaperfectcandidatefor analyz-ingthedirectandindirectinteractionsbetweencountriesselected worldwide. Inthispaper,we extractfrom GR andits

decomposi-tion intodirect and indirectmatrices ofselected subset network of Nr

=

40 countries. The Google matrix of this subset network

of Nr nodes is computed taking into account direct and hidden

(i.e. indirect) directed links. More specifically, we deduce a fine-grainedclassification ofcountriesthat captures what we callthe

hiddenfriends andhiddenfollowers ofagivencountry.Thestructure of these graphs provides relevant social information: communi-ties of countries withstrong tiescan be clearly exhibited while countriesactingasbridges arepresentaswell.Thisismainlythe case for the hidden interactions networks of friends (or follow-ers)that offer newinformationcompared to thedirect networks offriends(orfollowers)whosetopologyismainlyenforcedbytop PageRank countries. The mathematical procedure of the reduced

Table 1

Listofnamesof40selectedcountrieswithPageRank

K ,

CheiRank

K

∗forEnWiki, ArWikiandRuWiki,orderedbyincreasingPageRankofEnWikiedition.Fig. 1gives colorcorrespondencedetails.AWikipediaarticlewithcountrynamerepresentsone nodeofthewholenetworkwith

N nodes,

e.g.https://en.wikipedia.org/wiki/France

for“France”inEnWiki.(Forinterpretationofthereferencestocolorinthistable, thereaderisreferredtothewebversionofthisarticle.)

Country EnWiki ArWiki RuWiki

K KK KK K

United States (US) OC 1 9 1 5 2 27

France (FR) RC 2 19 3 31 3 14

United Kingdom (UK) OC 3 25 6 20 7 3

Germany (DE) RC 4 33 8 14 4 24

Canada (CA) OC 5 26 13 19 12 26

India (IN) PIC 6 23 9 25 13 8

Australia (AU) OC 7 35 16 22 18 12 Italy (IT) RC 8 15 5 1 6 32 Japan (JP) PUC 9 4 11 9 11 7 China (CN) PUC 10 8 12 17 9 21 Russia (RU) BC 11 6 7 2 1 2 Spain (ES) RC 12 30 4 8 8 15 Poland (PL) RC 13 12 26 32 10 17 The Netherlands (NL) RC 14 37 18 33 15 31 Iran (IR) YC 15 2 14 15 30 22 Brazil (BR) GC 16 3 21 26 20 1 Sweden (SE) RC 17 22 22 7 19 5 New Zealand (NZ) OC 18 28 34 24 36 4 Mexico (MX) GC 19 40 23 38 22 37 Switzerland (CH) RC 20 38 20 34 16 18 Norway (NO) RC 21 32 35 16 27 11 Romania (RO) RC 22 10 19 6 32 36 Turkey (TR) YC 23 7 15 13 21 38

South Africa (ZA) OC 24 24 29 39 35 20

Belgium (BE) RC 25 18 27 37 29 30 Austria (AT) RC 26 39 28 28 14 28 Greece (GR) RC 27 21 10 36 25 25 Argentina (AR) GC 28 1 32 29 33 23 Philippines (PH) PIC 29 17 36 21 39 33 Portugal (PT) RC 30 36 24 12 17 9 Pakistan (PK) PUC 31 5 25 35 37 29 Denmark (DK) RC 32 16 33 10 31 19 Israel (IL) YC 33 20 17 18 28 6 Finland (FI) RC 34 14 37 4 26 16 Egypt (EG) YC 35 31 2 3 24 39 Indonesia (ID) PIC 36 13 31 11 34 10

Hungary (HU) RC 37 11 40 40 23 40

Taiwan (TW) PUC 38 27 39 27 40 34

South Korea (KR) PUC 39 34 38 30 38 35

Ukraine (UA) BC 40 29 30 23 5 13

GR matrixconstructionfor Nr nodesisdescribed indetailin

Sec-tion2.

The networksof GR directandhiddeninteractions canbe

cal-culated for different Wikipedia language editions. In this paper, reduced Google matrix analysisis applied to the same setof 40 countries on networks representing five different Wikipedia edi-tions:English(EnWiki),Arabic (ArWiki),Russian(RuWiki), French (FrWiki) andGerman(DeWiki) editions. Wetake for analysisthe top 40 countries according to the EnWiki PageRank. Wikipedia languageeditionsareusuallymodifiedbyauthorswhomainly be-long to the regionassociated withthis language.Thus ourstudy showstheimpactofthisculturalbiaswhencomparingdirectand hidden networks of friends (or followers) among different lan-guage editions. We show that part ofthe interactions are cross-cultural whileothersare clearly biasedby the culture ofthe au-thors.

InSection2weintroducethemainelementsofreducedGoogle matrix theory,Section 3describes GR calculated for40countries

andforfivedifferentWikipediaeditions.Specificemphasisisgiven totheverydifferentEnglish,ArabicandRussianeditions.Networks offriendsandfollowersfordirectandhiddeninteractionmatrices arecreatedanddiscussedinSection4,andconclusionisdrawnin Section5.

(4)

Fig. 2. Position of countries in the local(K,K)plane of the reduced network of 40 countries in the EnWiki (left), ArWiki (middle) and RuWiki (right) networks.

2. ReducedGooglematrixtheory

2.1.Googlematrix

Itis convenient to describe thenetwork of N Wikipedia arti-cles: a network node is givenby the article name,all nodes are numbered,somenodescorrespondtoarticleswithcountrynames, seeTable 1;thenumberofcountriesismuchsmallerthanthe to-talnumberofarticles N. ThentheGoogle matrixGisconstructed fromtheadjacencymatrix Ai j withelements 1 ifarticle(node) j

pointstoarticle(node)i andzerootherwise.Inthiscase,elements oftheGooglematrixtakethestandardform[8,9]

Gi j

=

αS

i j

+ (

1

α

)/

N

,

(1)

where S isthe matrixofMarkovtransitionswithelements Si j

=

Ai j

/

kout

(

j

)

, kout

(

j

)

=



N

i=1Ai j

=

0 being the node j out-degree

(number ofoutgoing links) and with Si j

=

1

/

N if j has no

out-goinglinks(danglingnode).Here0

<

α

<

1 isthedampingfactor whichfora random surferdetermines the probability

(

1

α

)

to jumptoanynode;belowwe use

α

=

0

.

85.Therighteigenvectors

ψ

i

(

j

)

ofG aredefinedby:



j

Gj j

ψ

i

(

j

)

= λ

i

ψ

i

(

j

) .

(2)

The PageRank eigenvector P

(

j

)

= ψ

i=0

(

j

)

corresponds to the

largest eigenvalue

λ

i=0

=

1 [8,9]. It has positive elements which

give the probability to find a random surfer on a given node in thestationarylongtimelimitoftheMarkovprocess.Allnodescan beordered by a monotonicallydecreasing probability P

(

Ki

)

with

thehighestprobabilityat K

=

1.Theindex K isthePageRank in-dex.ThePageRankvectoriscomputedbythePageRankalgorithm withiterativemultiplication ofinitialrandomvector by G matrix [8,9,14]. Left eigenvectors are biorthogonal to right eigenvectors ofdifferent eigenvalues. The left eigenvector for

λ

=

1 has iden-tical(unit) entriesduetothecolumnsumnormalizationofG.In thefollowingwe usethe notations

ψ

LT and

ψ

R forleft andright

eigenvectors,respectively. Notation T stands forvector ormatrix transposition.

Inaddition tothe matrix G itis usefultointroduce a Google matrix G∗ constructed from the adjacency matrix of the same network but with inverted direction of all links [21]. The vec-tor P

(

K

)

is called the CheiRank vector [10,21] and the index numbering nodes in order of monotonic decrease of probability

P∗ is notedasCheiRankindex K∗.Thus, nodeswithmany ingo-ing (or outgoing) links have small values of K

=

1

,

2

,

3

...

(or of

K

=

1

,

2

,

3

,

...

)[9,14].Weshowthedistributionofselected coun-triesonthePageRank–ChiRankplane

(

K

,

K

)

inFig. 2.

2.2. ReducedGooglematrix

We construct the reduced Google matrix fora certain subset of Nr selectednodes fromthe globalWikipedianetwork with N

nodes(Nr



N).Asasubsetwe choosetop40countrieswiththe

largestPageRankprobabilitiesforEnWikinetwork(thenamesare given in Table 1). The reduced Google matrix GR is constructed

onthe mathematicalbasisdescribed below.The mainelement of thisconstructionistokeepthesamePageRankprobabilitiesofNr

nodesasintheglobalnetwork(uptoafixedmultipliercoefficient) andtotakeintoaccountallindirectlinksbetweenNr nodes

cou-pled by transitions via N

Nr nodesof the globalnetwork (see

also[20]).

Let G be a typical Google matrix (1) for a network with

N nodes such that Gi j

0 and the column sum normalization



N

i=1Gi j

=

1 isverified.We considerasub-network withNr

<

N

nodes,called“reducednetwork”.Inthiscasewecanwrite G ina blockform: G

=



Grr Grs Gsr Gss



(3)

where the index“r” refers to the nodes ofthe reduced network and“s”totheotherNs

=

N

Nrnodeswhichforma

complemen-tarynetworkwhichwewillcallthe“scatteringnetwork”.ThusGrr

is given by the direct links between selected Nr nodes, Gss

de-scribes links between all other N

Nr nodes, Grs and Gsr give

linksbetweenthesetwoparts.PageRankvectorofthefullnetwork isgivenby: P

=



Pr Ps



(4)

which satisfiesthe equation GP

=

P orin other words P is the right eigenvector of G for the unit eigenvalue. This eigenvalue equationreadsinblocknotations:

(

1

Grr

)

Pr

GrsPs

=

0

,

(5)

GsrPr

+ (

1

Gss

)

Ps

=

0

.

(6)

Here1 istheunitmatrixofcorrespondingsizeNrorNs.Assuming

thatthe matrix1

Gss isnotsingular,i.e.all eigenvalues Gssare

strictlysmallerthanunity(inmodulus),weobtainfrom(6)that

(5)

whichgivestogetherwith(5):

GRPr

=

Pr

,

GR

=

Grr

+

Grs

(

1

Gss

)

−1Gsr (8)

wherethematrixGRofsize Nr

×

Nr,definedforthereduced

net-work,can beviewedasan effectivereducedGooglematrix.Here the contribution of Grr accounts for direct links in the reduced

network and the second matrix inverse term corresponds to all contributions of indirectlinks ofarbitrary order. The matrix ele-mentsof GRare non-negativesincethematrixinversein(8)can

beexpandedas:

(

1

Gss

)

−1

=



l=0 Gssl

.

(9)

In(9) theinteger l represents theorder ofindirectlinks, i.e. the numberofindirectlinkswhichareusedtoconnect indirectlytwo nodesofthereducednetwork.Wereferthe readerto[20] toget the proof that GR also fulfillsthe condition ofcolumn sum

nor-malizationbeingunity.

2.3. NumericalevaluationofGR

Wecanquestionhowtoevaluatepracticallytheexpression(8)

ofGR foraparticular sparseandquitelarge networkwhen Nr

102–103 issmall compared to N and N

s

N

Nr.If Ns is too

large(e.g.Ns

>

105)adirectnaiveevaluationofthematrixinverse

(

1

Gss

)

−1 in(8)by Gaussalgorithmisnot efficient.Inthiscase

wecantrytheexpansion(9)provideditconvergessufficientlyfast withamodestnumberofterms.However, thisismostlikelynot thecasefortypicalapplicationssince Gss isverylikelytohaveat

leastoneeigenvalueveryclosetounity.

Therefore,weconsiderthesituationwherethefullGoogle ma-trixhas a well definedgap betweenthe leading unit eigenvalue andthesecondlargesteigenvalue (inmodulus). Forexampleif G

isdefinedusingadampingfactor

α

inthestandardway,asin(1), the gap is at least 1

α

which is 0

.

15 for the standard choice

α

=

0

.

85 [9]. In order to evaluate the expansion (9) efficiently, we need to take out analytically the contribution of the leading eigenvalueofGss closetounitywhichisresponsiblefortheslow

convergence.

Below we denote by

λ

c this leading eigenvalue of Gss and

by

ψ

R (

ψ

LT) the corresponding right (left) eigenvector such that

Gss

ψ

R

= λ

c

ψ

R (or

ψ

LTGss

= λ

c

ψ

LT).Bothleftandrighteigenvectors

aswell as

λ

c can be efficientlycomputedby the poweriteration

methodinasimilarwayasthestandardPageRankmethod.Vectors

ψ

R are normalized with ETs

ψ

R

=

1 and

ψ

L with

ψ

LT

ψ

R

=

1. It is

wellknown(andeasytoshow)that

ψ

LT isorthogonaltoallother righteigenvectors(and

ψ

R isorthogonaltoallotherleft

eigenvec-tors)ofGss witheigenvalues differentfrom

λ

c. Weintroducethe

operator

P

c

= ψ

R

ψ

LT whichistheprojectorontotheeigenspaceof

λ

c andwe denote by

Q

c

=

1

P

c the complementary projector.

One verifiesdirectly that both projectorscommute withthe ma-trixGssandinparticular

P

cGss

=

Gss

P

c

= λ

c

P

c.Thereforewecan

derive:

(

1

Gss

)

−1

=

P

c 1 1

− λ

c

+

Q

c



l=0

¯

Gssl (10)

withG

¯

ss

=

Q

cGss

Q

candusingthestandardidentity

P

c

Q

c

=

0 for

complementaryprojectors.Theexpansionin(10)convergesrapidly since G

¯

ssl

∼ |λ

c,2

|

l with

λ

c,2 being the second largest eigenvalue

whichissignificantlylowerthanunity.

Thecombinationof(8)and(10)providesan explicitalgorithm feasiblefora numericalimplementationformodest valuesof Nr,

large values of Ns and of course if sparse matrices G, Gss are

considered.Wereferthereaderto[20] formoreadvanced imple-mentationconsiderations.

Fig. 3. Densityplotsofmatrices

G

R(topleft),

G

pr(topright),

G

rr(bottomleft)and

Gqr(bottomright)forthereducednetworkof40countriesintheEnWikinetwork. Thenodes

N

rareorderedinlinesbyincreasingPageRankindex(lefttoright)andin columnsbyincreasingPageRankindexfromtoptobottom.Colorscalerepresents maximum valuesinred(0.15 intoppanels;0.01 inbottomleftpanel;0.03 in bottomrightpanel),intermediateingreenandminimum(approximatelyzero)in blue.(Forinterpretationofthereferencestocolorinthisfigurelegend,thereader isreferredtothewebversionofthisarticle.)

2.4. DecompositionofGR

Onthebasis ofequations(8)–(10),thereducedGoogle matrix canbepresentedasasumofthreecomponents:

GR

=

Grr

+

Gpr

+

Gqr

,

(11)

with thefirst component Grr given by directmatrix elements of

G amongtheselected Nr nodes.Thesecond projectorcomponent

Gpr isgivenby:

Gpr

=

Grs

P

cGsr

/(

1

− λ

c

),

P

c

= ψ

R

ψ

LT

.

(12)

The thirdcomponentGqr isofparticularinterest inthisstudy

asitcharacterizestheimpactofindirectorhiddenlinks.Itisgiven by: Gqr

=

Grs

[

Q

c



l=0

¯

Gssl

]

Gsr

,

Q

c

=

1

P

c

,

¯

Gss

=

Q

cGss

Q

c

.

(13)

Wedocharacterizethestrengthofthese3componentsbytheir respectiveweightsWrr,Wpr,Wqrgivenrespectivelybythesumof

allmatrixelementsofGrr,Gpr,GqrdividedbyNr.Bydefinitionwe

haveWrr

+

Wpr

+

Wqr

=

1.

Reduced Google matrix has beencomputed, together withits components Grr,Gpr and Gqr,forthe Englishlanguage editionof

Wikipedia (EnWiki) and for the NR

=

40 countries listed in Ta-ble 1. Thesecountries are the oneswithtop PageRank K in the network of EnWiki. Density plots of GR, Grr, Gpr and Gqr, are

given inFig. 3. Countriesare ordered by increasing K value. The weightofthethreematrixcomponentsof GR are Wpr

=

0

.

96120, Wqr

=

0

.

029702 and Wrr

=

0

.

009098. Predominantcomponentis

clearly Gpr butaswe willexplain next, itisnot themost

(6)

The meaning of Grr is clear as it is directly extracted from

theglobalGoogle matrixG. Itgivesthe directlinksbetweenthe selected nodes and more specifically the probability Grr

(

i

,

j

)

for

the surferto go directly from column j country to line i

coun-try.However, since eachcolumnis normalizedby thenumberof outgoinglinks, absoluteprobabilities cannotbecomparedto each otheracrosscolumns.

ThesumofGpr andGqrrepresentsthecontributionofall

indi-rectlinksthroughthescatteringmatrixGss.AsseenonFig. 3,the

projectorcomponentGpr iscomposedofnearlyidenticalcolumns.

Moreover,valuesofeachcolumnareproportionaltothePageRank ofthe countries(lines andcolumns are ordered by increasing K

values). As detailed in [20], we observe numerically that Gpr

PrETr

/(

1

− λ

c

)

,meaningthateachcolumnisclosetothe

normal-izedvector Pr

/(

1

− λ

c

)

.Assuch, Gpr transposesessentiallyin GR

thecontributionofthefirsteigenvectorofG.Wecanconcludethat eveniftheoverallcolumnsumsofGpraccountfor

95–97% ofthe

totalcolumnsumof GR, Gpr doesn’toffer innovativeinformation

comparedtoPageRankanalysis.

AwaymoreinterestingcontributionistheoneofGqr.This

ma-trixcaptureshigher-orderindirectlinksbetweentheNr nodesdue

totheirinteractionswiththeglobalnetworkenvironment.Wewill refertotheselinksashiddenlinks.Wenotethat Gqr iscomposed

oftwopartsGqr

=

Gqrd

+

Gqrndwherethefirsttermgivesonlythe

diagonalpart of the matrix Gqrd and thus represents the

proba-bilitiestostayonthesamenodeduringmultipleiterationsofG

¯

ss

in(13)whilethesecondmatrixcapturesonlynon-diagonalterms inGqrnd.Assuch,Gqrnd representsindirect(hidden)linksbetween

theNr nodesappearingviatheglobalnetwork.Wenote that

cer-tainmatrixelementsofGqrcanbenegative,whichispossibledue

tothenegativetermsin

Q

c

=

1

P

c appearingin(13).Thetotal

weightofnegativeelementsishowevermuchsmallerthanWqr(at

least6timessmallerandevennon-existinginArWiki).Ofcourse, thefullreducedGooglematrixGR hasonlypositiveorzeromatrix

elements.In thefollowing, ourstudyconcentrates mainly onthe meaningofGqrnd,emphasizingthemeaningofitslargestpositive

valuesinthestudyofSection4.

3. Matricesofworldcountries

Our study focuses on the networks representing 5 different Wikipediaeditions1 fromtheset of24analyzed in[13]: EnWiki, ArWiki, RuWiki, DeWiki and FrWiki that contain 4.212, 0.203, 0.966,1.533and1.353million ofarticleseach.

3.1.Selectedcountries

The40countrieslistedinTable 1havebeenselectedfromthe EnWiki network after computing the PageRank for the complete network.The40countrieswithlargestPageRankprobabilityhave been chosen, and ordered by a local PageRank index K varying

between1 and40.Themostinfluentialcountriesarefound tobe atthetop values K

=

1

,

2

,

...

.Inaddition we determinethelocal CheiRankindex K∗ of theselected countries usingthe CheiRank vectoroftheglobalnetwork[14,21].AtthetopofK∗wehavethe mostcommunicativecountries.Table 1lists K and K∗ forEnWiki, ArWiki and RuWiki. Not surprisingly, the order of top countries changeswithrespect totheedition(forinstance,thetopcountry forK isUSexceptforRuWikiwhosetopcountryisRussia).

Countries that belong to the same region or having a com-mon piece of history may probably exhibit stronger interactions inWikipedia. As such, we havecreated acolor codethat groups together countries that either belong to the same geographical

1 DatacollectedmidFebruary2013.

region (e.g. Europe, Latin America, Middle East, North-East Asia, South-East Asia) orshareabig partofhistory(formerUSSR; En-glish speakingcountriesthat are thelegacyofthe formerBritish Empire).ColorcodecanbeseeneitherinFig. 1orinthetextcolor ofTable 1.

Itisconvenientaswelltoplotallnodesinthe(K ,K∗)planeto highlightthe countriesthat arethe mostinfluential(K

=

1

,

2

,

...

) and the most communicative (K

=

1

,

2

,

...

) at the same time.

Fig. 2plotsall40countriesinthe(K ,K∗)planeforEnWiki,ArWiki andRuWikieditions. Thisplotisa bi-objectiveplotwhere K and K∗ are to be minimized concurrently.It is interesting to look at thesetofnon-dominatedcountrieswhichare theonessuchthat thereisnoothercountrybeatingthemforbothK and K∗.Cultural biasisobvioushereasforEnWiki,thissetiscomposedof{US,JP, IR, AR}, for ArWiki of {US, EG, IT} and for RuWiki of only Rus-sia.We notethat accordingto Fig. 2some countrieswithhigh K

value(relativefewin-degreeandlowPageRankprobability)actas importantdiffusersofcontent(lowK∗)evenforthelanguage edi-tionsbeingdifferentfromthelanguage spokeninthosecountries (BR, AR, RO). Alsocountrieswith low K (having many citations) arerelativelypoordiffusers(FR,CA).ThusFig. 2demonstratesthat differentcultures attribute differentdegree ofcountry popularity (PageRank probability and K index) anddifferent communicative degree (CheiRank probability and K∗ index). This indicates the nontrivialfeaturesofculturespropagationandinteractions.

3.2. DensityplotsofGR,GrrandGqrnd

ForthethreeEnWiki,ArWikiandRuWikieditions, Fig. 4plots the densityof matricesGR, Gqrnd and Grr.We keep for all plots

the same orderof countriesextractedfrom the EnWiki network. This is meant to highlight cultural differences among Wikipedia editions.IntheGRplots,itisclearthatthismatrixisdominatedby

theprojectorGpr contribution,whichisproportionaltotheglobal

PageRankprobabilities.InGR,notsurprisingly,theculturalbiasis

pronouncedduetoitsstrongtietoPageRank.Forinstance,Egyptis muchmoreimportantinArWikithaninothereditions,andRussia isthetopcountryinRuWiki.

The information from direct links between countries is pro-videdbyGrr.Asexpected, theper-columnnormalizationprevents

ameaningfulper-lineanalysis.ForEnWikiandArWiki,the respec-tivecolumnsofMexicoandHungaryarepredominantduetotheir little number of outgoing links. On the contrary, Gqrnd offers a

much more unified view of countriesinteractions as itseems to highlightmoregeneralinteractionthatarelessbiasedbycultural views. For instance, for these three Wikipedia editions, the hid-den links connecting Taiwan to China and Pakistan to India are reallystronginGqrnd.ThelinkconnectingUkrainetoRussiaisvery

stronginEnWikiandArWiki.ItissurprisinglyabsentfromRuWiki (or maybe thisis exactly a cultural bias we are observing since during a long time both countries were part of USSR, and there was thus no specific difference betweenthem). Other interesting hiddenlinksarehighlightedinEnWikiasNew-Zealandisdirected toAustraliaorinRuWikilinkingCanadatotheUSA.

3.3. Friendsandfollowers

In order to better capture the interactions provided by Gqrnd

and GR, we have listed for all 5 Wikipedia editions the top 4

friends andtop 4 followers of a set of 7 leading countries. One leadingcountrypergrouphasbeenselected:USforEnglish speak-ing countries, Brazil for Latin America, France for Europe, Japan forNorth-EastAsia,IndiaforSouth-EastAsia,RussiafortheSoviet block and Turkey forthe Middle-East. To pick them inside each group, we havechosen thecountry whoseworst PageRank order overall5Wikipediaeditionsisthehighest.

(7)

Fig. 4. Densityplotsofthematrixelementsof

G

R(firstline),

G

qrnd(secondline)and

G

rr(thirdline)forthereducednetworkof40countriesofEnWiki(leftcolumn),ArWiki (middlecolumn)andRuWiki(rightcolumn).ThecountrynamesaregivenontheaxesintheordernamesinTable 1,thusthenodes

N

rareorderedinlinesandcolumnsby thereferencePageRankofEnWiki.Thecolorsrepresentmaximum(redcorrespondsto:0.15,0.19,0.13intoppanelsfromlefttoright;0.01,0.03,0.012inmiddlepanelsand 0.01,0.011,0.006inbottompanelsrespectively),intermediate(green)andminimum(blueforzero)valuesforagiven matrix.(Forinterpretationofthereferencestocolor inthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)

Foreachleading country,weextract fromboth matricesGqrnd

and GR the top 4 Friends (resp. Followers) ofcountry j given by

the 4 best values of the elements of column j (resp. of line j).

In other words, it corresponds to destinations of the 4 strongest outgoinglinksof j and thecountriesattheoriginofstrongest4 ingoinglinks.Asummary ofrelevantresultsisgiveninTable 2to showcross-cultureinteractions.

Looking at GR friends, top friends of leading countries are

strongly related to the top PageRank countries (we have a pre-dominance of US, France and Germany). Similarly, cross-edition followers of US are Mexico and Canada, and followers of Japan areChina, KoreaandTaiwan.Ontheopposite, higher-order inter-actions of Gqrnd are not as much influenced by PageRank. More

subtle but realistic interactions appear: Canada is always identi-fiedasahiddenfriendofUSAwhileitwas neverthecaseinGR.

Similarly, Ukraine is always tagged asa hidden friend of Russia; Italy andSpain asfriendsofFrance.Thus Gqrnd seems to

empha-size morefine-grainedregionalinteractions. Nextsectionexploits theconceptoffriendsandfollowerstocreatenewnetwork repre-sentationsderivedfromGRandGqrnd.

4. Networksof40countries

Thisstudyconcentratesagainonthesame7leadingcountries asbefore.Top4friendsandtop4followersoftheseleading coun-triesareextractedfromGRandGqrndtoplotthegraphsofFigs. 5

(8)

Table 2

Cross-editiondirectfriendsandfollowersextractedfrom

G

Rmatrix(toptable)andfrom

G

qrnd(bottomtable)forthetopcountriesofeacharea.Foreachtopcountry,welist thedirectfriends(followers)presentinthedirectfriendslistgivenbyallfiveWikipediaeditions,theonespresentin4editionsoutof5andtheonespresentin3editions outoffive.

Top country GRWiki friends present in GRWiki followers present in

all 5 editions 4 out of 5 editions 3 out of 5 editions all 5 editions 4 out of 5 editions 3 out of 5 editions

US FR DE UK MX–CA IL UK BR US–FR DE AR–PT MX FR US DE BE ES IT RU US–FR DE UA–FI PL TR US–FR DE GR–IR JP US–FR DE KR–TW–CN PH IN US–FR DE PK–ID ZA

Top country GqrWiki friends present in GqrWiki followers present in

all 5 editions 4 out of 5 editions 3 out of 5 editions all 5 editions 4 out of 5 editions 3 out of 5 editions

US CA UK–MX FR UK CA–MX

BR PT–AR ES–MX AR PT UK

FR IT–DE US–BE UK–BE–ES CH

RU UA DE CN–PL–US FI UA–PL IR

TR GR IR–RU GR–IR EG–RO

JP CN–TW–US–KR KR–CN–TW

IN PK–CN ID IR PK–ID CN

Fig. 5. Networkstructureoffriends(topline)andfollowers(bottomline)inducedbythe7topcountriesofeachgeographicalarea(US,FR,IN,JP,BR,TR,RU)in

G

R.Results areplottedforEnWiki(leftcolumn),ArWiki(middlecolumn)andRuWiki(rightcolumn).Nodecolorsrepresentgeographicappartenancetoagroupofcountries(cf.Table 1

fordetails).Top(bottom)graphs:thetopcountrynodepoints(ispointedby)withaboldblackarrowtoitstop4friends(followers).Redarrowsshowfriendsoffriends (resp.followersoffollowers)interactionscomputeduntilnonewedgesareaddedtothegraph.Drawnwith[22].(Forinterpretationofthereferencestocolorinthisfigure legend,thereaderisreferredtothewebversionofthisarticle.)

(9)

Fig. 6. SamelegendasFig. 5exceptfriendsandfollowersarecomputedfrom

G

qrnd(hiddenrelationships).(Forinterpretationofthecolorsinthisfigure,thereaderisreferred tothewebversionofthisarticle.)

and6, respectively. Notethat Fig. 6 essentially highlights hidden links.The blackthick arrowsidentifythetop4 friendsandtop4 followersinteractions.Redarrowsrepresentthe friendsoffriends (respectivelythefollowersoffollowers)interactionsthatare com-puted recursively until no new edge is added to the graph. All graphsareplottedusingaforcedirectlayout.

ThenetworksoffriendsobtainedfromGRneverexpandtothe

fullsetof40nodes.Theyonlyconcentrateonabout10countries (includingthe7leadingones).ThishappenssinceGRisdominated

bytheprojectorcomponent.Lookingatthefollowergraphs,more informationcanbeobserved.North-EastAsian,Middle-Easternand Latin American create, in all editions, a cluster ofnodes densely interconnected.EuropeancountriesencloseRussiaandUkraine as thesecountries are linked to EU countries that were part ofthe formerSovietUnionzoneofinfluence(e.g.Romania,Hungary, Fin-land,etc.).Thenetworksoffollowersendup almostspanningthe fullsetof40countries.

The networks of friends obtained from Gqrnd don’t

concen-trate to a limited set of countriesas it is the casefor GR. They

endup spanningthefull setofcountries. Thehiddenfriend links show that the interactions between the geographical groups are coherent.North-EastAsiancountriesarelinkedtoSouth-EastAsian countriesandtoEnglishspeakingcountriesinEnWikiandRuWiki. Interestingly,thesetofBalticcountries(SE,NO,DK,FI)createmost ofthetimefullmeshes,andinterconnectEuropeandRussia. Cul-tural biascan be observed aswell intheseplots. Fornon-Arabic editions,Middle-Eastern countries createa well-connectedcluster

of nodes. But for ArWiki, Turkey exhibits a stronger connection withEuropethanwiththeother Middle-Easterncountries. Inthe viewofArabiccountries,TurkeyisseenclosertoEuropethan oth-ersforsure.

5. Conclusion

Thisworkoffers anewperspectiveforfuturegeopolitics stud-ies.Itispossibletoextractfrommulti-culturalWikipedianetworks a globalunderstanding oftheinteractionsbetweencountriesata global, continental orregionalscale. Onthe basis ofthe reduced Google matrix analysisof theWikipedia network we determined the relations offriends andfollowersof countrieson purely sta-tistical mathematicalgroundsobtainedfrom thehugeknowledge accumulatedbyWikipediaeditionsindifferentlanguages.This ap-proachprovides resultsindependentofculturalbias.Thereduced Googlematrixtheoryhasbeenshowntocapturehiddenand indi-rect interactionsamongcountries,resultinginnewknowledgeon geopolitics.

Acknowledgements

This work was supported by APR 2015 call of University of Toulouse and by Région Occitanie (project GOMOBILE), CNRS MASTODONS-2016 projectAPLIGOOGLE, andEU CHIST-ERA MACACOprojectANR-13-CHR2-0002-06.

(10)

References

[1]E.Jones,TheEuropeanMiracle:Environments,EconomiesandGeopoliticsin theHistoryofEuropeandAsia,CambridgeUniversityPress,Cambridge,MA, 2003.

[2]J.Dittmer,K.Dodds,Populargeopoliticspastandfuture:fandom,identitiesand audiences,Geopolitics13(2008)3.

[3]http://www.wikipedia.org(accessedAugust2016).

[4] EncyclopaediaBritannica,http://www.britannica.com/(accessedAugust2016). [5]J.Giles,Nature438(2005)900.

[6]J.M.ReagleJr.,GoodFaithCollaboration:TheCultureofWikipedia,MITPress, Cambridge,MA,2010.

[7] F.A.Nielsen,Wikipediaresearchandtools:reviewandcomments,availableat SSRN:http://dx.doi.org/10.2139/ssrn.2129874,2012.

[8]S.Brin,L.Page,Comput.Netw.ISDNSyst.30(1998)107.

[9]A.M. Langville,C.D. Meyer, Google’s PageRank and Beyond:The Science of SearchEngineRankings,PrincetonUniversityPress,Princeton,2006.

[10]A.O.Zhirov,O.V.Zhirov,D.L.Shepelyansky,Eur.Phys.J.B77(2010)523.

[11]Y.-H.Eom,K.M.Frahm,A.Benczur,D.L.Shepelyansky,Eur.Phys.J.B86(2013) 492.

[12]Y.-H.Eom,D.L.Shepelyansky,PLoSONE8 (10)(2013)e74554.

[13]Y.-H.Eom,P.Aragon,D.Laniado,A.Kaltenbrunner,S.Vigna,D.L.Shepelyansky, PLoSONE10 (3)(2015)e0114825.

[14]L.Ermann,K.M.Frahm,D.L.Shepelyansky,Rev.Mod.Phys.87(2015)1261.

[15]J.Lages,A.Patt,D.L.Shepelyansky,Eur.Phys.J.B89(2016)69.

[16]M.H.Hart,The100:RankingoftheMostInfluentialPersonsinHistory,Citadel Press,N.Y.,1992.

[17] Academicrankingofworlduniversities,http://www.shanghairanking.com/ (ac-cessedAugust2016).

[18]R.Meusel,S.Vigna,O.Lehmberg,C.Bizer,J.WebSci.1(2015)33.

[19]K.M.Frahm,D.L.Shepelyansky,arXiv:1602.02394[physics.soc],2016.

[20]K.M.Frahm,K.Jaffrès-Runser,D.L.Shepelyansky,Eur.Phys.J.B89(2016)269.

[21]A.D.Chepelianskii,arXiv:1003.5455[cs.SE],2010.

[22]M.Bastian,S.Heymann,M.Jacomy,Gephi:anopensourcesoftwarefor explor-ingandmanipulatingnetworks,in:Proc.ofInternationalAAAIConferenceon WeblogsandSocialMedia,2009.

Figure

Fig. 1. Geographical distribution of the 40 selected countries. Color code groups countries into 7 sets: orange (OC) for English speaking countries, blue (BC) for  for-mer Soviet union ones, red (RC) for European ones, green (GC) for Latin American ones, y
Fig. 2. Position of countries in the local ( K , K ∗ ) plane of the reduced network of 40 countries in the EnWiki (left), ArWiki (middle) and RuWiki (right) networks.
Fig. 3. Density plots of matrices G R (top left), G pr (top right), G rr (bottom left) and G qr (bottom right) for the reduced network of 40 countries in the EnWiki network
Fig. 4. Density plots of the matrix elements of G R (first line), G qrnd (second line) and G rr (third line) for the reduced network of 40 countries of EnWiki (left column), ArWiki (middle column) and RuWiki (right column)
+3

Références

Documents relatifs

On average the probability given by the PageRank vector is proportional to the number of ingoing links [Langville and Meyer, 2006], this relation is established for scale-free

Dans le chapitre 4 nous nous posent sur une structure de précodage linéaire optimal récemment décrit [36, Eq (3.33)] pour proposer un schéma de précodage adapté à

Coefficients of fractional parentage are most conve- niently studied by sum-rule methods. The intermediate state expansion is usually given in terms of coupled

Pour cela, cet article se propose de décrire en logique quelques propriétés de sûreté de fonctionnement, de sécurité, ou encore de couplage, afin de diriger deux types

L’objectif de ce travail est le d´ eveloppement d’une approche analytique bas´ ee sur une th´ eorie raffin´ ee de d´ eformation par cisaillement avec l’effet d’´ etirement

We tested expression and localization of Foxp2 in HD mouse model tissue to determine whether levels of soluble Foxp2 change and whether Foxp2 co-aggregates with mHTT in vivo.. R6/2

Normalized magnetization as a function of time delay recorded with three different probe photon energies (57.4, 60.5, and 63.6 eV) for the CoPt-2 sample magnetized out-of-plane.

2) Slide the remaining four magnets and spacers into the assembly tube.. Figure 15: Assembly Rig, Third Configuration... The third configuration of the assembly rig