OATAO is an open access repository that collects the work of Toulouse
researchers and makes it freely available over the web where possible
Any correspondence concerning this service should be sent
to the repository administrator:
tech-oatao@listes-diff.inp-toulouse.fr
This is an author’s version published in:
http://oatao.univ-toulouse.fr/2
2156
To cite this version:
Frahm, Klaus and El Zant, Samer
and Jaffrès-Runser, Katia
and
Shepelyansky, Dima Multi-cultural Wikipedia mining of geopolitics interactions
leveraging reduced Google matrix analysis.
(2017) Physics Letters A, 381 (33).
2677-2685. ISSN 0375-9601.
Official URL:
https://doi.org/10.1016/j.physleta.2017.06.021
Multi-cultural
Wikipedia
mining
of
geopolitics
interactions
leveraging
reduced
matrix
analysis
Klaus
M. Frahm
a,
Samer El
Zant
b,
Katia Jaffrès-Runser
b,
Dima
L. Shepelyansky
a,
∗
aLaboratoire de Physique Théorique du CNRS, IRSAMC, Université de Toulouse, UPS, F-31062 Toulouse, France bInstitut de Recherche en Informatique de Toulouse, Université de Toulouse, INPT, 31061 Toulouse, France
a
b
s
t
r
a
c
t
Keywords: Markovchains Googlematrix Wikipedianetwork
Geopoliticalinteractionsofcountries
Geopoliticsfocusesonpoliticalpowerinrelationtogeographicspace.Interactionsamongworldcountries havebeenwidelystudiedatvariousscales,observingeconomicexchanges,worldhistoryorinternational politics amongothers.Thiswork exhibits the potential ofWikipediamining for suchstudies.Indeed, Wikipediastoresvaluablefine-graineddependenciesamongcountriesbylinkingwebpagestogetherfor diversetypesofinteractions(notonlyrelatedtoeconomical,politicalorhistoricalfacts).Wemineherein the Wikipedianetworks of severallanguage editions usingthe recently proposed methodofreduced Googlematrixanalysis.Thisapproachallowstoestablishdirectand hiddenlinksbetweenasubsetof nodes that belongtoamuch largerdirected network.Ourstudy concentrateson40 major countries chosenworldwide. Ouraimisto offeramulticultural perspectiveontheir interactionsby comparing networks extracted fromfivedifferent Wikipedialanguage editions,emphasizing English,Russianand Arabicones.We demonstratethatthisapproachallowstorecovermeaningfuldirectand hiddenlinks amongthe40countriesofinterest.
1. Introduction
Political and economic interactions between regions of the worldhavealwaysbeenofutmostinteresttomeasureandpredict theirrelativeinfluence.Suchstudiesbelongtothefieldof geopoli-ticsthatfocusesonpoliticalpowerinrelationtogeographicspace. Interactionsamong world countries havebeen widely studied at variousscales(worldwide,continentalorregional)usingdifferent types of information. Studies are driven by observing economic exchanges,socialchanges,history,internationalpoliticsand diplo-macy among others [1,2]. The major finding of this paper is to showthatmeaningfulworldwideinteractionscanbeautomatically extractedfromtheglobalandfreeonlineEncyclopaediaWikipedia
[3]for a given set of countries. All information gatheredin this collaborativeknowledgebasecanbeleveragedtoprovideapicture ofcountriesrelationships,fosteringanewframeworkforthorough geopoliticsstudies.
Wikipedia has become the largest open source of knowledge beingclosetoEncyclopaediaBritannica [4]by theaccuracy ofits
*
Correspondingauthor.E-mail addresses:frahm@irsamc.ups-tlse.fr(K.M. Frahm),
scientific entries [5] andovercoming the latter by the enormous quantityofavailableinformation.Adetailedanalysisofstrongand weak features of Wikipedia is given at [6,7]. Wikipedia articles make citations to each other,providing a direct relationship be-tweenwebpagesandtopics.Assuch,Wikipediageneratesalarger directednetwork ofarticletitleswitharatherclearmeaning.For thesereasons, it is interesting to apply algorithms developed for searchenginesofWorldWideWeb(WWW)suchasthePageRank algorithm[8](seealso[9]), toanalyzetherankingpropertiesand relationsbetweenWikipediaarticles.Forvariouslanguageeditions of Wikipediait was shownthat the PageRank vector produces a reliable ranking of historical figures over 35 centuries of human history[10–14]andasolidWikipediarankingofworlduniversities (WRWU)[10,15].IthasbeenshownthattheWikipediarankingof historicalfiguresisinagoodagreementwiththewell-knownHart ranking [16], while the WRWUis in a goodagreement with the ShanghaiAcademicrankingofworlduniversities[17].
Atpresentdirected networksofrealsystemscanbeverylarge (about 4
.
2 million articles for the English Wikipedia edition in 2013[13] or3.
5 billionweb pages(calledalso nodes)fora pub-liclyaccessiblewebcrawlthatwasgatheredbytheCommonCrawl Foundation in2012 [18]). Forsome studies, one might be inter-estedonlyintheparticularinteractionsbetweenaverysmall sub-set of nodes compared to the full network size. For instance, insamer.elzant@enseeiht.fr (S. ElZant),kjr@enseeiht.fr (K. Jaffrès-Runser),
Fig. 1. Geographical distributionofthe 40selected countries.Color codegroups countriesinto7sets:orange(OC)forEnglishspeakingcountries,blue(BC)for for-merSovietunionones,red(RC)forEuropeanones,green(GC)forLatinAmerican ones,yellow(YC)forMiddleEasternones,purple(PUC)forNorth-EastAsianones andfinallypink(PIC)forSouth-Easterncountries(seecolorsandcountrynamesin
Table 1;othercountriesareshowninblack).(Forinterpretationofthereferencesto colorinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)
thispaper, we are interested in capturingthe interactions ofthe 40 countries represented in Fig. 1 using the networks extracted fromfiveWikipedia language editions covering afew millions of articleseach.However,letusassumethatthereisarather impor-tantperson(having hisownWikipediaarticlecorresponding toa nodeC )whowasbornincountry A andworkedthemainpartof hislife incountry B;therefore A and B mayhave linkstoC (in
eitherdirection)andthus theremaybe an indirectlink between thetwonodesA and B viathenode C (orothernodes).In previ-ousworks, a solutionto thisgeneralproblemhas beenproposed in[19,20]by definingthereducedGooglematrixtheory.Main el-ementsofReducedGooglematrix GR willbepresentednext, but
inafewwords,itcapturesina40-by-40Perron–Frobeniusmatrix thefullcontribution ofdirect andindirectinteractions happening in the full Google matrix between the 40 nodes of interest (we tooktop 40countriesofPageRankvectorofEnWiki). Elementsof reducedmatrixGR
(
i,
j)
canbeinterpretedastheprobabilityforarandomsurferstarting atwebpage j toarriveinwebpage i using
directandindirectinteractions.Indirectinteractionsrefertopaths composed inpart ofwebpages differentfromthe 40 onesof in-terest.EvenmoreinterestinganduniquetoreducedGooglematrix theory,we showherethat intermediate computation stepsof GR
offeradecompositionof GRintomatricesthat clearlydistinguish
directfromindirectinteractions. As such,it ispossible toextract the probability for an indirect interaction betweentwo nodes to happen.
ReducedGooglematrixtheoryisaperfectcandidatefor analyz-ingthedirectandindirectinteractionsbetweencountriesselected worldwide. Inthispaper,we extractfrom GR andits
decomposi-tion intodirect and indirectmatrices ofselected subset network of Nr
=
40 countries. The Google matrix of this subset networkof Nr nodes is computed taking into account direct and hidden
(i.e. indirect) directed links. More specifically, we deduce a fine-grainedclassification ofcountriesthat captures what we callthe
hiddenfriends andhiddenfollowers ofagivencountry.Thestructure of these graphs provides relevant social information: communi-ties of countries withstrong tiescan be clearly exhibited while countriesactingasbridges arepresentaswell.Thisismainlythe case for the hidden interactions networks of friends (or follow-ers)that offer newinformationcompared to thedirect networks offriends(orfollowers)whosetopologyismainlyenforcedbytop PageRank countries. The mathematical procedure of the reduced
Table 1
Listofnamesof40selectedcountrieswithPageRank
K ,
CheiRankK
∗forEnWiki, ArWikiandRuWiki,orderedbyincreasingPageRankofEnWikiedition.Fig. 1gives colorcorrespondencedetails.AWikipediaarticlewithcountrynamerepresentsone nodeofthewholenetworkwithN nodes,
e.g.https://en.wikipedia.org/wiki/Francefor“France”inEnWiki.(Forinterpretationofthereferencestocolorinthistable, thereaderisreferredtothewebversionofthisarticle.)
Country EnWiki ArWiki RuWiki
K K∗ K K∗ K K∗
United States (US) OC 1 9 1 5 2 27
France (FR) RC 2 19 3 31 3 14
United Kingdom (UK) OC 3 25 6 20 7 3
Germany (DE) RC 4 33 8 14 4 24
Canada (CA) OC 5 26 13 19 12 26
India (IN) PIC 6 23 9 25 13 8
Australia (AU) OC 7 35 16 22 18 12 Italy (IT) RC 8 15 5 1 6 32 Japan (JP) PUC 9 4 11 9 11 7 China (CN) PUC 10 8 12 17 9 21 Russia (RU) BC 11 6 7 2 1 2 Spain (ES) RC 12 30 4 8 8 15 Poland (PL) RC 13 12 26 32 10 17 The Netherlands (NL) RC 14 37 18 33 15 31 Iran (IR) YC 15 2 14 15 30 22 Brazil (BR) GC 16 3 21 26 20 1 Sweden (SE) RC 17 22 22 7 19 5 New Zealand (NZ) OC 18 28 34 24 36 4 Mexico (MX) GC 19 40 23 38 22 37 Switzerland (CH) RC 20 38 20 34 16 18 Norway (NO) RC 21 32 35 16 27 11 Romania (RO) RC 22 10 19 6 32 36 Turkey (TR) YC 23 7 15 13 21 38
South Africa (ZA) OC 24 24 29 39 35 20
Belgium (BE) RC 25 18 27 37 29 30 Austria (AT) RC 26 39 28 28 14 28 Greece (GR) RC 27 21 10 36 25 25 Argentina (AR) GC 28 1 32 29 33 23 Philippines (PH) PIC 29 17 36 21 39 33 Portugal (PT) RC 30 36 24 12 17 9 Pakistan (PK) PUC 31 5 25 35 37 29 Denmark (DK) RC 32 16 33 10 31 19 Israel (IL) YC 33 20 17 18 28 6 Finland (FI) RC 34 14 37 4 26 16 Egypt (EG) YC 35 31 2 3 24 39 Indonesia (ID) PIC 36 13 31 11 34 10
Hungary (HU) RC 37 11 40 40 23 40
Taiwan (TW) PUC 38 27 39 27 40 34
South Korea (KR) PUC 39 34 38 30 38 35
Ukraine (UA) BC 40 29 30 23 5 13
GR matrixconstructionfor Nr nodesisdescribed indetailin
Sec-tion2.
The networksof GR directandhiddeninteractions canbe
cal-culated for different Wikipedia language editions. In this paper, reduced Google matrix analysisis applied to the same setof 40 countries on networks representing five different Wikipedia edi-tions:English(EnWiki),Arabic (ArWiki),Russian(RuWiki), French (FrWiki) andGerman(DeWiki) editions. Wetake for analysisthe top 40 countries according to the EnWiki PageRank. Wikipedia languageeditionsareusuallymodifiedbyauthorswhomainly be-long to the regionassociated withthis language.Thus ourstudy showstheimpactofthisculturalbiaswhencomparingdirectand hidden networks of friends (or followers) among different lan-guage editions. We show that part ofthe interactions are cross-cultural whileothersare clearly biasedby the culture ofthe au-thors.
InSection2weintroducethemainelementsofreducedGoogle matrix theory,Section 3describes GR calculated for40countries
andforfivedifferentWikipediaeditions.Specificemphasisisgiven totheverydifferentEnglish,ArabicandRussianeditions.Networks offriendsandfollowersfordirectandhiddeninteractionmatrices arecreatedanddiscussedinSection4,andconclusionisdrawnin Section5.
Fig. 2. Position of countries in the local(K,K∗)plane of the reduced network of 40 countries in the EnWiki (left), ArWiki (middle) and RuWiki (right) networks.
2. ReducedGooglematrixtheory
2.1.Googlematrix
Itis convenient to describe thenetwork of N Wikipedia arti-cles: a network node is givenby the article name,all nodes are numbered,somenodescorrespondtoarticleswithcountrynames, seeTable 1;thenumberofcountriesismuchsmallerthanthe to-talnumberofarticles N. ThentheGoogle matrixGisconstructed fromtheadjacencymatrix Ai j withelements 1 ifarticle(node) j
pointstoarticle(node)i andzerootherwise.Inthiscase,elements oftheGooglematrixtakethestandardform[8,9]
Gi j
=
αS
i j+ (
1−
α
)/
N,
(1)where S isthe matrixofMarkovtransitionswithelements Si j
=
Ai j
/
kout(
j)
, kout(
j)
=
Ni=1Ai j
=
0 being the node j out-degree(number ofoutgoing links) and with Si j
=
1/
N if j has noout-goinglinks(danglingnode).Here0
<
α
<
1 isthedampingfactor whichfora random surferdetermines the probability(
1−
α
)
to jumptoanynode;belowwe useα
=
0.
85.Therighteigenvectorsψ
i(
j)
ofG aredefinedby: jGj j
ψ
i(
j)
= λ
iψ
i(
j) .
(2)The PageRank eigenvector P
(
j)
= ψ
i=0(
j)
corresponds to thelargest eigenvalue
λ
i=0=
1 [8,9]. It has positive elements whichgive the probability to find a random surfer on a given node in thestationarylongtimelimitoftheMarkovprocess.Allnodescan beordered by a monotonicallydecreasing probability P
(
Ki)
withthehighestprobabilityat K
=
1.Theindex K isthePageRank in-dex.ThePageRankvectoriscomputedbythePageRankalgorithm withiterativemultiplication ofinitialrandomvector by G matrix [8,9,14]. Left eigenvectors are biorthogonal to right eigenvectors ofdifferent eigenvalues. The left eigenvector forλ
=
1 has iden-tical(unit) entriesduetothecolumnsumnormalizationofG.In thefollowingwe usethe notationsψ
LT andψ
R forleft andrighteigenvectors,respectively. Notation T stands forvector ormatrix transposition.
Inaddition tothe matrix G itis usefultointroduce a Google matrix G∗ constructed from the adjacency matrix of the same network but with inverted direction of all links [21]. The vec-tor P∗
(
K∗)
is called the CheiRank vector [10,21] and the index numbering nodes in order of monotonic decrease of probabilityP∗ is notedasCheiRankindex K∗.Thus, nodeswithmany ingo-ing (or outgoing) links have small values of K
=
1,
2,
3...
(or ofK∗
=
1,
2,
3,
...
)[9,14].Weshowthedistributionofselected coun-triesonthePageRank–ChiRankplane(
K,
K∗)
inFig. 2.2.2. ReducedGooglematrix
We construct the reduced Google matrix fora certain subset of Nr selectednodes fromthe globalWikipedianetwork with N
nodes(Nr
N).Asasubsetwe choosetop40countrieswiththelargestPageRankprobabilitiesforEnWikinetwork(thenamesare given in Table 1). The reduced Google matrix GR is constructed
onthe mathematicalbasisdescribed below.The mainelement of thisconstructionistokeepthesamePageRankprobabilitiesofNr
nodesasintheglobalnetwork(uptoafixedmultipliercoefficient) andtotakeintoaccountallindirectlinksbetweenNr nodes
cou-pled by transitions via N
−
Nr nodesof the globalnetwork (seealso[20]).
Let G be a typical Google matrix (1) for a network with
N nodes such that Gi j
≥
0 and the column sum normalization Ni=1Gi j
=
1 isverified.We considerasub-network withNr<
Nnodes,called“reducednetwork”.Inthiscasewecanwrite G ina blockform: G
=
Grr Grs Gsr Gss (3)where the index“r” refers to the nodes ofthe reduced network and“s”totheotherNs
=
N−
Nrnodeswhichformacomplemen-tarynetworkwhichwewillcallthe“scatteringnetwork”.ThusGrr
is given by the direct links between selected Nr nodes, Gss
de-scribes links between all other N
−
Nr nodes, Grs and Gsr givelinksbetweenthesetwoparts.PageRankvectorofthefullnetwork isgivenby: P
=
Pr Ps (4)which satisfiesthe equation GP
=
P orin other words P is the right eigenvector of G for the unit eigenvalue. This eigenvalue equationreadsinblocknotations:(
1−
Grr)
Pr−
GrsPs=
0,
(5)−
GsrPr+ (
1−
Gss)
Ps=
0.
(6)Here1 istheunitmatrixofcorrespondingsizeNrorNs.Assuming
thatthe matrix1
−
Gss isnotsingular,i.e.all eigenvalues Gssarestrictlysmallerthanunity(inmodulus),weobtainfrom(6)that
whichgivestogetherwith(5):
GRPr
=
Pr,
GR=
Grr+
Grs(
1−
Gss)
−1Gsr (8)wherethematrixGRofsize Nr
×
Nr,definedforthereducednet-work,can beviewedasan effectivereducedGooglematrix.Here the contribution of Grr accounts for direct links in the reduced
network and the second matrix inverse term corresponds to all contributions of indirectlinks ofarbitrary order. The matrix ele-mentsof GRare non-negativesincethematrixinversein(8)can
beexpandedas:
(
1−
Gss)
−1=
∞ l=0 Gssl.
(9)In(9) theinteger l represents theorder ofindirectlinks, i.e. the numberofindirectlinkswhichareusedtoconnect indirectlytwo nodesofthereducednetwork.Wereferthe readerto[20] toget the proof that GR also fulfillsthe condition ofcolumn sum
nor-malizationbeingunity.
2.3. NumericalevaluationofGR
Wecanquestionhowtoevaluatepracticallytheexpression(8)
ofGR foraparticular sparseandquitelarge networkwhen Nr
∼
102–103 issmall compared to N and N
s
≈
NNr.If Ns is toolarge(e.g.Ns
>
105)adirectnaiveevaluationofthematrixinverse(
1−
Gss)
−1 in(8)by Gaussalgorithmisnot efficient.Inthiscasewecantrytheexpansion(9)provideditconvergessufficientlyfast withamodestnumberofterms.However, thisismostlikelynot thecasefortypicalapplicationssince Gss isverylikelytohaveat
leastoneeigenvalueveryclosetounity.
Therefore,weconsiderthesituationwherethefullGoogle ma-trixhas a well definedgap betweenthe leading unit eigenvalue andthesecondlargesteigenvalue (inmodulus). Forexampleif G
isdefinedusingadampingfactor
α
inthestandardway,asin(1), the gap is at least 1−
α
which is 0.
15 for the standard choiceα
=
0.
85 [9]. In order to evaluate the expansion (9) efficiently, we need to take out analytically the contribution of the leading eigenvalueofGss closetounitywhichisresponsiblefortheslowconvergence.
Below we denote by
λ
c this leading eigenvalue of Gss andby
ψ
R (ψ
LT) the corresponding right (left) eigenvector such thatGss
ψ
R= λ
cψ
R (orψ
LTGss= λ
cψ
LT).Bothleftandrighteigenvectorsaswell as
λ
c can be efficientlycomputedby the poweriterationmethodinasimilarwayasthestandardPageRankmethod.Vectors
ψ
R are normalized with ETsψ
R=
1 andψ
L withψ
LTψ
R=
1. It iswellknown(andeasytoshow)that
ψ
LT isorthogonaltoallother righteigenvectors(andψ
R isorthogonaltoallotherlefteigenvec-tors)ofGss witheigenvalues differentfrom
λ
c. Weintroducetheoperator
P
c= ψ
Rψ
LT whichistheprojectorontotheeigenspaceofλ
c andwe denote byQ
c=
1−
P
c the complementary projector.One verifiesdirectly that both projectorscommute withthe ma-trixGssandinparticular
P
cGss=
GssP
c= λ
cP
c.Thereforewecanderive:
(
1−
Gss)
−1=
P
c 1 1− λ
c+
Q
c ∞ l=0¯
Gssl (10)withG
¯
ss=
Q
cGssQ
candusingthestandardidentityP
cQ
c=
0 forcomplementaryprojectors.Theexpansionin(10)convergesrapidly since G
¯
ssl∼ |λ
c,2|
l withλ
c,2 being the second largest eigenvaluewhichissignificantlylowerthanunity.
Thecombinationof(8)and(10)providesan explicitalgorithm feasiblefora numericalimplementationformodest valuesof Nr,
large values of Ns and of course if sparse matrices G, Gss are
considered.Wereferthereaderto[20] formoreadvanced imple-mentationconsiderations.
Fig. 3. Densityplotsofmatrices
G
R(topleft),G
pr(topright),G
rr(bottomleft)andGqr(bottomright)forthereducednetworkof40countriesintheEnWikinetwork. Thenodes
N
rareorderedinlinesbyincreasingPageRankindex(lefttoright)andin columnsbyincreasingPageRankindexfromtoptobottom.Colorscalerepresents maximum valuesinred(0.15 intoppanels;0.01 inbottomleftpanel;0.03 in bottomrightpanel),intermediateingreenandminimum(approximatelyzero)in blue.(Forinterpretationofthereferencestocolorinthisfigurelegend,thereader isreferredtothewebversionofthisarticle.)2.4. DecompositionofGR
Onthebasis ofequations(8)–(10),thereducedGoogle matrix canbepresentedasasumofthreecomponents:
GR
=
Grr+
Gpr+
Gqr,
(11)with thefirst component Grr given by directmatrix elements of
G amongtheselected Nr nodes.Thesecond projectorcomponent
Gpr isgivenby:
Gpr
=
GrsP
cGsr/(
1− λ
c),
P
c= ψ
Rψ
LT.
(12)The thirdcomponentGqr isofparticularinterest inthisstudy
asitcharacterizestheimpactofindirectorhiddenlinks.Itisgiven by: Gqr
=
Grs[
Q
c ∞ l=0¯
Gssl]
Gsr,
Q
c=
1−
P
c,
¯
Gss=
Q
cGssQ
c.
(13)Wedocharacterizethestrengthofthese3componentsbytheir respectiveweightsWrr,Wpr,Wqrgivenrespectivelybythesumof
allmatrixelementsofGrr,Gpr,GqrdividedbyNr.Bydefinitionwe
haveWrr
+
Wpr+
Wqr=
1.Reduced Google matrix has beencomputed, together withits components Grr,Gpr and Gqr,forthe Englishlanguage editionof
Wikipedia (EnWiki) and for the NR
=
40 countries listed in Ta-ble 1. Thesecountries are the oneswithtop PageRank K in the network of EnWiki. Density plots of GR, Grr, Gpr and Gqr, aregiven inFig. 3. Countriesare ordered by increasing K value. The weightofthethreematrixcomponentsof GR are Wpr
=
0.
96120, Wqr=
0.
029702 and Wrr=
0.
009098. Predominantcomponentisclearly Gpr butaswe willexplain next, itisnot themost
The meaning of Grr is clear as it is directly extracted from
theglobalGoogle matrixG. Itgivesthe directlinksbetweenthe selected nodes and more specifically the probability Grr
(
i,
j)
forthe surferto go directly from column j country to line i
coun-try.However, since eachcolumnis normalizedby thenumberof outgoinglinks, absoluteprobabilities cannotbecomparedto each otheracrosscolumns.
ThesumofGpr andGqrrepresentsthecontributionofall
indi-rectlinksthroughthescatteringmatrixGss.AsseenonFig. 3,the
projectorcomponentGpr iscomposedofnearlyidenticalcolumns.
Moreover,valuesofeachcolumnareproportionaltothePageRank ofthe countries(lines andcolumns are ordered by increasing K
values). As detailed in [20], we observe numerically that Gpr
≈
PrETr/(
1− λ
c)
,meaningthateachcolumnisclosetothenormal-izedvector Pr
/(
1− λ
c)
.Assuch, Gpr transposesessentiallyin GRthecontributionofthefirsteigenvectorofG.Wecanconcludethat eveniftheoverallcolumnsumsofGpraccountfor
∼
95–97% ofthetotalcolumnsumof GR, Gpr doesn’toffer innovativeinformation
comparedtoPageRankanalysis.
AwaymoreinterestingcontributionistheoneofGqr.This
ma-trixcaptureshigher-orderindirectlinksbetweentheNr nodesdue
totheirinteractionswiththeglobalnetworkenvironment.Wewill refertotheselinksashiddenlinks.Wenotethat Gqr iscomposed
oftwopartsGqr
=
Gqrd+
Gqrndwherethefirsttermgivesonlythediagonalpart of the matrix Gqrd and thus represents the
proba-bilitiestostayonthesamenodeduringmultipleiterationsofG
¯
ssin(13)whilethesecondmatrixcapturesonlynon-diagonalterms inGqrnd.Assuch,Gqrnd representsindirect(hidden)linksbetween
theNr nodesappearingviatheglobalnetwork.Wenote that
cer-tainmatrixelementsofGqrcanbenegative,whichispossibledue
tothenegativetermsin
Q
c=
1−
P
c appearingin(13).ThetotalweightofnegativeelementsishowevermuchsmallerthanWqr(at
least6timessmallerandevennon-existinginArWiki).Ofcourse, thefullreducedGooglematrixGR hasonlypositiveorzeromatrix
elements.In thefollowing, ourstudyconcentrates mainly onthe meaningofGqrnd,emphasizingthemeaningofitslargestpositive
valuesinthestudyofSection4.
3. Matricesofworldcountries
Our study focuses on the networks representing 5 different Wikipediaeditions1 fromtheset of24analyzed in[13]: EnWiki, ArWiki, RuWiki, DeWiki and FrWiki that contain 4.212, 0.203, 0.966,1.533and1.353million ofarticleseach.
3.1.Selectedcountries
The40countrieslistedinTable 1havebeenselectedfromthe EnWiki network after computing the PageRank for the complete network.The40countrieswithlargestPageRankprobabilityhave been chosen, and ordered by a local PageRank index K varying
between1 and40.Themostinfluentialcountriesarefound tobe atthetop values K
=
1,
2,
...
.Inaddition we determinethelocal CheiRankindex K∗ of theselected countries usingthe CheiRank vectoroftheglobalnetwork[14,21].AtthetopofK∗wehavethe mostcommunicativecountries.Table 1lists K and K∗ forEnWiki, ArWiki and RuWiki. Not surprisingly, the order of top countries changeswithrespect totheedition(forinstance,thetopcountry forK isUSexceptforRuWikiwhosetopcountryisRussia).Countries that belong to the same region or having a com-mon piece of history may probably exhibit stronger interactions inWikipedia. As such, we havecreated acolor codethat groups together countries that either belong to the same geographical
1 DatacollectedmidFebruary2013.
region (e.g. Europe, Latin America, Middle East, North-East Asia, South-East Asia) orshareabig partofhistory(formerUSSR; En-glish speakingcountriesthat are thelegacyofthe formerBritish Empire).ColorcodecanbeseeneitherinFig. 1orinthetextcolor ofTable 1.
Itisconvenientaswelltoplotallnodesinthe(K ,K∗)planeto highlightthe countriesthat arethe mostinfluential(K
=
1,
2,
...
) and the most communicative (K∗=
1,
2,
...
) at the same time.Fig. 2plotsall40countriesinthe(K ,K∗)planeforEnWiki,ArWiki andRuWikieditions. Thisplotisa bi-objectiveplotwhere K and K∗ are to be minimized concurrently.It is interesting to look at thesetofnon-dominatedcountrieswhichare theonessuchthat thereisnoothercountrybeatingthemforbothK and K∗.Cultural biasisobvioushereasforEnWiki,thissetiscomposedof{US,JP, IR, AR}, for ArWiki of {US, EG, IT} and for RuWiki of only Rus-sia.We notethat accordingto Fig. 2some countrieswithhigh K
value(relativefewin-degreeandlowPageRankprobability)actas importantdiffusersofcontent(lowK∗)evenforthelanguage edi-tionsbeingdifferentfromthelanguage spokeninthosecountries (BR, AR, RO). Alsocountrieswith low K (having many citations) arerelativelypoordiffusers(FR,CA).ThusFig. 2demonstratesthat differentcultures attribute differentdegree ofcountry popularity (PageRank probability and K index) anddifferent communicative degree (CheiRank probability and K∗ index). This indicates the nontrivialfeaturesofculturespropagationandinteractions.
3.2. DensityplotsofGR,GrrandGqrnd
ForthethreeEnWiki,ArWikiandRuWikieditions, Fig. 4plots the densityof matricesGR, Gqrnd and Grr.We keep for all plots
the same orderof countriesextractedfrom the EnWiki network. This is meant to highlight cultural differences among Wikipedia editions.IntheGRplots,itisclearthatthismatrixisdominatedby
theprojectorGpr contribution,whichisproportionaltotheglobal
PageRankprobabilities.InGR,notsurprisingly,theculturalbiasis
pronouncedduetoitsstrongtietoPageRank.Forinstance,Egyptis muchmoreimportantinArWikithaninothereditions,andRussia isthetopcountryinRuWiki.
The information from direct links between countries is pro-videdbyGrr.Asexpected, theper-columnnormalizationprevents
ameaningfulper-lineanalysis.ForEnWikiandArWiki,the respec-tivecolumnsofMexicoandHungaryarepredominantduetotheir little number of outgoing links. On the contrary, Gqrnd offers a
much more unified view of countriesinteractions as itseems to highlightmoregeneralinteractionthatarelessbiasedbycultural views. For instance, for these three Wikipedia editions, the hid-den links connecting Taiwan to China and Pakistan to India are reallystronginGqrnd.ThelinkconnectingUkrainetoRussiaisvery
stronginEnWikiandArWiki.ItissurprisinglyabsentfromRuWiki (or maybe thisis exactly a cultural bias we are observing since during a long time both countries were part of USSR, and there was thus no specific difference betweenthem). Other interesting hiddenlinksarehighlightedinEnWikiasNew-Zealandisdirected toAustraliaorinRuWikilinkingCanadatotheUSA.
3.3. Friendsandfollowers
In order to better capture the interactions provided by Gqrnd
and GR, we have listed for all 5 Wikipedia editions the top 4
friends andtop 4 followers of a set of 7 leading countries. One leadingcountrypergrouphasbeenselected:USforEnglish speak-ing countries, Brazil for Latin America, France for Europe, Japan forNorth-EastAsia,IndiaforSouth-EastAsia,RussiafortheSoviet block and Turkey forthe Middle-East. To pick them inside each group, we havechosen thecountry whoseworst PageRank order overall5Wikipediaeditionsisthehighest.
Fig. 4. Densityplotsofthematrixelementsof
G
R(firstline),G
qrnd(secondline)andG
rr(thirdline)forthereducednetworkof40countriesofEnWiki(leftcolumn),ArWiki (middlecolumn)andRuWiki(rightcolumn).ThecountrynamesaregivenontheaxesintheordernamesinTable 1,thusthenodesN
rareorderedinlinesandcolumnsby thereferencePageRankofEnWiki.Thecolorsrepresentmaximum(redcorrespondsto:0.15,0.19,0.13intoppanelsfromlefttoright;0.01,0.03,0.012inmiddlepanelsand 0.01,0.011,0.006inbottompanelsrespectively),intermediate(green)andminimum(blueforzero)valuesforagiven matrix.(Forinterpretationofthereferencestocolor inthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)Foreachleading country,weextract fromboth matricesGqrnd
and GR the top 4 Friends (resp. Followers) ofcountry j given by
the 4 best values of the elements of column j (resp. of line j).
In other words, it corresponds to destinations of the 4 strongest outgoinglinksof j and thecountriesattheoriginofstrongest4 ingoinglinks.Asummary ofrelevantresultsisgiveninTable 2to showcross-cultureinteractions.
Looking at GR friends, top friends of leading countries are
strongly related to the top PageRank countries (we have a pre-dominance of US, France and Germany). Similarly, cross-edition followers of US are Mexico and Canada, and followers of Japan areChina, KoreaandTaiwan.Ontheopposite, higher-order inter-actions of Gqrnd are not as much influenced by PageRank. More
subtle but realistic interactions appear: Canada is always identi-fiedasahiddenfriendofUSAwhileitwas neverthecaseinGR.
Similarly, Ukraine is always tagged asa hidden friend of Russia; Italy andSpain asfriendsofFrance.Thus Gqrnd seems to
empha-size morefine-grainedregionalinteractions. Nextsectionexploits theconceptoffriendsandfollowerstocreatenewnetwork repre-sentationsderivedfromGRandGqrnd.
4. Networksof40countries
Thisstudyconcentratesagainonthesame7leadingcountries asbefore.Top4friendsandtop4followersoftheseleading coun-triesareextractedfromGRandGqrndtoplotthegraphsofFigs. 5
Table 2
Cross-editiondirectfriendsandfollowersextractedfrom
G
Rmatrix(toptable)andfromG
qrnd(bottomtable)forthetopcountriesofeacharea.Foreachtopcountry,welist thedirectfriends(followers)presentinthedirectfriendslistgivenbyallfiveWikipediaeditions,theonespresentin4editionsoutof5andtheonespresentin3editions outoffive.Top country GRWiki friends present in GRWiki followers present in
all 5 editions 4 out of 5 editions 3 out of 5 editions all 5 editions 4 out of 5 editions 3 out of 5 editions
US FR DE UK MX–CA IL UK BR US–FR DE AR–PT MX FR US DE BE ES IT RU US–FR DE UA–FI PL TR US–FR DE GR–IR JP US–FR DE KR–TW–CN PH IN US–FR DE PK–ID ZA
Top country GqrWiki friends present in GqrWiki followers present in
all 5 editions 4 out of 5 editions 3 out of 5 editions all 5 editions 4 out of 5 editions 3 out of 5 editions
US CA UK–MX FR UK CA–MX
BR PT–AR ES–MX AR PT UK
FR IT–DE US–BE UK–BE–ES CH
RU UA DE CN–PL–US FI UA–PL IR
TR GR IR–RU GR–IR EG–RO
JP CN–TW–US–KR KR–CN–TW
IN PK–CN ID IR PK–ID CN
Fig. 5. Networkstructureoffriends(topline)andfollowers(bottomline)inducedbythe7topcountriesofeachgeographicalarea(US,FR,IN,JP,BR,TR,RU)in
G
R.Results areplottedforEnWiki(leftcolumn),ArWiki(middlecolumn)andRuWiki(rightcolumn).Nodecolorsrepresentgeographicappartenancetoagroupofcountries(cf.Table 1fordetails).Top(bottom)graphs:thetopcountrynodepoints(ispointedby)withaboldblackarrowtoitstop4friends(followers).Redarrowsshowfriendsoffriends (resp.followersoffollowers)interactionscomputeduntilnonewedgesareaddedtothegraph.Drawnwith[22].(Forinterpretationofthereferencestocolorinthisfigure legend,thereaderisreferredtothewebversionofthisarticle.)
Fig. 6. SamelegendasFig. 5exceptfriendsandfollowersarecomputedfrom
G
qrnd(hiddenrelationships).(Forinterpretationofthecolorsinthisfigure,thereaderisreferred tothewebversionofthisarticle.)and6, respectively. Notethat Fig. 6 essentially highlights hidden links.The blackthick arrowsidentifythetop4 friendsandtop4 followersinteractions.Redarrowsrepresentthe friendsoffriends (respectivelythefollowersoffollowers)interactionsthatare com-puted recursively until no new edge is added to the graph. All graphsareplottedusingaforcedirectlayout.
ThenetworksoffriendsobtainedfromGRneverexpandtothe
fullsetof40nodes.Theyonlyconcentrateonabout10countries (includingthe7leadingones).ThishappenssinceGRisdominated
bytheprojectorcomponent.Lookingatthefollowergraphs,more informationcanbeobserved.North-EastAsian,Middle-Easternand Latin American create, in all editions, a cluster ofnodes densely interconnected.EuropeancountriesencloseRussiaandUkraine as thesecountries are linked to EU countries that were part ofthe formerSovietUnionzoneofinfluence(e.g.Romania,Hungary, Fin-land,etc.).Thenetworksoffollowersendup almostspanningthe fullsetof40countries.
The networks of friends obtained from Gqrnd don’t
concen-trate to a limited set of countriesas it is the casefor GR. They
endup spanningthefull setofcountries. Thehiddenfriend links show that the interactions between the geographical groups are coherent.North-EastAsiancountriesarelinkedtoSouth-EastAsian countriesandtoEnglishspeakingcountriesinEnWikiandRuWiki. Interestingly,thesetofBalticcountries(SE,NO,DK,FI)createmost ofthetimefullmeshes,andinterconnectEuropeandRussia. Cul-tural biascan be observed aswell intheseplots. Fornon-Arabic editions,Middle-Eastern countries createa well-connectedcluster
of nodes. But for ArWiki, Turkey exhibits a stronger connection withEuropethanwiththeother Middle-Easterncountries. Inthe viewofArabiccountries,TurkeyisseenclosertoEuropethan oth-ersforsure.
5. Conclusion
Thisworkoffers anewperspectiveforfuturegeopolitics stud-ies.Itispossibletoextractfrommulti-culturalWikipedianetworks a globalunderstanding oftheinteractionsbetweencountriesata global, continental orregionalscale. Onthe basis ofthe reduced Google matrix analysisof theWikipedia network we determined the relations offriends andfollowersof countrieson purely sta-tistical mathematicalgroundsobtainedfrom thehugeknowledge accumulatedbyWikipediaeditionsindifferentlanguages.This ap-proachprovides resultsindependentofculturalbias.Thereduced Googlematrixtheoryhasbeenshowntocapturehiddenand indi-rect interactionsamongcountries,resultinginnewknowledgeon geopolitics.
Acknowledgements
This work was supported by APR 2015 call of University of Toulouse and by Région Occitanie (project GOMOBILE), CNRS MASTODONS-2016 projectAPLIGOOGLE, andEU CHIST-ERA MACACOprojectANR-13-CHR2-0002-06.
References
[1]E.Jones,TheEuropeanMiracle:Environments,EconomiesandGeopoliticsin theHistoryofEuropeandAsia,CambridgeUniversityPress,Cambridge,MA, 2003.
[2]J.Dittmer,K.Dodds,Populargeopoliticspastandfuture:fandom,identitiesand audiences,Geopolitics13(2008)3.
[3]http://www.wikipedia.org(accessedAugust2016).
[4] EncyclopaediaBritannica,http://www.britannica.com/(accessedAugust2016). [5]J.Giles,Nature438(2005)900.
[6]J.M.ReagleJr.,GoodFaithCollaboration:TheCultureofWikipedia,MITPress, Cambridge,MA,2010.
[7] F.A.Nielsen,Wikipediaresearchandtools:reviewandcomments,availableat SSRN:http://dx.doi.org/10.2139/ssrn.2129874,2012.
[8]S.Brin,L.Page,Comput.Netw.ISDNSyst.30(1998)107.
[9]A.M. Langville,C.D. Meyer, Google’s PageRank and Beyond:The Science of SearchEngineRankings,PrincetonUniversityPress,Princeton,2006.
[10]A.O.Zhirov,O.V.Zhirov,D.L.Shepelyansky,Eur.Phys.J.B77(2010)523.
[11]Y.-H.Eom,K.M.Frahm,A.Benczur,D.L.Shepelyansky,Eur.Phys.J.B86(2013) 492.
[12]Y.-H.Eom,D.L.Shepelyansky,PLoSONE8 (10)(2013)e74554.
[13]Y.-H.Eom,P.Aragon,D.Laniado,A.Kaltenbrunner,S.Vigna,D.L.Shepelyansky, PLoSONE10 (3)(2015)e0114825.
[14]L.Ermann,K.M.Frahm,D.L.Shepelyansky,Rev.Mod.Phys.87(2015)1261.
[15]J.Lages,A.Patt,D.L.Shepelyansky,Eur.Phys.J.B89(2016)69.
[16]M.H.Hart,The100:RankingoftheMostInfluentialPersonsinHistory,Citadel Press,N.Y.,1992.
[17] Academicrankingofworlduniversities,http://www.shanghairanking.com/ (ac-cessedAugust2016).
[18]R.Meusel,S.Vigna,O.Lehmberg,C.Bizer,J.WebSci.1(2015)33.
[19]K.M.Frahm,D.L.Shepelyansky,arXiv:1602.02394[physics.soc],2016.
[20]K.M.Frahm,K.Jaffrès-Runser,D.L.Shepelyansky,Eur.Phys.J.B89(2016)269.
[21]A.D.Chepelianskii,arXiv:1003.5455[cs.SE],2010.
[22]M.Bastian,S.Heymann,M.Jacomy,Gephi:anopensourcesoftwarefor explor-ingandmanipulatingnetworks,in:Proc.ofInternationalAAAIConferenceon WeblogsandSocialMedia,2009.