Friend recommendation for cross marketing in online brand community based on intelligent attention allocation link prediction algorithm


HAL Id: hal-02383107

https://hal.inria.fr/hal-02383107

Submitted on 28 Nov 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Shugang Li, Xuewei Song, Hanyu Lu, Linyi Zeng, Miaojing Shi, Fang Liu

To cite this version:

Shugang Li, Xuewei Song, Hanyu Lu, Linyi Zeng, Miaojing Shi, et al. Friend recommendation for cross marketing in online brand community based on intelligent attention allocation link prediction algorithm. Expert Systems with Applications, Elsevier, 2019, 139, pp.1-11. 10.1016/j.eswa.2019.112839. hal-02383107.


Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa

Friend recommendation for cross marketing in online brand community based on intelligent attention allocation link prediction algorithm

Shugang Li^a^, Xuewei Song^a^, Hanyu Lu^a^, Linyi Zeng^a,b,∗^, Miaojing Shi^c^, Fang Liu^a^

^a^ School of Management, Shanghai University, Shanghai, 200444, PR China

^b^ Software Development Center, Industrial and Commercial Bank of China, Shanghai, 201206, PR China

^c^ Univ Rennes, Inria, CNRS, IRISA, 35042, France

Article info

Article history: Received 17 August 2018. Revised 24 July 2019. Accepted 24 July 2019. Available online 25 July 2019.

Keywords: Friend recommendation; Link prediction; AAI; Mutually complementary indices; Cross marketing

Abstract

The circle structure of online brand communities allows companies to conduct cross-marketing activities through the influence of friends in different circles and to build strong and lasting relationships with customers. However, existing works on friend recommendation in social networks do not consider establishing friendships between users in different circles, which suffers from the problem of network sparsity, nor do they study the adaptive generation of appropriate link prediction algorithms for different circle features. To fill these gaps, the intelligent attention allocation link prediction algorithm is proposed to adaptively build an attention allocation index (AAI) according to the sparseness of the network and to predict the possible friendships between users in different circles. The AAI reflects the amount of attention allocated to the user pair by their common friend in the triadic closure structure, which is decided by the friend count of the common friend. Specifically, to overcome the problem of network sparsity, the AAIs of both the direct common friends and the indirect ones are developed. Next, the decision tree (DT) method is constructed to adaptively select the suitable AAIs for the circle structure based on the density of common friends and the dispersion level of common friends' attention. In addition, to further improve the accuracy of the selected AAI, its complementary AAIs are identified with a support vector machine model according to their similarity in value, direction, and ranking. Finally, the mutually complementary indices are combined into a composite one to comprehensively portray the attention distribution of common friends of users in different circles and to predict their possible friendships for cross-marketing activities. Experimental results on Twitter and Google+ show that the model has highly reliable prediction performance.

1. Introduction

With the rapid development of the Internet, the popularity of high-speed, stable Internet services and the increasing experience of online shopping, 80% of the top 500 companies in the world have established online brand communities. The online brand community accurately brings together the scattered target customers of a company and has become a new platform for companies to carry out marketing activities as well as to build strong and lasting relationships with customers (John, Mochon, Emrich, & Schwartz, 2017).

∗ Corresponding author.

E-mail addresses: [email protected] (X. Song), [email protected] (L. Zeng), [email protected] (M. Shi).

A large number of works have already shown that friend groups can affect individual consumer decisions, for example, product evaluation, purchase possibility (Whittler & Spira, 2002), and actual purchase behavior (Li, Chou, & Lin, 2014). Moreover, when brand preference conflicts between the group and individuals, individuals may hide their consumption behavior (Thomas, Jewell, & Jennifer, 2015). Some scholars in consumer behavior studies (Solomon, 2016) pointed out that the lower the user requirements for a product, the higher the influence of the reference group. Aral (2013) studied the case of 1.4 million Facebook users downloading a movie application to find out the factors that affected their decision making. He randomly divided users who had downloaded the application into three groups: the first group had the right to invite their friends to try it on their own; friends of the users in the second group received an automatically generated message indicating that their friends were using the application; friends of the users in the third group did not receive any messages. As a result, 6% of the friends who received the unsolicited invitation downloaded the application, compared with 2% of those who received automatic tips. In addition, he also compared users who actively sent invitations and successfully invited friends with users who sent automatic tips to friends and invited friends. In the long run, the former used the application for more time than the latter.

Friends in online brand communities share their passion for a specific brand as well as exchange information and knowledge, and these social interactions positively influence member loyalty to the brand (Brogi, 2014). In addition, some members who share a common interest or prefer the same product are further gathered together to form different sub-groups, i.e. circles (Wang & Xue, 2010), in a brand community. The circle structure provides fertile ground for cross-marketing, where sales are amplified by the influence of friends through forming friendships between users in different circles.

The scoring link prediction algorithm (SLPA) is the main approach to predict whether there are links between node pairs in the social network and to recommend friendships between user nodes based on network topology. So far, various SLPAs have been widely applied for the recommendation of friends, such as the Common Neighbor index (CN), the Hub Promoted Index (HPI), and Resource Allocation (RA) (Lü & Zhou, 2011). However, existing works on predicting friendships in social networks do not consider establishing friendships between users in different circles, which bears on the problem of network sparsity. What is worse, there is not yet an SLPA suitable for all network circle structures. Constructing a suitable SLPA based on circle structure characteristics is therefore of significant theoretical and practical value, but it is a complex task that requires a lot of expert experience. It is thus of great significance to develop a method that relies solely on network data to build an appropriate SLPA, so that inexperienced practitioners can easily use it to obtain highly reliable prediction results.

To address these problems, this study proposes the intelligent attention allocation link prediction algorithm (IAALPA) to predict the possible friendships between users in different circles by adaptively building an attention allocation index (AAI) for specific circles. The AAI represents the amount of attention assigned to the user pair by their common friend in the triadic closure structure, which is determined by the friend count of the common friend.

Specifically, to overcome the network sparsity problem in predicting friendships between users in different circles, the AAIs of both the direct common friends and the indirect ones are developed based on the principle of attention allocation of common friends in the triadic closure structure. Since it is difficult to build an AAI that fits all the structural characteristics of circles, the decision tree (DT) method (Wang, Wu, & Yao, 2017) is developed, in light of the density of common friends and the dispersion level of common friends' attention, to adaptively select the AAI suitable for predicting possible links between user nodes in different circles. In addition, although combining a single AAI with others can help improve its link prediction performance, the blind combination of AAIs cannot bring the expected results (Gomes, Barddal, Enembreck, & Bifet, 2017). To this end, IAALPA applies the support vector machine (SVM) (Shan, Kong, Zhang, Li et al., 2018) model to identify the complementary AAIs of the AAI selected by DT, which can improve the performance of the selected AAI by combining them as a composite indicator. Finally, the selected AAI and its complementary ones are used to design the composite mutually complementary AAI to comprehensively portray the attention allocated to user node pairs by their common friends and to forecast the possible connections between user nodes in different circles. Consequently, friendships between users in different circles are recommended and successful cross-marketing is achieved. Specifically, a group of users of one product circle is recommended to the target customers who are the users of another circle in the online brand community. Accordingly, when the brand marketers are supported by the friend group's influence on individuals, they can significantly enhance the cross-marketing efficiency of the product.

The remainder of this paper is organized as follows: Section 2 introduces link prediction; Section 3 presents friend group recommendation; Section 4 explains IAALPA; Section 5 presents the experimental design and the analysis of results; Section 6 gives a summary of this study.

2. Link prediction

Link prediction is a network-related problem, which consists of predicting new connections and detecting hidden links in a network. It is an important task applicable to a wide variety of areas, such as the bibliographic domain, molecular biology, criminal investigations and recommender systems (Lü & Zhou, 2011; Martínez, Berzal, & Cubero, 2017; Xiang, 2008). The link prediction problem can be formally defined as follows. Given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to the network at a later time by defining a similarity or a probability index (Liben-Nowell & Kleinberg, 2007; Martínez et al., 2017). The existing link prediction methods can be divided into four categories: link prediction algorithms based on similarity measures, probabilistic and statistical methods, algorithmic approaches, and preprocessing methods (Martínez et al., 2017). Among these approaches, the most widespread ones rely on the use of similarity measures between node pairs (Lü & Zhou, 2011; Martínez et al., 2017).

The similarity measures proposed and evaluated in previous literature can be broadly categorized into two groups: semantic and topological measures (Kaya & Poyraz, 2016). Semantic measures use the nodes' content to assess similarity. For instance, in a co-authorship network, the similarity between keywords extracted from published papers was applied to predict future interaction among the authors (Xiang, 2008). Unlike the semantic measures, the topological measures use the network structure to compute the similarity scores (e.g. the number of common neighbors that two nodes share). Topological measures are more commonly adopted in the literature since they are more general and do not require the definition of rich features to describe content. In fact, rich features are not always available and depend on the social network considered.

Several topological measures have been proposed in the existing literature and are mainly categorized into neighborhood-based and path-based measures (Kaya & Poyraz, 2016). The neighborhood-based measures take the nodes' immediate neighbors into account. In general, these measures consider that two nodes are more likely to form a link if their sets of neighbors have a large overlap (Xiang, 2008). Among the neighborhood-based measures, Salton (Newman, 2001), Sorenson, HPI, Hub Depressed Index (HDI), Leicht–Holme–Newman (LHN), Preferential Attachment (PA) (Barabási et al., 2002), RA (Adamic & Adar, 2003) and Jaccard's coefficient (Martínez et al., 2017) can be mentioned. The path-based measures in turn define the similarity between nodes by considering the paths between them. The basic idea is that two nodes are more likely to form a link if there are more short paths between them. The path-based measures range from the ordinary path-distance measures to more sophisticated measures that consider ensembles of different paths, for instance, the Katz measure (Soares & Prudêncio, 2013). In comparative terms, the neighborhood-based methods are more widespread, due to both their computational efficiency and the strong performance observed in experiments (Huang, 2006; Liben-Nowell & Kleinberg, 2007; Murata & Moriyasu, 2008). The measures proposed in this study can be categorized as neighborhood-based ones, since they use information about the connections around nodes to assign scores to them. The uniqueness of our algorithm is that it considers not only the direct common neighbors of node pairs but also their indirect common neighbors. Moreover, the feature of attention allocation of common neighbors, namely the friend count of common neighbors, is taken into account.

Since a single SLPA is not sufficient to fully characterize the network, combined link prediction algorithms have been developed. On the whole, there are many combined link prediction algorithms which are built with a high degree of versatility to suit a variety of networks based on multiple network characteristics. For example, Fan, Liu, Lu, Xiu, and Chen (2017) proposed a composite link forecasting index that took into account the node type effect and node structure similarity. Wu and Tang (2014) developed a directed social network link prediction method based on a topic model that integrated node attributes and network structures for link prediction. Muniz, Goldschmidt, and Choren (2018) used context (node and link attributes), temporal information (chronological interaction data) and topology information to calculate link weights between nodes, and then applied a weighted similarity function to identify potential links. Xiao, Li, Wang, Xu, and Liu (2018) studied the internal and external factors that influenced the formation of links and developed a three-level hidden Bayesian link prediction model by combining user behaviors and user relationships for link prediction.

Unlike existing works that focus on algorithm versatility, this study aims to build suitable algorithms for specific circle structures. Network structures are diverse, and a single SLPA cannot perform well in all networks. Accordingly, this study proposes IAALPA to adaptively select AAIs suitable for the given circle structure according to the network features of common neighbor density and attention dispersion. Additionally, the complementary AAIs of the selected AAI are screened out based on their similarity in value, direction, and ranking. Subsequently, the selected AAI and its complementary ones are combined into one composite index to avoid the low prediction performance caused by the blind combination of AAIs.

In recent years, many scholars have done valuable work on friend recommendation in social networks based on link prediction methods. Using a data mining recommendation algorithm, Liu, Yu, Wei, and Ning (2018) proposed an improved algorithm to rank the recommended information with a confidence interval, and recommended friends with the same interest to users in microblogs. Yuan, Cheng, Zhang, Liu, and Lu (2015) designed a social influence propagation method to mine a user's buddies (friends who had a great impact on the user) and susceptibility (willingness to be affected), and developed a recommendation model based on the impact of social relations. He et al. (2017) integrated link and content information and developed a MapReduce distributed computing framework to implement the recommendation of friends in large-scale online community networks. Zhu, Lu, and Ma (2015) mined user interests from short messages and proposed neighbor-based friend recommendation to recommend users with similar interests.

Despite intensive research efforts, there is a distinct lack of methodology for recommending friends in different social circles in a social network, which differs from traditional friend recommendation due to the problem of network sparseness. In this study, IAALPA is constructed to recommend the user group in one product circle to the target user in another product circle, and the influence of the friend group is used to affect the purchase decision of the target user, so as to realize cross-selling.

3. Friend group recommendation for cross-marketing

3.1. Cross-marketing in a brand community

The online brand community is defined as D(V, E), where V is the set of nodes representing the users in the community and E is the set of edges representing the friend relationships between users. In a brand community, users form a circle because they prefer the same type of product. The set of users in the circle of product A is defined as V_A and the set of users in the circle of product B is defined as V_B. In cross-marketing, if product A is sold to users in the circle of product B, all users in the product A circle can be recommended to users in the product B circle as the friend group, and vice versa.

Usually, the nodes in the same circle are closely connected (there are more common friends), while the nodes in different circles are sparsely connected (there are fewer common friends). Therefore, recommending users in different circles to become friends often faces the problem of network sparsity, which is characterized by the average degree of the network being far less than the number of nodes (Lei & Rinaldo, 2015). Since the topological characteristics of social networks are all related to the average degree, either directly or indirectly, sparsity will affect the performance of the existing SLPAs because these algorithms rely heavily on the network topology. Consequently, traditional SLPAs cannot guarantee high prediction accuracy of friend recommendation in different product circles. To address this problem, IAALPA is developed to fully describe the possibility of connection between user nodes in different circles and to construct a suitable mutually complementary AAI for the specific circle structure.

Fig. 1 shows an example of friend group recommendation in different circles in brand community D, where the initial network is shown in Fig. 1(a) and the network with predicted links is described in Fig. 1(b). In Fig. 1(a), the users that belong to the product A circle are 1, 2, 3, and 4, and the users that belong to the product B circle are 5, 6, and 7. Suppose that user 7 is the target customer; the purpose of friend group recommendation is to enable him to purchase product A. Through the link prediction method, it is found that user 7 may establish a friend relationship with users 1, 3, and 4 in the product A circle. Then the online brand managers can recommend users 1, 3, and 4 to become friends with user 7, as shown in Fig. 1(b). When they become friends, the target customer 7 is encouraged to buy product A through the influence of the friend group of 1, 3, and 4.

3.2. AAI for friend group recommendation

In the triadic closure structure, the fewer friends a common neighbor of a node pair has, that is, the smaller its degree is, the more attention the common neighbor assigns to the node pair (Backstrom, Bakshy, Kleinberg, Lento, & Rosenn, 2011), and accordingly the more likely there is a link between the node pair. Based on this principle, AAIs are constructed from the point of view of microstructure (node pairs and their common neighbors) and macrostructure (node pairs, their common neighbors, and friends of common neighbors). As a result, the problem of network sparsity is overcome and the attention allocation between node pairs is comprehensively depicted.

3.2.1. AAIs based on microstructure

Seven of the most commonly used indices (Salton, Sorenson, HPI, HDI, LHN, PA, and RA), together with the resource allocation average (RAA), are selected (Lü & Zhou, 2011) and adopted as microstructure-based AAIs, which consider the microstructure consisting of node pairs and their common neighbors. As shown in Table 1, a larger number of common neighbors or a smaller degree of the common neighbors indicates a larger attention allocation score, and vice versa. In Table 1, Γ(·) represents the neighbor set of a node and k(·) denotes the degree of a node.

Fig. 1. An example of friend group recommendation.

Table 1
AAIs based on microstructure.

| AAI | Formula |
|---|---|
| Salton | $S^{Salton}_{xy} = \dfrac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k(x) * k(y)}}$ |
| Sorenson | $S^{Sorenson}_{xy} = \dfrac{2|\Gamma(x) \cap \Gamma(y)|}{k(x) + k(y)}$ |
| HPI | $S^{HPI}_{xy} = \dfrac{|\Gamma(x) \cap \Gamma(y)|}{\min\{k(x), k(y)\}}$ |
| HDI | $S^{HDI}_{xy} = \dfrac{|\Gamma(x) \cap \Gamma(y)|}{\max\{k(x), k(y)\}}$ |
| LHN | $S^{LHN}_{xy} = \dfrac{|\Gamma(x) \cap \Gamma(y)|}{k(x) * k(y)}$ |
| PA | $S^{PA}_{xy} = k(x) * k(y)$ |
| RA | $S^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \dfrac{1}{k(z)}$ |
| RAA | $S^{RAA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \dfrac{\max(k(x), k(y))}{k(z)}$ |
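The indices in Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `micro_aais`, the adjacency-dictionary representation and the toy graph are assumptions made here, not part of the original study, and both nodes are assumed to have at least one neighbor so that no denominator is zero.

```python
import math


def micro_aais(adj, x, y):
    """Return the Table 1 scores for the node pair (x, y).

    adj is a hypothetical adjacency dictionary mapping each node to the
    set of its neighbours in an undirected graph.
    """
    kx, ky = len(adj[x]), len(adj[y])      # degrees k(x), k(y)
    common = adj[x] & adj[y]               # common neighbours of x and y
    cn = len(common)
    return {
        "Salton": cn / math.sqrt(kx * ky),
        "Sorenson": 2 * cn / (kx + ky),
        "HPI": cn / min(kx, ky),
        "HDI": cn / max(kx, ky),
        "LHN": cn / (kx * ky),
        "PA": kx * ky,
        "RA": sum(1.0 / len(adj[z]) for z in common),
        "RAA": sum(max(kx, ky) / len(adj[z]) for z in common),
    }


# Toy graph (assumed for illustration): nodes 1 and 4 are not linked
# but share the common neighbours 2 and 3.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3},
}
scores = micro_aais(adj, 1, 4)
```

In this toy graph both common neighbors have degree 3, so the RA score of the pair (1, 4) is 1/3 + 1/3; a common neighbor with fewer friends would contribute more, which matches the attention allocation principle.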

3.2.2. AAIs based on macrostructure

Considering the sparseness of local networks formed by nodes in different circles, we develop a wider range of AAIs based on the macrostructure of the network, which consider the effect of indirect common neighbors, i.e., friends of common neighbors. Specifically, the attention allocated to common neighbors by their friends can change the connections of common neighbors and then indirectly affect the attention that is allocated to target node pairs by common neighbors. The newly proposed AAIs are WA1, WA2, WA3, and RWA.

(a) WA1

WA1 represents the attention allocated to node pairs by the indirect common neighbors. A larger degree of an indirect common neighbor indicates a smaller attention allocation score, as shown in formula (1).

$$S^{WA1}_{xy} = \sum_{\Gamma(z),\, z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(\Gamma(z))} \tag{1}$$

where k(Γ(z)) represents the degree of a friend of the common neighbor z between nodes x and y.
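One possible reading of formula (1) is a double sum: over every common neighbor z of x and y, and over every friend v of z, add 1/k(v). The sketch below follows that reading. Excluding x and y themselves from the friends of z is an assumption made here, as are the function name `wa1` and the toy graph.

```python
def wa1(adj, x, y):
    """Sketch of WA1 under the double-sum reading of formula (1).

    adj is a hypothetical adjacency dictionary (node -> set of neighbours).
    Assumption: x and y are not counted among the friends of z.
    """
    score = 0.0
    for z in adj[x] & adj[y]:            # direct common neighbours
        for v in adj[z] - {x, y}:        # indirect common neighbours
            score += 1.0 / len(adj[v])   # attention shrinks with degree k(v)
    return score


# Toy graph (assumed): node 5 is a low-degree friend of common neighbour 2.
adj = {
    1: {2, 3},
    2: {1, 3, 4, 5},
    3: {1, 2, 4},
    4: {2, 3},
    5: {2},
}
wa1_score = wa1(adj, 1, 4)
```

Here node 5 has a single friend and therefore contributes a full unit of attention, while the better-connected nodes 2 and 3 contribute only 1/4 and 1/3.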

(b) WA2

WA2 represents the attention allocated to node pairs by direct and indirect common neighbors, where a larger clustering coefficient represents less attention allocation from the indirect common neighbors and a larger degree of the direct common neighbors indicates less attention allocation, as shown in formula (2).

$$S^{WA2}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z) * \left(\rho * c(z) + \phi * (1 - c(z))\right)} \tag{2}$$

where c(z) is the clustering coefficient of node z, namely c(z) = 2∗n/(k∗(k−1)), n represents the number of links between all k neighbors of node z, and ρ and ϕ are constant parameters.

(c) WA3

The larger the degree of a node, the wider its attention dispersion. In WA3, the attention dispersion of the nodes themselves is combined with the attention dispersion of the common neighbor nodes, as shown in formula (3).

$$S^{WA3}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z) * (1 - c(z))} + \frac{1}{k(x) * c(x)} + \frac{1}{k(y) * c(y)} \tag{3}$$
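Formulas (2) and (3) both depend on the clustering coefficient c(z), so a short sketch may help to see how the pieces fit. The helper names, the adjacency-dictionary input and the toy graph are assumptions for illustration. The source does not say what happens when a denominator is zero (for example c(z) = 1 in formula (3)); skipping such terms is an assumption made here.

```python
def clustering(adj, z):
    """c(z) = 2n / (k(k-1)), with n links among the k neighbours of z."""
    nbrs = list(adj[z])
    k = len(nbrs)
    if k < 2:
        return 0.0
    n = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in adj[u])
    return 2.0 * n / (k * (k - 1))


def _inv(d):
    """1/d, or 0.0 when the denominator vanishes (assumed convention)."""
    return 1.0 / d if d > 0 else 0.0


def wa2(adj, x, y, rho=1.0, phi=1.0):
    """Sketch of formula (2)."""
    total = 0.0
    for z in adj[x] & adj[y]:
        c = clustering(adj, z)
        total += _inv(len(adj[z]) * (rho * c + phi * (1.0 - c)))
    return total


def wa3(adj, x, y):
    """Sketch of formula (3): common-neighbour terms plus the two end nodes."""
    total = sum(_inv(len(adj[z]) * (1.0 - clustering(adj, z)))
                for z in adj[x] & adj[y])
    total += _inv(len(adj[x]) * clustering(adj, x))
    total += _inv(len(adj[y]) * clustering(adj, y))
    return total


# Toy graph (assumed): 1 and 4 share the common neighbours 2 and 3.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
wa2_equal = wa2(adj, 1, 4)                    # rho = phi = 1
wa2_phi4 = wa2(adj, 1, 4, rho=1.0, phi=4.0)   # same setting as S9 in Table 4
wa3_score = wa3(adj, 1, 4)
```

With ρ = ϕ the bracket in formula (2) equals ρ regardless of c(z), so WA2 reduces to a rescaled RA; the clustering coefficient only matters when the two parameters differ.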

(d) RWA

In fact, the connection of node pairs is often affected by the joint influence of the attention allocation of direct and indirect common neighbors as well as the nodes contained in the node pairs. Accordingly, a combined index RWA is obtained by integrating AAIs based on microstructure and macrostructure so that the characteristics of attention distribution among nodes are adequately depicted and the low prediction accuracy caused by circle structure sparsity is overcome. Among the AAIs based on microstructure, RA is adopted in RWA since it is highly effective and widely used. Among the AAIs based on macrostructure, because WA1 is included in c(z), we just consider WA2 and WA3 in RWA. Consequently, RWA is constructed as shown in formula (4).

$$S^{RWA}_{xy} = \alpha * S^{RA}_{xy} + \beta * S^{WA2}_{xy} + \gamma * S^{WA3}_{xy} \tag{4}$$

where α, β, and γ are the weight parameters of $S^{RA}_{xy}$, $S^{WA2}_{xy}$, and $S^{WA3}_{xy}$, respectively.
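As a small worked illustration of formula (4), suppose a node pair has the hypothetical component scores $S^{RA}_{xy} = 0.5$, $S^{WA2}_{xy} = 0.25$ and $S^{WA3}_{xy} = 2$ (these numbers are invented for the example). The three weight settings that emphasize one component at a time, as used for S19, S20 and S21 in Table 4, would then give:

```latex
\begin{aligned}
(\alpha,\beta,\gamma) = (4,1,1):\quad S^{RWA}_{xy} &= 4(0.5) + 1(0.25) + 1(2) = 4.25 \\
(\alpha,\beta,\gamma) = (1,4,1):\quad S^{RWA}_{xy} &= 1(0.5) + 4(0.25) + 1(2) = 3.5 \\
(\alpha,\beta,\gamma) = (1,1,4):\quad S^{RWA}_{xy} &= 1(0.5) + 1(0.25) + 4(2) = 8.75
\end{aligned}
```

Because the three components live on different scales, the same weight vector can shift the ranking of node pairs considerably, which is presumably why several weight settings are kept as separate candidate AAIs.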

4. IAALPA

IAALPA considers the relationship between the joint effect of various features of the network and the prediction performance of the algorithm. In IAALPA, a DT model is developed to select the appropriate AAI based on the network characteristics related to common neighbors. Moreover, because a single AAI often overestimates or underestimates and the random combination of AAIs does not guarantee excellent results every time, an SVM is constructed to recognize the complementary AAIs to generate an effective combined prediction model. Fig. 2 shows the structure of IAALPA.


Fig. 2. The structure of IAALPA.

4.1. Algorithm evaluation and complementary AAI

The area under the curve (AUC) is the most common standard metric for measuring the accuracy of SLPAs. AUC randomly selects connected node pairs and unconnected ones in the test set, and compares their score values obtained by an AAI. In m independent comparisons, if the connected node pairs have a higher score m1 times, then the AUC is as shown in formula (5) (Liben-Nowell & Kleinberg, 2007).

$$AUC = \frac{m1 + 0.5(m - m1)}{m} \tag{5}$$

When the network size is large, obtaining the AUC value by this random sampling method reduces the computational complexity and improves efficiency. The larger the AUC value, the higher the accuracy of the algorithm.

The complementary AAI of any index B is defined as follows: if the combination of B and a candidate AAI has a better AUC than B alone, then the candidate AAI is considered to be a complementary index of B.
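The sampling procedure behind formula (5) can be sketched as follows. Formula (5) as printed gives half credit to all m − m1 remaining comparisons; the sketch instead gives half credit only to ties, following the common form of the sampled AUC, which is an assumption about the intended treatment. The function name `sampled_auc`, its parameters and the toy scores are likewise hypothetical.

```python
import random


def sampled_auc(score, positives, negatives, m=1000, seed=0):
    """Estimate AUC from m random comparisons.

    score     -- callable returning the AAI score of a node pair
    positives -- connected node pairs held out in the test set
    negatives -- unconnected node pairs
    """
    rng = random.Random(seed)
    wins = ties = 0
    for _ in range(m):
        s_pos = score(rng.choice(positives))
        s_neg = score(rng.choice(negatives))
        if s_pos > s_neg:
            wins += 1        # connected pair ranked higher
        elif s_pos == s_neg:
            ties += 1        # tie: half credit (assumed convention)
    return (wins + 0.5 * ties) / m


# Toy scores (assumed): every connected pair outranks every unconnected one.
toy = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "d"): 0.1, ("c", "d"): 0.2}
auc_perfect = sampled_auc(toy.get, [("a", "b"), ("a", "c")],
                          [("b", "d"), ("c", "d")])
auc_constant = sampled_auc(lambda pair: 1.0, [("a", "b")], [("b", "d")])
```

An index that cannot distinguish the two kinds of pairs produces only ties and therefore scores 0.5, the chance level.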

4.2. DT for screening AAI

Although AUC is the most common method for evaluating AAIs, the AUC method does not analyze the characteristics of the specific circle structure. In practice, it is often necessary to try all AAIs before finding the AAI that is most suitable for a given network. Since DTs are good at solving multi-class problems and can implicitly perform variable screening or feature selection while requiring relatively little effort for data preparation (Mantas, Abellán, & Castellano, 2016), a DT is developed to adaptively select the appropriate AAIs for the specific circle structure.

Following the idea of attention allocation of common neighbors in the triadic closure structure, the features of the network are described in the following two dimensions. First, the density of common neighbors is considered, since more common neighbors indicate more attention allocated; the corresponding features are the average degree and the average clustering coefficient. Secondly, the dispersion level of common neighbors' attention is employed. Because more short paths connecting common neighbors mean greater attention dispersion, we consider the short-path-related features, for instance the average shortest path, the average node betweenness, and the average link betweenness.

In DT, the independent variables of the learning sample are the network characteristic indices, and the dependent variable is the AAI with the maximum AUC value. Fig. 3 shows an example of DT. In this study, the DT algorithm is designed based on the idea of the C4.5 algorithm (Mantas et al., 2016), because the C4.5 classification tree is the most popular algorithm and has been proved by many studies to be a simple and practical learning algorithm. Assume that the network training set is T and that, in each sample, the network is labeled by the AAI with the largest AUC. Given that there are k types of AAIs, a division of T is obtained as {S_1, S_2, …, S_k}. The prior probability of the division is P_i = |S_i|/|T|. Then the information entropy used for classifying T is

$$Info(T) = -\sum_{i=1}^{k} P_i \log_2 P_i.$$

The networks in T are divided by feature A (such as the average clustering coefficient), and the sequence {A_1, A_2, …, A_J} is obtained by arranging the values of feature A in ascending order. Defining the ith (1 ≤ i ≤ J−1) partition point as a_i = (A_i + A_{i+1})/2, T is divided into 2 subsets {T_1, T_2}, where the value of feature A of the networks in T_1 is V(A, T_1) ∈ [A_1, a_i], and similarly V(A, T_2) ∈ (a_i, A_J]. Corresponding to this division, the information gain of feature A is Gain(A) = Info(T) − Info_A(T), where

$$Info_A(T) = \sum_{i=1}^{2} \frac{|T_i|}{|T|} Info(T_i).$$

Corresponding to partition point a_i, the information gain rate of A is shown in formula (6).

$$Gain\_Ratio(A, a_i) = \frac{Gain(A)}{Split(A)} \tag{6}$$

where

$$Split(A) = -\sum_{i=1}^{2} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}.$$

Subsequently, the information gain rate for each partition point in the sequence {A_1, A_2, …, A_J} is calculated according to formula (6), and the partition point with the maximum gain rate is selected as the best branch threshold of the feature A, namely,

$$Threshold(A) = \max_{1 \le i \le J-1} \{Gain\_Ratio(A, a_i)\}.$$
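The threshold search described above can be sketched for a single feature as follows. The function names, the toy feature values and the AAI labels are assumptions for illustration; the sketch returns the partition point that attains the maximum gain rate together with that gain rate, and it only considers midpoints between distinct feature values so that both subsets are non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(T) = -sum_i P_i log2 P_i over the class labels in T."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (partition point, gain ratio) maximising formula (6)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    info_t = entropy(labels)
    best_point, best_ratio = None, -1.0
    for lo, hi in zip(distinct, distinct[1:]):
        a = (lo + hi) / 2                            # a_i = (A_i + A_{i+1}) / 2
        t1 = [lab for v, lab in pairs if v <= a]     # subset T_1
        t2 = [lab for v, lab in pairs if v > a]      # subset T_2
        w1, w2 = len(t1) / len(pairs), len(t2) / len(pairs)
        gain = info_t - (w1 * entropy(t1) + w2 * entropy(t2))
        split = -(w1 * math.log2(w1) + w2 * math.log2(w2))
        ratio = gain / split
        if ratio > best_ratio:
            best_point, best_ratio = a, ratio
    return best_point, best_ratio


# Toy training set (assumed): average clustering coefficient of four
# networks, each labelled with the AAI that achieved the largest AUC.
feature = [0.2, 0.4, 0.6, 0.8]
best_aai = ["RA", "RA", "WA2", "WA2"]
threshold, gain_ratio = best_threshold(feature, best_aai)
```

In this toy set the midpoint between 0.4 and 0.6 separates the two labels perfectly, so it attains the largest possible gain rate of 1.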

The main procedure of the DT proposed in this study is as follows:

Step 1. The features of common friend density and the dispersion level of common friends' attention are calculated, the feature with the maximum information gain rate is selected as the root node, and it is branched based on its best branch threshold;

Step 2. For the subset of data corresponding to each branch of the different features, branches of the tree are recursively established by the same method as Step 1, and this procedure is repeated until all data samples of each branch belong to the same class of AAI;

Step 3. The simplified DT is obtained by pruning the initial DT to eliminate the influence of random factors such as noise and isolated nodes;

Step 4. The decision rules are extracted. For the DT generated by Step 3, the decision rules can be obtained directly, that is, the best AAI suitable for the circle structure is selected based on the structural features of the community network.

It should be noted that in the process of training the DT model, the AUC of each AAI for each network in the training set is calculated, and the average AUC value w of each AAI will be used as its weight in the combined model in Section 4.4.

4.3. Selecting complementary AAIs based on SVM

Although it is possible to select the most suitable AAI for the network through the DT method, the combination of the selected AAI and its complementary indices can more fully capture the attention allocation characteristics of nodes in different circles. The sparsity of network circles is highly diverse, which may lead to overfitting of the model that identifies the mutually complementary AAIs, namely the model adapts too exactly to the particular circles and fails to fit additional circles reliably. SVM shows many unique advantages in avoiding overfitting and solving small-sample, nonlinear and high-dimensional pattern recognition problems, so it is adopted to identify AAIs that are complementary to the best AAI selected by the DT model (Wang & Xing, 2019). Subsequently, the mutually complementary indices are combined into a composite one.

Fig. 3. An example of DT.

In order to fully describe the complementary relationship between indices, the score similarity between AAIs is described from three aspects: first, the distance between the scores of the two indices, such as Euclidean distance, standardized Euclidean distance, Manhattan distance, and Chebyshev distance; secondly, the similarity in direction, such as cosine distance and Pearson correlation coefficient; thirdly, the difference in performance ranking, for example, Spearman distance.

Given a set of training samples [I_i, M_i], i = 1, 2, ···, l, the input vector I_i represents the above 7 similarity indices for the scores of two AAIs in sample i, and M_i = 1 if the two AAIs are mutually complementary, else M_i = −1. The main idea of SVM is to map the input vector to a high-dimensional eigenvector space, and to construct the optimal classification surface in the eigenvector space. In order to improve the efficiency of the algorithm, the Radial Basis Function (RBF) $K(I_i, V) = \exp(-\|V - I_i\|^2 / \delta^2)$ is selected as the kernel function, where V is the input vector and δ is a constant. Accordingly, an optimal classification function can be obtained in formula (7).

$$f(V) = \mathrm{sgn}\{(W \cdot \Phi(V)) + b\} = \mathrm{sgn}\left\{\sum_{i=1}^{l} a_i M_i K(I_i, V) + b\right\} \tag{7}$$

where a_i and b are constants, W is a normal vector, which determines the direction of the hyperplane, b is a displacement term, which indicates the distance between the hyperplane and the origin, and Φ(V) represents the eigenvector after mapping V.
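To make formula (7) concrete, the sketch below evaluates the decision function for given coefficients. It does not train an SVM: the coefficients a_i, the bias b, the kernel width δ and the two-dimensional toy vectors are hypothetical values chosen for illustration (the vectors in the study have seven components), and treating a sum of exactly zero as the positive class is an assumption.

```python
import math


def rbf_kernel(i_vec, v, delta):
    """K(I_i, V) = exp(-||V - I_i||^2 / delta^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(i_vec, v))
    return math.exp(-sq_dist / delta ** 2)


def decide(v, samples, labels, coeffs, b, delta):
    """Evaluate f(V) = sgn(sum_i a_i M_i K(I_i, V) + b).

    samples -- training input vectors I_i
    labels  -- M_i in {+1, -1}
    coeffs  -- a_i (hypothetical, would come from training)
    """
    total = sum(a * m * rbf_kernel(i_vec, v, delta)
                for a, m, i_vec in zip(coeffs, labels, samples)) + b
    return 1 if total >= 0 else -1


# Toy model (assumed): one complementary and one non-complementary sample.
samples = [[0.0, 0.0], [2.0, 2.0]]
labels = [1, -1]
coeffs = [1.0, 1.0]
near_positive = decide([0.1, 0.1], samples, labels, coeffs, b=0.0, delta=1.0)
near_negative = decide([1.9, 2.0], samples, labels, coeffs, b=0.0, delta=1.0)
```

Because the RBF kernel decays quickly with distance, each query is dominated by the training sample closest to it, so a pair of AAIs is classified as complementary when its similarity vector lies near known complementary pairs.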

4.4. Composite AAI

In order to comprehensively portray the attention distribution of common friends of users in different circles, IAALPA designs a composite mutually complementary AAI. The composite index model is mainly composed of the best index B selected by DT and the complementary AAIs of B identified by SVM (i.e. E_1, E_2, …, E_i). To make the AAIs with good performance account for a large proportion of the composite AAI, the average AUC value w of each AAI, which is obtained in training the DT model, is applied as its weight, as mentioned in Section 4.2. In addition, the h complementary AAIs with the largest weights are selected to build the composite AAI to further improve the accuracy of the algorithm, as shown in formula (8).

$$S = w * (B, E_1, E_2, \cdots, E_h)^{T} \tag{8}$$
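Formula (8) amounts to a weighted sum of the best index and its h highest-weighted complementary indices. A minimal sketch under that reading follows; the function name, the dictionary inputs and all numeric values are hypothetical, and the weights are used as raw average AUC values without normalization, since the source does not mention any.

```python
def composite_score(scores, weights, best, complements, h):
    """Weighted sum of the best AAI and its top-h complementary AAIs.

    scores      -- AAI name -> score of one node pair
    weights     -- AAI name -> average AUC on the training set
    best        -- name of the AAI selected by the DT
    complements -- names of the AAIs that the SVM marked as complementary
    """
    top = sorted(complements, key=lambda name: weights[name], reverse=True)[:h]
    return sum(weights[name] * scores[name] for name in [best] + top)


# Hypothetical scores of one node pair and hypothetical average AUC weights.
scores = {"RA": 0.5, "WA2": 0.2, "WA3": 1.0, "PA": 4.0}
weights = {"RA": 0.9, "WA2": 0.8, "WA3": 0.7, "PA": 0.6}
s = composite_score(scores, weights, best="RA",
                    complements=["WA2", "WA3", "PA"], h=2)
```

With h = 2 the lowest-weighted complement (PA in this toy setting) is left out, so its large raw score cannot dominate the composite index.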

5. Experimentsandresultsanalysis 5.1. Experimentaldesign

971Ego-netdatasetsinTwitter(http://snap.stanford.edu/data/

egonets-TwittQQer.html) and 132 Ego-net data sets in Google+ (http://snap.stanford.edu/data/ego-Google+.html) were adopted to verify theeffectiveness of IAALPA. Each Ego-netis a brand com- munitynetwork,wherecenternoderepresentsthebrandcompany andusersaredivided intodifferentproductcircles.760networks withproductcircleswereselectedfromTwitter,and100networks withproduct circleswere selected fromGoogle+. Table 2shows themean,minimumandmaximumofthe statisticalindicatorsof theselectednetworksamplesinTwitterandGoogle+.

To test the accuracy of the proposed algorithm, the networks in Twitter and Google+ were each randomly divided into two parts: the training set, which contained 80 percent of the networks, and the test set, which contained the remaining 20 percent.
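The 80/20 division of networks can be sketched as follows. This is only an illustration of a random split at the network level; the seed, the function name and the use of integer identifiers are assumptions, and the original experiments were run in MATLAB.

```python
import random


def split_networks(network_ids, train_fraction=0.8, seed=0):
    """Randomly split network identifiers into a training and a test set."""
    rng = random.Random(seed)      # fixed seed only to make the sketch repeatable
    shuffled = list(network_ids)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# 760 Twitter networks and 100 Google+ networks, as selected above.
twitter_train, twitter_test = split_networks(range(760))
gplus_train, gplus_test = split_networks(range(100))
print(len(twitter_train), len(twitter_test), len(gplus_train), len(gplus_test))
```

Splitting whole networks, rather than individual node pairs, keeps every test network unseen during training.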

For brevity, S1-S25 are used to represent the 25 algorithms, as shown in Tables 3 and 4. Table 3 gives the AAIs without parameters, and Table 4 gives the AAIs with specific parameter values. These values were determined according to a large number of experimental results, and the values chosen for the parameters have been shown to work correctly on a wide range of problems.

In the DT experiments, the 25 AAIs were screened based on the training set data, and 15 optimal AAIs were selected, which were S6, S7, S8, S9, S10, S11, S13, S15, S16, S17, S19, S20, S23, S24, and S25.

For each optimal AAI, the training set was used to train an SVM, and then its complementary AAIs were found, so that 15 SVMs were obtained corresponding to the 15 AAIs. Then, the composite mutually complementary AAI was designed based on formula (8).


Table 2

Statistical information of network samples in Twitter and Google+.

| Statistical indicator | Twitter minimum | Twitter mean | Twitter maximum | Google+ minimum | Google+ mean | Google+ maximum |
|---|---|---|---|---|---|---|
| Number of nodes | 4 | 40.35 | 230 | 4 | 222.11 | 962 |
| Number of edges | 4 | 398.06 | 5137 | 8 | 1.23 | 56330 |
| Average degree | 1.66 | 13.55 | 64.50 | 2 | 37.89 | 143.11 |
| Average shortest path | 0.44 | 1.64 | 3.65 | 0.34 | 1.81 | 3.20 |
| Average node betweenness | 1 | 33.38 | 330.04 | 1 | 205.73 | 1.15 |
| Average link betweenness | 0.44 | 1.64 | 3.65 | 0.34 | 1.81 | 3.20 |
| Average clustering coefficient | 0.12 | 0.66 | 1 | 0.24 | 0.66 | 0.87 |

Table 3

AAIs without parameters.

AAI abbreviation S1 S2 S3 S4 S5 S6 S7 S8 S12 S13

AAI name Salton Sorenson HPI HDI LHN PA RA WA1 RAA WA3

Table 4

AAIs with parameters.

| Algorithm | AAI | Parameters |
|---|---|---|
| S9 | WA2 | ρ = 1, φ = 4 |
| S10 | WA2 | ρ = 1, φ = 1 |
| S11 | WA2 | ρ = 4, φ = 1 |
| S14 | RWA | ρ = 1, φ = 4, α = 1, β = 1, γ = 1 |
| S15 | RWA | ρ = 1, φ = 4, α = 4, β = 1, γ = 1 |
| S16 | RWA | ρ = 1, φ = 4, α = 1, β = 4, γ = 1 |
| S17 | RWA | ρ = 1, φ = 4, α = 1, β = 1, γ = 4 |
| S18 | RWA | ρ = 1, φ = 1, α = 1, β = 1, γ = 1 |
| S19 | RWA | ρ = 1, φ = 1, α = 4, β = 1, γ = 1 |
| S20 | RWA | ρ = 1, φ = 1, α = 1, β = 4, γ = 1 |
| S21 | RWA | ρ = 1, φ = 1, α = 1, β = 1, γ = 4 |
| S22 | RWA | ρ = 4, φ = 1, α = 1, β = 1, γ = 1 |
| S23 | RWA | ρ = 4, φ = 1, α = 4, β = 1, γ = 1 |
| S24 | RWA | ρ = 4, φ = 1, α = 1, β = 4, γ = 1 |
| S25 | RWA | ρ = 4, φ = 1, α = 1, β = 1, γ = 4 |

Table 5

The performance of all algorithms in the Twitter experiments.

Algorithm Average AUC Algorithm Average AUC Algorithm Average AUC

S1 0.78543 S10 0.819112 S19 0.81563

S2 0.78078 S11 0.815698 S20 0.815624

S3 0.737871 S12 0.822598 S21 0.800107

S4 0.767094 S13 0.793013 S22 0.80673

S5 0.622253 S14 0.806359 S23 0.814921

S6 0.775256 S15 0.814763 S24 0.811521

S7 0.819129 S16 0.810858 S25 0.798743

S8 0.799544 S17 0.798534 IAALPAa 0.822624

S9 0.820377 S18 0.809194 IAALPAb 0.8158

Single DT 0.811517 SVM 0.656711

Table 6

The performance of all algorithms in the Google+ experiments.

Algorithm Average AUC Algorithm Average AUC Algorithm Average AUC

S1 0.66928 S10 0.84928 S19 0.841492

S2 0.644073 S11 0.845608 S20 0.841461

S3 0.727259 S12 0.882082 S21 0.813375

S4 0.622012 S13 0.803694 S22 0.825926

S5 0.387874 S14 0.824811 S23 0.840444

S6 0.873019 S15 0.840224 S24 0.835089

S7 0.849222 S16 0.832744 S25 0.811044

S8 0.884159 S17 0.810624 IAALPAa 0.892791

S9 0.85096 S18 0.830051 IAALPAb 0.87925

Single DT 0.857105 SVM 0.688149

Since quite a few composite link prediction algorithms have been constructed based on SVM, another composite algorithm was developed with SVM to evaluate whether IAALPA outperformed other recently developed composite algorithms. In the SVM-based composite algorithm, the links between node pairs were predicted using the 8 AAIs in Table 1. All algorithms were implemented in MATLAB software with default settings.

The average AUCs for the Twitter and Google+ data sets in 100 random experiments are shown in Tables 5 and 6, respectively, where IAALPAa represents the combination of the optimal AAI selected by the DT model and the two complementary AAIs with the largest weights. IAALPAb represents the combination of the optimal AAI selected by the DT model and all of its complementary AAIs.
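For readers who want to reproduce an average AUC of this kind, the sketch below assumes the AUC definition commonly used in link prediction: the probability that a held-out (missing) link receives a higher score than a nonexistent link, with ties counted as one half. Whether the experiments used exhaustive or sampled comparisons is not stated here, so the exhaustive version and all scores shown are illustrative assumptions.

```python
def link_prediction_auc(missing_scores, nonexistent_scores):
    """AUC = (n_higher + 0.5 * n_ties) / n over all pairwise comparisons."""
    higher = 0
    ties = 0
    for m in missing_scores:
        for n in nonexistent_scores:
            if m > n:
                higher += 1
            elif m == n:
                ties += 1
    total = len(missing_scores) * len(nonexistent_scores)
    return (higher + 0.5 * ties) / total


def average_auc(runs):
    """Average the AUC over several (missing, nonexistent) score runs."""
    return sum(link_prediction_auc(m, n) for m, n in runs) / len(runs)


# Hypothetical scores from two random experiments (illustrative values only).
run_1 = ([0.9, 0.4], [0.4, 0.1, 0.3])
run_2 = ([0.7], [0.2, 0.8])

auc_1 = link_prediction_auc(*run_1)
auc_2 = link_prediction_auc(*run_2)
mean_auc = average_auc([run_1, run_2])
print(auc_1, auc_2, mean_auc)
```

An AUC of 0.5 corresponds to random scoring, so values such as those in Tables 5 and 6 are read relative to that baseline.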

Figs. 4 and 5 show the performance comparison of various AAIs in the Twitter and Google+ experiments, respectively. Figs. 6 and 7 show the performance comparison of single DT and non-combination AAIs in the Twitter and Google+ experiments, respectively.


Fig. 4. Performance comparison of AAIs in the Twitter experiments.

Fig. 5. Performance comparison of AAIs in the Google+ experiments.

Fig. 6. Performance comparison of single DT and non-combination AAIs in the Twitter experiments.

Fig. 7. Performance comparison of single DT and non-combination AAIs in the Google+ experiments.

5.2. Performance analysis of algorithms

Tables 5 and 6 show that, compared with existing outstanding algorithms, IAALPA has a significant improvement effect; that is, the IAALPA proposed in this study can accurately recommend friendships between users in different circles when conducting cross-marketing activities in an online community. It can also be observed from Tables 5 and 6 that the performance of IAALPAa is better than that of IAALPAb, which integrates all complementary AAIs. This indicates that combining the complementary AAIs with larger weights is an effective means to improve the accuracy of IAALPA.

From Figs. 4 and 5, it can be seen that the accuracy of the 19 algorithms ranging from S7 to S25 is significantly higher than that of the other 7 AAIs, and these optimal AAIs are all newly proposed except for S7. This shows that focusing on the macroscopic network structure and adopting the principle of attention allocation of common friends in the triadic closure structure can effectively overcome the problem of network sparsity in predicting friendships between users in different circles. Note that the macroscopic network structure consists of node pairs, their common neighbors and the friends of the common neighbors.

At the same time, the performance of IAALPAa and IAALPAb is better than that of DT and the other algorithms, which shows that the mechanism to select complementary AAIs by SVM is effective. That is to say, using the SVM model to identify the complementary AAIs and integrating them together can deliver better prediction results than the random combination of AAIs.

It can also be seen from Tables 5 and 6 that IAALPAa is significantly better than SVM. These results show that the IAALPA framework proposed in this study is far superior to the SVM framework.

Additionally, from Figs. 6 and 7, it can be concluded that the accuracy of DT is higher than that of other non-combination AAIs,


Table 7

Performance comparison under various networks in the Twitter experiments.

Network ID Node count Circle count Node count per circle S7 S12 IAALPAa

200214366 59 12 4.9167 0.8262 0.8454 0.8267

29514951 39 7 5.5714 0.8327 0.8306 0.834

98633794 33 5 6.6 0.8603 0.8563 0.8577

7888452 80 9 8.8889 0.8906 0.8894 0.8862

35012277 20 2 10 0.9361 0.9164 0.9394

356963 114 11 10.3636 0.7532 0.7473 0.7673

15070932 21 2 10.5 0.82 0.8323 0.8283

351092905 55 4 13.75 0.9172 0.9066 0.9139

18886852 70 5 14 0.8917 0.8893 0.8959

16652550 70 5 14 0.8914 0.8894 0.8908

134943586 209 11 19 0.8877 0.886 0.89

Average AUC 0.8643 0.8626 0.8664
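The derived quantities in Table 7 can be rechecked with a short sketch. It assumes that "Node count per circle" is the node count divided by the circle count and that the "Average AUC" row is the unweighted mean over the 11 listed networks; the variable names are illustrative.

```python
# Rows of Table 7: (node count, circle count, S7, S12, IAALPAa).
rows = [
    (59, 12, 0.8262, 0.8454, 0.8267),
    (39, 7, 0.8327, 0.8306, 0.8340),
    (33, 5, 0.8603, 0.8563, 0.8577),
    (80, 9, 0.8906, 0.8894, 0.8862),
    (20, 2, 0.9361, 0.9164, 0.9394),
    (114, 11, 0.7532, 0.7473, 0.7673),
    (21, 2, 0.8200, 0.8323, 0.8283),
    (55, 4, 0.9172, 0.9066, 0.9139),
    (70, 5, 0.8917, 0.8893, 0.8959),
    (70, 5, 0.8914, 0.8894, 0.8908),
    (209, 11, 0.8877, 0.8860, 0.8900),
]

# Node density of each network: nodes divided by circles.
nodes_per_circle = [round(nodes / circles, 4) for nodes, circles, *_ in rows]


def column_mean(index):
    """Unweighted mean of one AUC column, rounded to four decimals."""
    return round(sum(row[index] for row in rows) / len(rows), 4)


mean_s7 = column_mean(2)
mean_s12 = column_mean(3)
mean_iaalpa_a = column_mean(4)
print(nodes_per_circle[0], mean_s7, mean_s12, mean_iaalpa_a)
```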

Table 8

Performance comparison under various networks in the Google+ experiments.

Network ID Node count Circle count Node count per circle S7 S12 IAALPAa

100715738096376666180 46 3 15.3333 0.7308 0.8833 0.9189

104672614700283598130 32 2 16 0.8281 0.9218 0.9275

114122960748905067938 222 6 37 0.8658 0.8752 0.8661

104607825525972194062 80 2 40 0.8645 0.9113 0.9172

113356364521839061717 116 2 58 0.8354 0.9077 0.9102

103503116383846951534 351 5 70.2 0.8269 0.8581 0.8648

107362628080904735459 168 2 84 0.8298 0.8492 0.8424

107203023379915799071 172 2 86 0.8493 0.9305 0.9347

115516333681138986628 305 3 101.6667 0.915 0.9249 0.9238

110971010308065250763 521 4 130.25 0.8637 0.9025 0.9096

101499880233887429402 514 2 257 0.9557 0.9625 0.9631

Average AUC 0.8513 0.9025 0.9071

Fig. 8. Performance comparison in different node density in Twitter.

Fig. 9. Performance comparison in different node density in Google+.