Open Archive Toulouse Archive Ouverte Official URL :

Texte intégral

(1)

Any correspondence concerning this service should be sent

to the repository administrator: tech-oatao@listes-diff.inp-toulouse.fr

This is an author’s version published in: https://oatao.univ-toulouse.fr/22090

To cite this version:

Thonet, Thibaut and Cabanac, Guillaume and Boughanem, Mohand

and Pinel-Sauvagnat, Karen Users Are Known by the Company They

Keep: Topic Models for Viewpoint Discovery in Social Networks.

(2017) In: CIKM 2017 International Conference on Information and

Knowledge Management, 6 November 2017 - 10 November 2017

(Singapore, Singapore).

Official URL :

(2)

IRIT, Université de Toulouse, CNRS

(3)

Traditionalopinion miningresearch mainly focused on product/service review analysis =⇒Identification of a review’spolarityw.r.t. a target:positive/negative

(4)
(5)
(6)
(7)
(8)
(9)

Text contentcomponent

Observed data:tokensoccurring indocumentsposted byusers

=⇒3 nested plates

Latenttopicsassigned to each token

Latentviewpointsassigned at

(10)

Text contentcomponent

Following theTopic-Aspect Model

from[Paul+, AAAI ’10], definition of

four word typesspecified by switch variables ` (level) and x (route):

Backgroundwords =⇒ ` = 0, x = 0 Viewpointwords =⇒ ` = 0, x = 1 Topicwords =⇒ ` = 1, x = 0 Viewpoint-topicwords =⇒ ` = 1, x = 1

(11)
(12)

Social interactioncomponent

Outgoing interactionsfor user u = interactions initiated by u on another user (recipientr)

I #GOP RT

FollowingSN-LDAfrom[Sachan+, WSDM ’14], viewpoints assigned to outgoing interactions (homophily)

(13)

Social interactioncomponent

. . . But outgoing interactions

insufficientfor some users

I #GOP

RT RT

@

=⇒We propose to also exploit

(14)

Social interactioncomponent

Incoming interactionsfor user u = interactions initiated by another user (senders) on u

I #GOP

RT

Viewpoint assigned to thedocument

(15)

Posterior inference

Approximate inference based on

Collapsed Gibbs Sampling

Dirichlet/Bernoulli distributions σ, ψ, θ, π, φ, ξintegrated out

Successivelysamplediscrete latent variables `, x, z, v, v0from their posterior distributions (i.e., given observations w, s, r)

(16)
(17)
(18)

∝nuv+ η 1 V nu•+ η ·nvu0+ µ 1 U nv•+ µ SNVDMvs . . . ∝nuv+ η 1 V nu•+ η · PU

u00 =1Au00 u0nvu00+ µ

1 U

P PPU

u00 =1Au00•nvu00+ µ

(19)

Dataset #Tweets #Tokens Vocabulary #Interactions Yes/Dem. No/Rep.

Indyref 589 575 270,075 2,043,204 38,942 696,654 Midterms 767 778 113,545 975,199 25,312 241,741

(20)

Dataset #Tweets #Tokens Vocabulary #Interactions Yes/Dem. No/Rep.

Indyref 589 575 270,075 2,043,204 38,942 696,654 Midterms 767 778 113,545 975,199 25,312 241,741 State-of-the-art baselines:

Topic-Aspect Model(TAM) from[Paul+, AAAI ’10]

=⇒Only text contentto discover viewpoints and topics

Social Network Latent Dirichlet Allocation(SN-LDA) from[Sachan+, WSDM ’14]

=⇒Text contentandoutgoing interactionsto discovercommunities(≈ viewpoints) and topics

Viewpoint and Opinion Discovery Unification Model(VODUM) from[Thonet+, ECIR ’16]

=⇒Text contentto discover viewpoints and topics, andparts of speechto distinguish between topic words and viewpoint-topic words

(21)

Dataset #Tweets #Tokens Vocabulary #Interactions Yes/Dem. No/Rep.

Indyref 589 575 270,075 2,043,204 38,942 696,654 Midterms 767 778 113,545 975,199 25,312 241,741 State-of-the-art baselines:

Topic-Aspect Model(TAM) from[Paul+, AAAI ’10]

Social Network Latent Dirichlet Allocation(SN-LDA) from[Sachan+, WSDM ’14]

Viewpoint and Opinion Discovery Unification Model(VODUM) from[Thonet+, ECIR ’16]

Proposed models:

SNVDM

SNVDM-GPU (τ = 10): only10 most interacting acquaintancesused in Generalized Pólya Urns

(22)
(23)
(24)
(25)
(26)
(27)

Topic: Scottish independence Neutral Viewpoint: Yes Viewpoint: No

#indyref #voteyes #indyref

scotland yes uk independence scotland salmond

vote independence #bettertogether

campaign westminster #scotdecides

scottish vote separation uk independent currency people country thanks future #yes today independent #scotland say

Topic: Energy and resources Neutral Viewpoint: Dem. Viewpoint: Rep. energy #actonclimate #4jobs

house climate #obamacare

new #p2 #jobs

gas change gop natural #climatechange obama #energy clean bills

#ff oil jobs

#kxl energy house

support #gop act economic seec watch

Reasonablecoherenceof topic words and viewpoint-topic words

Topic words indeedunbiasedtowards any viewpoints

(28)

What’s next?

Integratetime dimensionandgeolocation, e.g., to analyze candidate support during elections Design aviewpoint summarizationframework to provide Internet users withmore diversified contentand thus mitigate the filter bubble and echo chamber phenomenon

(29)

Questions?

@tthonet thonet@irit.fr @gcabanac cabanac@irit.fr @MohBoughanem boughanem@irit.fr @karenatw sauvagnat@irit.fr

(30)

Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed Algorithms for Topic Models. J. Mach. Learn. Res., 10, 1801–1828.

Pariser, E. (2011). The Filter Bubble: What the Internet Is Hiding from You. The Penguin Press.

Paul, M. J., & Girju, R. (2010). A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics. In Proc. of AAAI ’10 (pp. 545–550).

Sachan, M., Dubey, A., Srivastava, S., Xing, E. P., & Hovy, E. (2014). Spatial Compactness meets Topical Consistency: Jointly Modeling Links and Content for Community Detection. In Proc. of WSDM ’14 (pp. 503–512).

Sunstein, C. R. (2009). Republic.com 2.0. Princeton University Press.

Thonet, T., Cabanac, G., Boughanem, M., & Pinel-Sauvagnat, K. (2016). VODUM: A Topic Model Unifying Viewpoint, Topic and Opinion Discovery. In Proc. of ECIR ’16 (pp. 533–545).

(31)
(32)

Indyref Midterms TAM 1.45 0.87 SN-LDA 1.18 0.64 VODUM 2.78 1.85 SNVDM-WII 2.08 1.08 SNVDM 2.49 1.15 SNVDM-GPU (τ = 10) 3.47 1.34 SNVDM-GPU (τ = ∞) 14.67 2.56

(33)

+

Urn

2. Duplicate the drawn ball

3. Put back in the urn he original ball and its duplicate

Infinite ball generator

(34)

+

Urn

2. Duplicate the drawn ball and generate parts of balls for those "similar" to the drawn ball

3. Put back in the urn the original ball, its duplicate and the parts of similar balls

Infinite ball generator

+ +

1. Randomly draw a ball from the urn

(35)

∝nuv+ η 1 V nu•+ η ·nvu0+ µ 1 U nv•+ µ SNVDMvs . . . ∝nuv+ η 1 V nu•+ η · PU

u00 =1Au00 u0nvu00+ µ

1 U

P PPU

u00 =1Au00•nvu00+ µ

. . .SNVDM-GPU

Theaddition matrixA defines theweightto put on count nvu00for each u00: Au0u00=      1 if u0= u00,

λ if u00is amongtop τ acquaintancesof u0,

0 otherwise

with 0 ≤ λ ≤ 1 (λ = 0 =⇒ “vanilla” SNVDM) and τ ∈ N

Figure

Updating...

Sujets connexes :