Unevenness in network properties on the social Semantic Web

Raf Guns

Informatie- en Bibliotheekwetenschap

2nd Workshop on Social Aspects of the Web, 2008

Overview

Introduction

Two-step methodology

Unevenness

Example

Conclusions

Introduction: the Semantic Web

Social Semantic Web: RDF data containing social information

Semantic Web = complex system

short paths

clustering

skewed degree distribution: $P(k) \sim A k^{-\gamma}$
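
As a rough illustration of what a skewed degree distribution looks like in practice, the following minimal sketch computes the empirical P(k) for an undirected edge list. The toy graph and the function name `degree_distribution` are hypothetical and not taken from the slides.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)}: the fraction of nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                    # number of nodes with at least one edge
    counts = Counter(degree.values())  # how many nodes have each degree k
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: node 0 is a hub linked to five others, plus one extra edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
dist = degree_distribution(edges)
```

Here most nodes have a low degree and a single hub has a high one, which is the kind of pattern a power law describes on much larger graphs.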

Introduction: Questions

Social Network Analysis with Semantic Web data

prior work by Peter Mika, Li Ding etc.

goal: study network properties of multiple aspects → one RDF graph as ‘master’

It is well-known that properties like degree distribution are skewed, but how skewed exactly?

Step 1: extraction query

Goal: one ‘master’ graph, multiple derived graphs

How? with SPARQL, query language for RDF

SELECT query → result table

CONSTRUCT query → result RDF graph

ASK

DESCRIBE

Example:

BASE <http://metastore.ingentaconnect.com>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX prism: <http://prismstandard.org/namespaces/1.2/basic>
PREFIX ex: <http://example.com/ns/>

CONSTRUCT { ?author1 ex:cites ?author2 }
WHERE {
  ?art1 a </ns/structure/Article> ;
        foaf:maker ?author1 ;
        prism:references ?art2 .
  ?art2 a </ns/structure/Article> ;
        foaf:maker ?author2 .
}
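
To make the effect of this CONSTRUCT query concrete, here is a small sketch in plain Python that performs the same join on a list of triples. It is not a SPARQL engine: the triples, the shortened property names and the function `construct_author_citations` are all hypothetical, and the `a </ns/structure/Article>` type check is left out for brevity.

```python
# Hypothetical triples standing in for the 'master' RDF graph.
MAKER, REFERENCES = "foaf:maker", "prism:references"
triples = [
    ("art1", MAKER, "alice"),
    ("art1", REFERENCES, "art2"),
    ("art2", MAKER, "bob"),
    ("art3", MAKER, "carol"),
    ("art3", REFERENCES, "art1"),
]

def construct_author_citations(triples):
    """Derive (author1, 'ex:cites', author2) triples from article-level data."""
    makers = {}      # article -> set of its authors
    references = []  # (citing article, cited article) pairs
    for s, p, o in triples:
        if p == MAKER:
            makers.setdefault(s, set()).add(o)
        elif p == REFERENCES:
            references.append((s, o))
    derived = set()
    for citing, cited in references:
        for author1 in makers.get(citing, ()):
            for author2 in makers.get(cited, ()):
                derived.add((author1, "ex:cites", author2))
    return derived

cites = construct_author_citations(triples)
```

The result is a derived author-citation graph, while the article-level 'master' data stays untouched.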

Step 2: secondary graph to SNA format

This is optional, but... most RDF and FOAF tools are quite limited

Convert RDF to format for (social) network analysis

Disadvantage: requires new conversion every time something changes in source RDF

through pyNetConv to Pajek, GML, GraphML, ...
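
The conversion step can be pictured with a minimal sketch that serializes directed arcs as Pajek `.net` text. This is not pyNetConv and makes no claim about how that tool works; the function `to_pajek` and the sample arcs are hypothetical, and only the basic `*Vertices` / `*Arcs` sections of the format are written.

```python
def to_pajek(arcs):
    """Serialize directed arcs (source, target) as Pajek .net text."""
    labels = sorted({node for arc in arcs for node in arc})
    # Pajek numbers vertices starting from 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Arcs")
    lines += [f"{index[source]} {index[target]}" for source, target in arcs]
    return "\n".join(lines) + "\n"

# Hypothetical derived graph with two arcs.
pajek_text = to_pajek([("alice", "bob"), ("carol", "alice")])
```

Because the output is a snapshot, it has to be regenerated whenever the source RDF changes, which is the disadvantage noted above.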

Unevenness

Or: skewedness, inequality

Intuitive notion, but how can it be expressed?

studied in econometrics and informetrics

several measures available (not all good)

Lorenz curve: graphical measure

[Figure: Lorenz curve, both axes from 0.0 to 1.0]

Gini index: numerical measure

equivalent to Pratt’s measure

Lorenz curve

[Figure: Lorenz curve, both axes from 0.0 to 1.0]

Gini index: $G'(X) = \frac{2}{\mu N^2} \sum_{j=1}^{N} (N+1-j)\,x_j - \frac{1}{N}$

= twice the area under the convex Lorenz curve
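
The Gini index on this slide can be sketched in a few lines of Python. The sketch assumes that the values are ranked in ascending order (the ordering that yields the convex Lorenz curve), that µ is their mean and N their number, and that their sum is positive. The function names are hypothetical.

```python
def lorenz_points(values):
    """Return the (p, L(p)) points of the convex Lorenz curve."""
    xs = sorted(values)  # ascending order gives the convex curve
    total = sum(xs)
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini_evenness(values):
    """Gini index as on the slide: 2/(mu N^2) * sum (N+1-j) x_j - 1/N."""
    xs = sorted(values)
    n = len(xs)
    mu = sum(xs) / n
    weighted = sum((n + 1 - j) * x for j, x in enumerate(xs, start=1))
    return 2.0 * weighted / (mu * n * n) - 1.0 / n

def twice_area_under(points):
    """Twice the trapezoidal area under a piecewise-linear curve."""
    return sum((p1 - p0) * (l0 + l1)
               for (p0, l0), (p1, l1) in zip(points, points[1:]))
```

With this definition a perfectly even distribution scores 1 and a maximally uneven one scores 1/N, so lower values mean more unevenness.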

Example: Agrippa

Agrippa = catalogue and database of AMVC Letterenhuis, Antwerp (see http://museum.antwerpen.be/amvc_letterenhuis/index_eng.html)

Contains information about

archival materials (manuscripts, paintings, letters, posters etc.)

creators of these materials (people and organizations)

relations between these

Storage: Jena

Queried through: SPARQL protocol

Many interesting graphs can be derived

Correspondence in Agrippa

Example: writers and recipients of correspondence in Agrippa

In other words:

Nodes = persons (sometimes acting on behalf of an organization)

Arcs = letters from writer to recipient

Simple SPARQL query:

PREFIX agrippa: <http://anet.ua.ac.be/agrippa#>

CONSTRUCT { ?sender <urn:agrext#writesLetterTo> ?recipient }
WHERE {
  ?context agrippa:hasLetterWriter ?sender .
  ?context agrippa:hasRecipient ?recipient .
}

Centrality

Degree centrality: number of connections to a node

Betweenness centrality: how important is this node for establishing short paths between other nodes?

Closeness centrality: how fast can other nodes be reached from this one?
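
For a directed graph such as the correspondence network, degree splits into in degree and out degree. The following minimal sketch counts both per node; the function name and the sample arcs are hypothetical, and betweenness and closeness are left out because they need shortest-path computations.

```python
from collections import Counter

def degree_centralities(arcs):
    """Return {node: (in_degree, out_degree)} for directed arcs."""
    in_degree, out_degree = Counter(), Counter()
    nodes = set()
    for sender, recipient in arcs:
        out_degree[sender] += 1     # one more letter written
        in_degree[recipient] += 1   # one more letter received
        nodes.update((sender, recipient))
    return {node: (in_degree[node], out_degree[node]) for node in nodes}

# Hypothetical arcs: (writer, recipient)
degrees = degree_centralities([("a", "b"), ("a", "c"), ("b", "c")])
```

The resulting in degree or out degree values can then be fed into an unevenness measure.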

Example of unevenness (in degree and out degree):

Lorenz curves for centrality measures

[Figure: Lorenz curve L(p) for degree centrality (DC)]

[Figure: Lorenz curve L(p) for betweenness centrality (BTC)]

[Figure: Lorenz curve L(p) for closeness centrality (CC)]

Discussion

According to Lorenz curve:

BTC is more uneven than DC and CC

slight overlap between DC and CC

According to Gini: $G'(BTC) < G'(DC) < G'(CC)$

Why such huge differences?

Why such differences?

CC is quite even, due to the small world effect

diameter D = 11

average length of shortest paths = 3.85

BTC is very uneven due to the bow-tie/corona structure
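
The two small-world figures quoted here, diameter and average shortest path length, can be obtained with breadth-first search. The sketch below assumes an unweighted graph given as an adjacency dictionary and averages over reachable pairs only. The slides do not say whether arc direction was taken into account, so the undirected toy graph is an assumption, as are the function names.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def diameter_and_average_path(adjacency):
    """Return (longest shortest path, mean shortest path) over reachable pairs."""
    lengths = []
    for source in adjacency:
        dist = shortest_path_lengths(adjacency, source)
        lengths += [d for target, d in dist.items() if target != source]
    return max(lengths), sum(lengths) / len(lengths)

# Hypothetical undirected path graph: a - b - c - d
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diameter, average = diameter_and_average_path(adjacency)
```

When the average is small relative to the number of nodes, most nodes are about equally close to all others, which fits the observation that CC is quite even.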

In summary...

Simple two-step methodology with central place for SPARQL:

balance between power and usability

Unevenness in network measures can be used to test hypotheses regarding network structure

Future:

testing on other (kinds of) networks

predictive power of unevenness?

Thank you!

Any questions?
