Semantic Link Prediction through Probabilistic Description Logics
Kate Revoredo1, Jos´e Eduardo Ochoa Luna2, and Fabio Gagliardi Cozman2
1 Departamento de Inform´atica Aplicada, Unirio Av. Pasteur, 458, Rio de Janeiro, RJ, Brazil
2 Escola Polit´ecnica, Universidade de S˜ao Paulo, Av. Prof. Mello Morais 2231, S˜ao Paulo - SP, Brazil
[email protected],[email protected],[email protected]
Abstract. Predicting potential links between nodes in a network is a problem of great practical interest. Link prediction is mostly based on graph-based features and, recently, on approaches that consider the se- mantics of the domain. However, there is uncertainty in these predictions;
by modeling it, one can improve prediction results. In this paper, we pro- pose an algorithm for link prediction that uses a probabilistic ontology described through the probabilistic description logiccrALC. We use an academic domain in order to evaluate this proposal.
1 Introduction
Many social, biological, and information systems can be well described by net- works, where nodes represent objects (individuals), and links denote the rela- tions or interactions between nodes. Predicting a possible link in a network is an interesting issue that has recently gained attention, due to the growing inter- est in social networks. For instance, one may be interested on finding potential friendship between two persons in a social network, or a potential collaboration between two researchers. Thus link prediction [12, 20] aims at predicting whether two nodes (i.e. people) should be connected given that we know previous infor- mation about their relationships or interests. A common approach is to exploit the network structure, where numerical information about nodes is analyzed [12, 20, 9]. However, knowledge about the objects represented in the nodes can improve prediction results. For instance consider that the researchers Joe and Mike do not have a publication in common, thus they do not share a link in a collaboration network. Moreover, graph features do not indicate a potential link between them. However, they have published in the same journal and they both teach the same course in their respectively universities. This information can be an indication of a potential collaboration between them. Given this, approaches that are based on the semantics related to the domain of the objects represented by the nodes [21, 18] have been proposed. In some of them, an ontology modeling the domain and the object interests were used in the prediction task.
However, there is uncertainty in such predictions. Often, it is not possible to guarantee the relationship between two objects (nodes). This is maybe due
to the fact that information about the domain is incomplete. Thus, it would be interesting if link prediction approaches could handle the probability of a link conditioned on the information about the domain. In our example, knowing that the probability of the relationship between Joeand M ike conditioned on the knowledge of them publishing in the same journal and teaching the same course is high implies a link between them in the network; otherwise, a link is not suggested. In graph-based approaches, probabilistic models learned through machine learning algorithms were used for link prediction. Some examples of probabilistic models are Probabilistic Relational Model (PRM) [6], Probabilistic Entity Relationship Model (PERM) [7] and Stochastic Relational Model (SRM) [22]. On approaches based on semantic we claim that ontologies must be used to model the domain. Therefore, to model uncertainty, probabilistic approaches, such as probabilistic ontologies, must be considered.
An ontology can be represented through a description logic [2], which is typi- cally a decidable fragment of first-order logic that tries to reach a practical bal- ance between expressivity and complexity. To encode uncertainty, a probabilistic description logic (PDL) must be contemplated. The literature contains a number of proposals for PDLs [8, 10, 19, 13]. In this paper we adopt a recently proposed PDL, called CredalALC(crALC) [4, 16, 5], that extends the popular logicALC [2]. In crALC one can specify sentences such asP(Professor|Researcher) = 0.4, indicating the probability that an element of the domain is a Professor given that it is aResearcher. These sentences are calledprobabilistic inclusions. Exact and approximate inference algorithms that deal with probabilistic inclusions have been proposed [4, 5], using ideas inherited from the theory of Relational Bayesian Networks (RBN)[11].
In this paper, we propose to use a probabilistic ontology defined with the PDLcrALC for semantic link prediction.
The paper is organized as follows. Section 2 reviews basic concepts of PDLs andcrALC. Section 3 presents our algorithm for semantic link prediction through the PDL crALC. Experiments are discussed in Section 4, and Section 5 con- cludes the paper.
2 Probabilistic Description Logics and crALC
Description logics (DLs) form a family of representation languages that are typi- cally decidable fragments of first order logic (FOL) [2]. Knowledge is expressed in terms of individuals, concepts, and roles. The semantic of a description is given by a domain D (a set) and an interpretation ·I (a functor). Individuals represent objects through names from a setNI={a, b, . . .}. Eachconceptin the set NC={C, D, . . .} is interpreted as a subset of a domainD. Eachrolein the setNR ={r, s, . . .}is interpreted as a binary relation on the domain.
Several probabilistic descriptions logics (PDLs) have appeared in the litera- ture. Heinsohn [8], Jaeger [10] and Sebastiani [19] consider probabilistic inclusion axioms such asPD(Professor) =α, meaning that a randomly selected object is a Professorwith probabilityα. This characterizes adomain-basedsemantics: prob-
abilities are assigned to subsets of the domainD. Sebastiani also allows inclusions such asP(Professor(John)) =α, specifying probabilities over the interpretations themselves. For example, one interpretsP(Professor(John)) = 0.001 as assigning 0.001 to be the probability of the set of interpretations whereJohnis aProfessor.
This characterizes an interpretation-basedsemantics.
The PDLcrALC is a probabilistic extension of the DLALC that adopts an interpretation-based semantics. It keeps all constructors ofALC, but only allows concept names on the left hand side of inclusions/definitions. Additionally, in crALC one can have probabilistic inclusions such as P(C|D) =αorP(r) =β for concepts C and D, and for role r. If the interpretation of D is the whole domain, then we simply write P(C) = α. The semantics of these inclusions is roughly (a formal definition can be found in [5]) given by:
∀x∈ D : P(C(x)|D(x)) =α,
∀x∈ D, y∈ D : P(r(x, y)) =β.
We assume that every terminology is acyclic; no concept uses itself. This as- sumption allows one to represent any terminologyT through a directed acyclic graph. Such a graph, denoted byG(T), has each concept name and role name as a node, and if a conceptCdirectly uses conceptD, that is ifCandDappear respectively in the left and right hand sides of an inclusion/definition, then D is aparentof C inG(T). Each existential restriction∃r.C and value restriction
∀r.C is added to the graphG(T) as nodes, with an edge fromrand C to each restriction directly using it. Each restriction node is adeterministicnode in that its value is completely determined by its parents.
Example 1. Consider a terminology T1 with concepts A,B,C,D. Suppose P(A) = 0.9,B v A,C v Bt ∃r.D, P(B|A) = 0.45, P(C|Bt ∃r.D) = 0.5, and P(D|∀r.A) = 0.6. The last three assessments specify beliefs about partial overlap among concepts. Suppose alsoP(D|¬∀r.A) =²≈0 (conveying the existence of exceptions to the inclusion ofDin∀r.A). Figure 1 depictsG(T).
Fig. 1.G(T) for terminologyT in Example 1 and its grounding for domainD={a, b}.
The semantics ofcrALC is based on probability measures over the space of interpretations, for a fixed domain. Inferences, such asP(Ao(a0)|A) for an ABox A, can be computed by propositionalization and probabilistic inference (for exact calculations) or by a first order loopy propagation algorithm (for approximate calculations) [5].
3 Link Prediction by using crALC
In this section we describe how to apply the PDL crALC for semantic link prediction. We borrowed some syntax from the graph-based approach where each node (a person in a social network) is represented byA, B, C, and we are interested in defining whether a link betweenAandB is suitable given there is no link between these nodes. Interests between the nodes are modeled through a probabilistic ontology represented by the PDLcrALC. The prediction link task can be described as:
Given:
• a network defining relationship between objects;
• an ontology represented bycrALCdescribing the domain of the objects;
• the ontology role that defines the semantic of the relationship between objects;
• the ontology concept that describes the network objects.
Find:
• a revised network defining relationship between objects.
The proposed algorithm for link prediction receives a network of a specific domain. For instance, in a collaboration network the nodes represent researchers and the relationship can have the semantic ”has a publication with” or ”is ad- vised by”. Therefore, the ontology represented bycrALC describes the domain of publications between researchers, having concepts likeResearcher,Publication, StrongRelatedResearcher and NearCollaborator and roles like hasPublication, hasSameInstitution andsharePublication. This ontology can be learned automat- ically through a learning algorithm as the ones proposed in [15, 17]. Thus, the nodes represent instances of one of the concepts described in the PDL crALC and the semantic of the links is described by one of the roles in the PDLcrALC.
These concept and role must be informed as inputs to the proposed algorithm.
The link prediction algorithm is described in Algorithm 1.
The algorithm starts looking for all pairs of instances of the conceptCdefined as the concept that provides the semantic for the network nodes. For each pair it checks whether the corresponding nodes exist in the network (this can be improved by exploring graph-based properties). If not the probability of the link is calculated through the probability of the defined role conditioned on evidence. The evidence is provided by the instances of the ontology. As many instances the ontology have the better is the inference performed. The inference is performed through the Relational Bayesian network build from ontologyO. If the probability inferred is greater than a threshold then the corresponding link
Require: a networkN, an ontologyO, the roler(, ) representing the semantic of the network link, the conceptC describing the objects of the network and a threshold.
Ensure: a revised networkNf
1: defineNf asN
2: for allpair of instances (a, b) of conceptCdo
3: if does not exist a link between nodesaandbin the networkN then 4: infer probabilityP(r(a, b)|evidences) using the RBN created through the
ontologyO
5: if P(r(a, b)|evidences)> thresholdthen 6: add a link betweenaandbin networkNf
7: end if 8: end if 9: end for
Algorithm 1: Algorithm for link prediction throughcrALC.
is added to the network. Alternatively, when the threshold to be considered is not known a priori, a rank of the inferred links based on their probability is done and the top-k, where k would be a parameter, are chosen.
4 Preliminary Results
Experiments were run over a collaborative network of researchers. Data was ga- thered from the Lattes curriculum platform3, the public repository for Brazilian curriculum researchers. In this platform, every researcher has a unique Lattes code that allows one to link to other researchers according to: shared publica- tions, advising tasks, and examination board participations. Given this collabo- rative network we are interested in predicting further links among researchers in order to either promote further collaborations (suitable co-workers to research tasks would be suggested) or gather information about research groups. Due to form-filling errors there are many missing links among researchers; thus, we are unable to completely state co-working relationships using only the Lattes platform.
To tackle link prediction we firstly have collected information about 1200 researchers and learned a probabilistic ontology [15, 17], represented by the PDL crALC, for modeling their research interests. A simplified probabilistic ontology
3 http://lattes.cnpq.br/
is given by:
P(Publication) = 0.3 P(Board) = 0.33
P(sharePublication) = 0.22 P(wasAdvised) = 0.05 P(hasSameInstitution) = 0.14 P(sameExaminationBoard) = 0.31 ResearcherLattes≡ Person
u(∃hasPublication.Publication
u∃advises.Personu ∃participate.Board)
P(PublicationCollaborator |Researcheru ∃sharePublication.Researcher) = 0.91 P(SupervisionCollaborator |Researcheru ∃wasAdvised.Researcher) = 0.94 P(SameInstitution |Researcheru ∃hasSameInstitution.Researcher) = 0.92 P(SameBoard |Researcheru
∃sameExaminationBoard.Researcher) = 0.92
P(NearCollaborator |Researcheru ∃sharePublication.∃hasSameInstitution.
∃sharePublication.Researcher) = 0.95 FacultyNearCollaborator≡ NearCollaborator
u ∃sameExaminationBoard.Researcher P(NullMobilityResearcher |Researcheru ∃wasAdvised.
∃hasSameInstitution.Researcher) = 0.98 StrongRelatedResearcher≡Researcher
u(∃sharePublication.Researcheru
∃wasAdvised.Researcher) InheritedResearcher≡ Researcher
u(∃sameExaminationBoard.Researcheru
∃wasAdvised.Researcher)
In this probabilistic ontology concepts and probabilistic inclusions denote mu- tual research interests. For instance, aPublicationCollaboratorinclusion refers to Researchers who shares aPublication, thus relates two nodes (Researcher) in a col- laboration graph. Therefore, the conceptResearcherand the rolesharePublication are inputs to the algorithm we proposed in Algorithm 1.
To perform inferences and therefore to obtain link predictions, a proposition- alization step (a resulting relational Bayesian network) is required.
In addition, a collaboration graph, based on shared publications, was also defined. Statistical information was computed accordingly. Figure 2 depicts col- laborations among 303 researchers. Several relationships and clusterings can also be observed.
If we carefully inspect this collaboration graph (Figure 3 shows a subgraph obtained from Figure 2) we could be interested, for instance, in predicting links among researchers from different groups.
Thus, in Figure 3 one could further investigate whether a link between re- searcher R (red octagon node) and the researcher B (blue polygon node) is suitable. In order to infer this, the probability of a possible link betweenR and
Aldo Tonso
A n d r e a s K a r o l y G o m b e r t
1
Celso Ricardo Denser Pamboukian
1
I r e n e F e r n a n d e s
1
Mari Cleide Sogayar
1
Felipe Santiago Chambergo Alcalde
1
Valéria Mendes Rodas
1
P a t r i c i a B a r r o s d o s S a n t o s
1
Soraia Attie Calil Jorge
1
Fabiano Henrique Marques
1
Carlos Augusto Pereira
1
Ari José Scattone Ferreira
1
M a t e u s M e n e g h e s s o d a C o n c e i ç ã o
1
Amilton Sinatora 1 Andre Paulo Tschiptschin
Celso Pupo Pesce
1
Linilson Rodrigues Padovese
1
R o b e r t o M a r t i n s d e S o u z a
1
Cristian Camilo Viáfara Arango
1
Deniol Katsuki Tanaka
1
G i u s e p p e P i n t a ú d e
1
Wilson Luiz Guesser
1
Marcio Gustavo Di Vernieri Cuppari
1
Ricardo Fuoco
1
Miguel Angelo de Carvalho
1
Ivan Gilberto Sandoval Falleiros
1
Marcelo Nelson Páez Carreño
1
Renato Rufino Xavier
1
Célia Aparecida Lino dos Santos
1
I n é s P e r e y r a
1
Daniel Rodrigues
1
Julio Cesar Klein das Neves
1
Newton Kiyoshi Fukumasu
1
Helio Goldenstein
1
Paulo Roberto Mei
1
Andre Fabio Kohn
1
F e r n a n d o J o s e G o m e s L a n d g r a f
1
Idalina Vieira Aoki
1
Neusa Alonso-Falleiros
1
1
D a i r o H e r n á n M e s a G r a j a l e s
1
11 1
1
11
Ailton Camargo
1
Abel André Cândido Recco
1
S t e p h a n W o l y n e c
1
Flavio Beneduce Neto
1
Anna Helena Reali COSTA
Antonio Carlos Silva Costa Teixeira
Antonio Carlos Vieira Coelho
Antonio Domingues de Figueiredo
Carmen Cecilia Tadini
Hercílio Gomes de Melo
11
Jorge Andrey Wilhelms Gut
1
M a r i a E l e n a S a n t o s T a q u e d a
1
Pedro de Alcântara Pessôa Filho
1
Jose Mauricio Pinto
1
Carmen Cecilia Tadini1
E l i z a b e t e W e n z e l d e M e n e z e s
1
S u s y F r e y S a b a t o
1
Javier Telis Romero
1
F l a v i o C e s a r C u n h a G a l e a z z o
1
Aurea Yuki Sugai
1
S u z a n a C a e t a n o d a S i l v a L a n n e s
1
Pedro Vitoriano de Oliveira
1
Pricila Veiga dos Santos
1
Patrícia Ponce
1
Claudio Augusto Oller do Nascimento
1
Reinaldo Giudici
1
R o b e r t o G u a r d a n i
1
Osvaldo Chiavone Filho
1
Isabela Bellot de Souza Will
1
F r a n k H e r b e r t Q u i n a
1
Antonio Esio Bresciani
1
Airton Jose de Luna
1
Marco Giulietti
1
Cinthia Tiemi Muranaka
1
Amilcar Machulek Junior
1
Sylvana Cavedon Presti Migliavacca
1
D o u g l a s G o u v ê a
1
H e n r i q u e K a h n
1
Jose Deodoro Trani Capocchi
1
Samuel Marcio Toffoli
1
Edson Roberto Leite
1
Ing Hwie Tan
1
M a r c o s C r a m e r E s t e v e s
1
Reginaldo Muccillo
1
José Alberto Cerri
1
Jefferson Bettini
1
C a r l o s R e n a t o R a m b o
1
Victor Carlos Pandolfelli
1
Ricardo Hauch Ribeiro de Castro
1
Emílio Carlos Nelli Silva
Emilio Del Moral Hernandez
1 1
1
1 11
Rodrigo Magnabosco
1
Carlos Shiniti Muranaka
1
Fábio Cardoso Chagas
1
Walman Benicio de Castro
1
Julio Carlos Teixeira
1
Raimundo Carlos Silvério Freire
1
Suzilene Real Janasi
1
Frank Patrick Missell
1
G e r a l d o C a i x e t a G u i m a r ã e s
1
A l e x a n d r e M a g n o B a r r o s d e F r e i t a s
1
F e r n a n d o J o s e p e t t i F o n s e c a John Paul Hempel Lima
1
G e r s o n d o s S a n t o s
1
Maria Aparecida Godoy Soler Pajanian
1
Joao Paulo Sinnecker
1
Leni Campos Akcelrud
1
P a u l o S e r g i o S a n t o s
1
Rodrigo Fernando Bianchi
1
Marystela Ferreira
1
Adnei Melges de Andrade
1
Rodrigo Andrade Martinez
1
Paulo Cesar de Morais
1
Antônio Otávio de Toledo Patrocínio
1
Ely Antonio Tadeu Dirani
1
Rinaldo Gregorio Filho
1
Antonio Riul Júnior
1
Vitor Angelo Fonseca Deichmann
1
Miguel Alexandre Novak
1
P a t r i c i a A l e x a n d r a A n t u n e s
1
Everaldo Carlos Venancio
1
F e r n a n d o R e b o u ç a s S t u c c h i
Flávio Buiochi
Francisco Ferreira Cardoso
1
H e n r i q u e L i n d e n b e r g N e t o
1
Márcia Terra da Silva
1
Mauro Zilbovicius
1
M e r c i a M a r i a S e m e n s a t o B o t t u r a d e B a r r o s
1
Silvio Burrattino Melhado
1
Adriana Gouveia Rodrigo
1
Erika Paiva Tenório de Holanda
1
Fred Borges da Silva
1
Marco Antônio Penido de Rezende
1
Heliana Comin Vargas
1
Ubiraci Espinelli Lemes de Souza
1
Eliana Kimie Taniguti
1
Adriano Gameiro Vivancos
1
Leonardo Tolaine Massetto
1
F e r n a n d o H e n r i q u e S a b b a t i n i
1
T a t h y a n a M o r a t t i
1
Cláudio Vicente Mitidieri Filho
1
Viviane Miranda Araujo
1
M a r c e l a P a u l a M a r i a Z a n i n M e n e g u e t t i
1
Cláudia Nascimento de Jesus
1
Carlos Torres Formoso
1
F e r n a n d o R e s e n d e
1
1
1
Vanderley Moacyr John
1
1 1
Maria Cristina Motta de Toledo
1
Sérgio Cirelli Angulo
1
Lilia Mascarenhas Sant Agostino
1
Viviane Carillo Ferrari
1
H e l m u t B o r n
1
Giuliana Ratti
1
Wildor Theodoro Hennies
1
1
1
1
Marina Magnani
1
M a r c e l D u p r e t L o p e s B a r b o s a
1
Assis Vicente Benedetti
1
P a t r i c i a H a t s u e S u e g a m a
1
R i s o m á C h a v e s
1
Maysa Terada
1
Isolda Costa
1
Mitiko Saiki
1
Fernando Morais dos Reis
1
R o c i o d e l P i l a r B e n d e z ú H e r n á n d e z
1
Susana Ines Cordoba de Torresi
1
1 1
1 1
1
1
Henrique Eisi Toma
1
1
11
Mônica de Fátima Lobão Granero
1
I v a n E d u a r d o C h a b u
I z a b e l F e r n a n d a M a c h a d o
João Antonio Martino Luciano Mendes Camillo
1
Michele Rodrigues
1
Rudolf Theoderich Bühler
1
Salvador Pinillos Gimenez
1
V i c t o r S o n n e n b e r g
1
Aparecido Sirley Nicolett
1
Antonio Luis Pacheco Rotondaro
1
Alan Rodrigo Navia
1
Marcello Bellodi
1
Michelly de Souza
1
Marcelo Antonio Pavanello
1
S e b a s t i a o G o m e s d o s S a n t o s F i l h o
1
Milene Galeti
1
José Roberto Cardoso
1 1
L u i z L e b e n s z t a j n
1
R o b e r t o M o u r a S a l e s
1
Silvio Ikuyo Nabeta
1
Viviane Cristine Silva
1
Jose Jaime da Cruz
1
1
B e r n a r d o P i n h e i r o d e A l v a r e n g a
1
R e n a t o C a r l s o n
1
Maurício Caldora Costa
1
José Augusto Pereira da Silva
1
Wanderlei Marinho da Silva
1
Marcos Antonio Ruggieri Franco
1
Mario Leite Pereira Filho
1
P e d r o P e r e i r a d e P a u l a
1
Sérgio Luís Lopes Verardi
1
Yasmara Conceicao De Polli Migliano
1
Antonio Carlos de Jesus Paes
1
Djonny Weinzierl
1
Fabio Henrique Pereira
1
Mauricio Barbosa de Camargo Salles
1
José Roberto Castilho Piqueira
1
Renato Teodoro Ramos
1 1
Luiz Henrique Alves Monteiro
1
Vanessa Oliva Araujo
1
José Guilherme de Souza Chaui Mattos Berlinck
1
André Alves Ferreira
1
H e n r i q u e S c h ü t z e r D e l N e r o
1
M a r i l e n e d e P i n h o S a n t o s M a z z a
1
Atila Madureira Bueno
1
Iberê Luiz Caldas
1
Julio Cezar Adamowski
Jun Okamoto Junior Kazuo Nishimoto
1
1 1
Attilio Converti
1Joao Batista de Almeida e Silva
1
Tiago Martinello
1
Luis Fernando Peffi Ferreira
1 L e o b e r t o C o s t a T a v a r e s
1
Geraldo Narciso da Rocha Filho
1
André Rolim Baby
1
Elfriede Marianne Bacchi
1 Lucia Helena Terra Guerra
1
Cleber Marcos Ribeiro Dias
1
Orestes Marraccini Goncalves Paulo Carlos Kaminski
Paulo Eigi Miyagi
Paulo Sérgio Licciardi Messeder Barreto
Pedro Afonso de Oliveira Almeida
Rafael Giuliano Pileggi
1 Roberto Cesar de Oliveira Romano
1
1
A n d r é R o c h a S t u d a r t
1
Jose de Anchieta Rodrigues
1
Gracinda Marina Castelo da Silva
1
Heloísa Cristina Fernandes
1
Reinaldo Morabito Neto
1
Jose Renato Coury
1
Murilo Daniel de Mello Innocentini
1
Andrea Murillo Betioli
1
1
11 11
Esleide Lopes Casella
1
T a h W u n S o n g
1
P e d r o H e n r i q u e H e r m e s d e A r a u j o
1
Ariovaldo Bolzan
1
Alexandre Eliseu Stourdze Visconti
1
Heizir Ferreira de Castro
1
Fabio Bentes Freire
1
Teresa Cristina Brazil de Paiva
1
1
1 1
1 1
1
1 1
Carlos Eduardo Calmanovici
1
Katia Ribeiro
1
Efraim Cekinski
1
1 1
1 1
Marjorie Benegra
1
R o n a l d o C â m a r a C o z z a
1
Sara Aida Rodriguez Pulecio
1 1
111
1 1
Humberto Naoyuki Yoshimura
1
Raul Gonzalez Lima
1
Jose Daniel Biasoli de Mello
1
Cristiane Ueda
1
Sergio Shiguemi Furuie
1 1
1
Fabiane Bizinella Nardon
1
Eduardo Moacyr Krieger
1
E d u a r d o M a s s a d
1
Paulo Bandiera Paiva
1
D e n i s R o b e r t o Z a m i g n a n i
1
Vivian Rodrigues Fiales
1
Griselda Esther Jara de Garrido
1
Cecil Chow Robilotta
1
Silvia Maria de Souza Selmo Tulio Nogueira Bittencourt
1 1
Paulo Roberto do Lago Helene
1
João Cyro André
1
Ricardo Amorim Einsfeld
1
Marco Antonio Meggiolaro
1
J o s é U m b e r t o A r n a u d B o r g e s
1
Jaime Tupiassú Pinho de Castro
1
J o a q u i m B e n t o C a v a l c a n t e N e t o
1
E d u a r d o P a r e n t e P r a d o
1
L u i z F e r n a n d o C a m p o s R a m o s M a r t h a
1
Gilson Fujii
1
Luiz Eduardo Teixeira Ferreira
1
V a h a n A g o p y a n
1 1
1 1 1
1 1
11
Carina Ulsen
1 1 1
Alex Kenya Abiko
1
1
1Racine Tadeu Araujo Prado
1 11
Thiago Melanda Mendes
1
V a n e s s a G o m e s d a S i l v a
1
Maristela Gomes da Silva
1
N e i d e M a t i k o N a k a t a S a t o
1
Arnaldo Manoel Pereira Carneiro
1
Bruno Luís Damineli
1
A r t e m á r i a C o ê l h o d e A n d r a d e
1
Miguel Aloysio Sattler
1
B r u n o r o L e i t e G i o r d a n o
1
Anne Komatsu Agopyan
1
Mário Sousa Takeashi
1
W a n g S h u H u i N e y l o r A n t u n e s S t e v o
Renato Seiji Tavares
1
11 1
1 1
1 1 1
1
11
1
11 1
11 1 11
111 1111
1
Lucas Antonio Moscato
Marcos de Sales Guerra Tsuzuki
1 1
1
111
1 1
1
1
G u i l h e r m e C a n u t o d a S i l v a
1
Fig. 2.Collaboration graph among researchers.
B is calculated, P(link(R, B)|E), where E denotes evidence about researchers such as publications, institution, examination board participations and so on.
The role sharePublication is the one defining the semantic of the links in the graph. Therefore, it is through it that we must calculateP(link(R, B)|E). Since the concept PublicationCollaborator is defined by the role sharePublication and considering as evidence Researcher(R)u ∃hasSameInstitution.Researcher(B) one can inferP(link(R, B)|E) through:
Amilton Sinatora Linilson Rodrigues Padovese 1
Deniol Katsuki Tanaka 1 G i u s e p p e P i n t a ú d e
1 Wilson Luiz Guesser
1 Marcio Gustavo Di Vernieri Cuppari
1
Ricardo Fuoco 1
Andre Paulo Tschiptschin 1
Miguel Angelo de Carvalho 1
Ivan Gilberto Sandoval Falleiros 1
Celso Pupo Pesce 1 Marcelo Nelson Páez Carreño
1
Renato Rufino Xavier
1 Cristian Camilo Viáfara Arango
1 Célia Aparecida Lino dos Santos
1
R o b e r t o M a r t i n s d e S o u z a 1
I n é s P e r e y r a 1 Daniel Rodrigues
1 Julio Cesar Klein das Neves
1 Newton Kiyoshi Fukumasu
1 Helio Goldenstein
1
Paulo Roberto Mei 1
F e r n a n d o J o s e p e t t i F o n s e c a G e r s o n d o s S a n t o s 1
Maria Aparecida Godoy Soler Pajanian 1 Joao Paulo Sinnecker
1 Leni Campos Akcelrud
1 P a u l o S e r g i o S a n t o s
1
Rodrigo Fernando Bianchi 1
Marystela Ferreira 1
Adnei Melges de Andrade 1
Rodrigo Andrade Martinez 1
Paulo Cesar de Morais 1 Antônio Otávio de Toledo Patrocínio
1
Ely Antonio Tadeu Dirani
1 Rinaldo Gregorio Filho
1 Antonio Riul Júnior
1
Vitor Angelo Fonseca Deichmann 1
Miguel Alexandre Novak 1 P a t r i c i a A l e x a n d r a A n t u n e s
1 Everaldo Carlos Venancio
1 John Paul Hempel Lima
1 Francisco Ferreira Cardoso
Erika Paiva Tenório de Holanda 1 Fred Borges da Silva
1
Marco Antônio Penido de Rezende 1 Heliana Comin Vargas
1
Ubiraci Espinelli Lemes de Souza
1
Eliana Kimie Taniguti 1
Adriano Gameiro Vivancos 1
M e r c i a M a r i a S e m e n s a t o B o t t u r a d e B a r r o s
1
Leonardo Tolaine Massetto 1 F e r n a n d o H e n r i q u e S a b b a t i n i
1
T a t h y a n a M o r a t t i 1 Adriana Gouveia Rodrigo
1 Márcia Terra da Silva
1
Anna Helena Reali COSTA 1
Cláudio Vicente Mitidieri Filho 1 Viviane Miranda Araujo
1
H e n r i q u e L i n d e n b e r g N e t o
1 Marcela Paula Maria Zanin Meneguetti
1
Mauro Zilbovicius 1
Cláudia Nascimento de Jesus 1
Carlos Torres Formoso
1 F e r n a n d o R e s e n d e
1 Silvio Burrattino Melhado
1
José Roberto Cardoso
R e n a t o C a r l s o n 1
Maurício Caldora Costa 1 José Augusto Pereira da Silva
1
Jose Jaime da Cruz
1 Wanderlei Marinho da Silva
1
Marcos Antonio Ruggieri Franco 1
Carlos Shiniti Muranaka 1
Emilio Del Moral Hernandez 1
L u i z L e b e n s z t a j n 1
Mario Leite Pereira Filho 1 P e d r o P e r e i r a d e P a u l a
1
Sérgio Luís Lopes Verardi
1 Yasmara Conceicao De Polli Migliano
1 Viviane Cristine Silva
1
Antonio Carlos de Jesus Paes 1
Djonny Weinzierl 1 Fabio Henrique Pereira
1
Mauricio Barbosa de Camargo Salles
1 B e r n a r d o P i n h e i r o d e A l v a r e n g a
1 R o b e r t o M o u r a S a l e s
1
Silvio Ikuyo Nabeta 1
I v a n E d u a r d o C h a b u
1
Vanderley Moacyr John
1
1
Heloísa Cristina Fernandes 1
H e n r i q u e K a h n
1
Racine Tadeu Araujo Prado
1
Arnaldo Manoel Pereira Carneiro 1
Antonio Carlos Vieira Coelho 1 Thiago Melanda Mendes
1
Neide Matiko Nakata Sato 1
Bruno Luís Damineli 1 A r t e m á r i a C o ê l h o d e A n d r a d e
1
Orestes Marraccini Goncalves
1
Miguel Aloysio Sattler 1
B r u n o r o L e i t e G i o r d a n o
1 V a n e s s a G o m e s d a S i l v a
1 V a h a n A g o p y a n
1 Anne Komatsu Agopyan
1 Andrea Murillo Betioli
1 Antonio Domingues de Figueiredo
1 Cleber Marcos Ribeiro Dias
1
Sérgio Cirelli Angulo
1
Roberto Cesar de Oliveira Romano 1
Maristela Gomes da Silva 1
Rafael Giuliano Pileggi 1
Alex Kenya Abiko 1
Mário Sousa Takeashi 1
Silvia Maria de Souza Selmo 1
Carina Ulsen 1
Maria Elena Santos Taqueda
1
Fig. 3.Collaboration subgraph.
P(PublicationCollaborator(R)|Researcher(R)
u∃hasSameInstitution.Researcher(B)) = 0.57.
If we took a threshold of 0.60, the link between R and B would not be included.
One could gain more evidence, such as information about nodes that in- directly connect these two groups (Figure 3), denoted by I1, I2. The inference would be
P(PublicationCollaborator(R)|Researcher(R)
u∃sharePublication(I1).∃sharePublication(B)
u∃sharePublication(I2).∃sharePublication(B)) = 0.65.
Because more information was provided the probability inferred was different.
The same threshold now would preserve the link.
Other inferences are possible by considering the suggestion of links between surrounding nodes, i.e. nodes directly linked to the two nodesRandB, denoted byR1, . . . , Rk, andB1, . . . , Bn respectively. For eachi= 1, ..., kandj= 1, ..., n, calculatesP(link(Ri, Rj)|E) andP(link(Bi, Bj)|E).
As a rule, if we are interested in discovering whetherAandBcould be linked, probabilistic inference P(link(A, B)) should be performed.
In a more general framework, graph information could be useful to deal with a large number of link predictions. Note that graph adjacency allow us to address probabilistic inference for promising nodes. In a naive approach, each pair of nodes in the collaboration graph would be evaluated so, multiple probabilistic relational Bayesian inference calls would be required.
On the other hand, if graph-based information is used, such naive scheme could be improved. In our approach, two nodes are probabilistically evaluated if there is a path between them (number of incoming/outgoing edges, number of mutual friends, node distances are also considered). Thus, numerical graph- based information guides the inference process in the relational Bayesian network (linked to the probabilistic ontology). In addition, other candidates sharing any kind of evidence are also evaluated, i.e., interests based features (linked to onto- logical knowledge) allow us to further explore link prediction.
Alternatively, by completing an overall link predicting task we can devise further functionalities to the resulting collaboration network. The resulting graph can be considered as being a probabilistic network, i.e., probabilities inferred for each link could be denote strenght of the relationship.
5 Conclusion
We have presented an approach for predicting links that resorts both to graph- based and ontological information. Given a collaborative network, e.g., a social network, we encode interests and graph features through acrALCprobabilistic ontology. In order to predict links we resort to probabilistic inference. Prelimi- nary results focused on an academic domain, and we aimed at predicting links among researchers. These preliminary results showed the potential of the idea.
Previous combined approaches for link prediction [3, 1] have focused on ma- chine learning algorithms [14]. In such schemes, numerical graph-based features and ontology-based features are computed; then both features are input into a machine learning setting where prediction is performed. Differently from such approaches, in our work we adopt a generic ontology (instead of a hierarchi- cal ontology, expressing only is-a relationships among interests). Therefore, our approach uses more information about the domain to help the prediction.
Acknowledgements
The third author is partially supported by CNPq. The work reported here has received substantial support by FAPESP grant 2008/03995-5.
References
1. W. Aljandal, V. Bahirwani, D. Caragea, and H.W. Hsu. Ontology-aware classifica- tion and association rule mining for interest and link prediction in social networks.
InAAAI 2009 Spring Symposium on Social Semantic Web: Where Web 2.0 Meets Web 3.0, Standford,CA, 2009.
2. F. Baader and W. Nutt. Basic description logics. InDescription Logic Handbook, pages 47–100. Cambridge University Press, 2002.
3. D. Caragea, V. Bahirwani, W. Aljandal, and W. H. Hsu. Ontology-based link prediction in the livejournal social network. InSymposium on Abstraction, Refor- mulation and Approximation, 2009.
4. F. G. Cozman and R. B. Polastro. Loopy propagation in a probabilistic descrip- tion logic. In Sergio Greco and Thomas Lukasiewicz, editors,Second International Conference on Scalable Uncertainty Management, Lecture Notes in Artificial In- telligence (LNAI 5291), pages 120–133. Springer, 2008.
5. F. G. Cozman and R. B. Polastro. Complexity analysis and variational inference for interpretation-based probabilistic description logics. InConference on Uncertainty in Artificial Intelligence, 2009.
6. N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. InProc. 16th Int. Joint Conference on Artificial Intelligence, pages 1300–
1309, 1999.
7. D. Heckerman, C. Meek, and D. Koller. Probabilistic entity-relationship models, prms, and plate models. InIn Proceedings of the 21st International Conference on Machine Learning, 2004.
8. J. Heinsohn. Probabilistic description logics. InInternational Conf. on Uncertainty in Artificial Intelligence, pages 311–318, 1994.
9. W.H. Hsu, A.L. King, M.S.R. Paradesi, T. Pydimarri, and T. Wneinger. Collab- orative and structural recommendation of friends using weblog-based social net- work analysis. InProceedings of Computational Approaches to Analysing WebLogs (AAAI), 2006.
10. M. Jaeger. Probabilistic reasoning in terminological logics. InPrincipals of Know- ledge Representation (KR), pages 461–472, 1994.
11. M. Jaeger. Relational Bayesian networks: a survey. Linkoping Electronic Articles in Computer and Information Science, 6, 2002.
12. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social net- works. InProceedings of the twelfth international conference on Information and Knowledge Management, pages 556–559, New York, NY, USA, 2003. ACM.
13. T. Lukasiewicz and U. Straccia. Managing uncertainty and vagueness in description logics for the semantic web. Web Semant., 6:291–308, November 2008.
14. T. Mitchell. Machine Learning. McGraw-Hill, New York, 1997.
15. J. Ochoa-Luna, K. Revoredo, and F.G. Cozman. Learning sentences and assess- ments in probabilistic description logics. InBobillo, F., et al. (eds.) Proceedings of the 6th International Workshop on Uncertainty Reasoning for the Semantic Web, volume 654, pages 85–96, Shangai, China, 2010. CEUR-WS.org.
16. R. B. Polastro and F. G. Cozman. Inference in probabilistic ontologies with at- tributive concept descriptions and nominals. In 4th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW) at the 7th International Semantic Web Conference (ISWC), Karlsruhe, Germany, 2008.
17. K. Revoredo, J. Ochoa-Luna, and F. Cozman. Learning terminologies in proba- bilistic description logics. In Antˆonio da Rocha Costa, Rosa Vicari, and Flavio Tonidandel, editors, Advances in Artificial Intelligence SBIA 2010, volume 6404 ofLecture Notes in Computer Science, pages 41–50. Springer / Heidelberg, Berlin, 2010.
18. M. Sachan and R. Ichise. Using semantic information to improve link prediction results in network datasets. International Journal of Computer Theory and Enge- neering, 3:71–76, 2011.
19. F. Sebastiani. A probabilistic terminological logic for modelling information re- trieval. In ACM Conf. on Research and Development in Information Retrieval (SIGIR), pages 122–130, 1994.
20. B. Taskar, M. FaiWong, P. Abbeel, and D. Koller. Link prediction in relational data. InProceedings of the 17th Neural Information Processing Systems (NIPS), 2003.
21. T. Wohlfarth and R. Ichise. Semantic and event-based approach for link predic- tion. In Proceedings of the 7th International Conference on Practical Aspects of Knowledge Management, 2008.
22. K. Yu, W. Chu, S. Yu, V. Tresp, and Z. Xu. Stochastic relational models for discriminative link prediction. InProceedings of the Neural Information Processing Systems (NIPS), 2006.