• Aucun résultat trouvé

Neural Variational Entity Set Expansion for Automatically Populated Knowledge Graphs

N/A
N/A
Protected

Academic year: 2022

Partager "Neural Variational Entity Set Expansion for Automatically Populated Knowledge Graphs"

Copied!
2
0
0

Texte intégral

(1)

Neural Variational Entity Set Expansion for Automatically Populated Knowledge Graphs

Pushpendre Rastogi

◦∗

Adam Poliak

Vince Lyzinski

4†

Benjamin Van Durme

Johns Hopkins University 4UMass. Amherst

Abstract

We propose Neural Variational Set Expansion to extract actionable information from a noisy knowledge graph (KG) and propose a gen- eral approach for increasing the interpretability of recommendation systems. We demonstrate the usefulness of applying a variational au- toencoder to the Entity Set Expansion task based on a realistic auto- matically generated KG.

1 Neural Variational Set Expansion

Imagine a physician trying to pin-point a specific diagnosis or a journalist investigating abuses of governmental power. In both scenarios, a domain expert may try to find answers based on prior known, relevant entities – either a list of diagnoses of with similar symptoms that a patient is experiencing or a list of known conspirators.

Instead of manually looking for connections between potential answers and prior knowledge, a searcher would like to rely on an automaticRecommender to find the connections and answers for them, i.e. related entities.

In the information retrieval (IR) community, Entity Set Expansion (ESE) is the established task of recom- mending entities that are similar to a provided seed of entities. The physician and journalist in our example can not fully take advantage of IR advances in ESE for two main reasons. Recent advances 1) often assume access to a clean, large Knowledge Graph and 2) are uninterpretable.

Many advanced ESE algorithms rely on manually curated, clean Knowledge Graphs (KG). In real-world settings, users rarely have access to clean KGs, and instead may rely on automatically generated KGs. Such KGs are oftennoisy because they are created from complicated and error-prone NLP processes. For example, automatic KGs may include duplicate entities, associations (relations) between entities may be missing, and entities with similar names may be incorrectly disambiguated. These imperfections prevent machine learning approaches from performing well on automatically generated KGs. Furthermore, many ESE algorithms degrade as the sparsity and unreliability of KGs increases [2, 3]. Advanced ESE methods, especially those that rely on neural networks, are uninterpretable [1]. If a physician can not explain decisions, patients may not trust her and if a journalist can not demonstrate how a certain individual is acting unethically or above the law, a resulting article may lack credibility. Furthermore, uniterpretability may limit the applications of advancements in IR, and more broadly artificial intelligence, as humans “won’t trust an A.I. unless it can explain itself.”1

We introduce Neural Variational Set Expansion (NVSE) to advance the applicability of ESE research. NVSE is an unsupervised model based on Variational Autoencoders (VAEs) that receives a query, uses a Bayesian approach to determine a latent concept that unifies entities in the query, and returns a ranked list of similar entities based on the previously determined unified latent concept. NVSE does not require supervised examples of queries and responses, nor pre-built clusters of entities. Instead, our method only requires sentences with linked entity mentions, i.e. spans of token associated with a KG entity, often included in automatically generated KGs.

Email: [email protected]Work done while the author was at JHU1 https://nyti.ms/2hR1S15

Copyrightc by the paper’s authors. Copying permitted for private and academic purposes.

In: Joint Proceedings of the First International Workshop on Professional Search (ProfS2018); the Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR); and the International Workshop on Data Search (DATA:SEARCH18). Co-located with SIGIR 2018, Ann Arbor, Michigan, USA – 12 July 2018, published at http://ceur-ws.org

49

(2)

References

[1] Mitra, B., Craswell, N.: Neural Models for Information Retrieval. ArXiv e-prints (May 2017)

[2] Pujara, J., Augustine, E., Getoor, L.: Sparsity and noise: Where knowledge graph embeddings fall short.

In: EMNLP. (2017)

[3] Rastogi, P., Lyzinski, V., Van Durme, B.: Vertex nomination on the cold start knowledge graph. Technical report, Human Language Technology Center of Excellence (2017)

50

Références

Documents relatifs

Though geographic BRF adding 25 terms from 5 documents combined with the Boolean approach worsened performance for all tasks compared to the respective base

In this section, we will give some properties of these Green functions that we will need later. The proof of the following lemma is based on elementary potential theory on the

By comparing an entity at hand to a reference set of similar entities, we can provide information on how the entity at hand differs from the typical patterns found for similar

With the advancement of semantic annotation technologies, the semantic data can be employed to significantly enhance information access by increasing the depth of analysis of

In the EntityVec model, queries are represented by the average of the embedding vectors of the entities in the query.. The entities in the query are annotated using TagMe [12]

The semantic similarity of words is exploited by these methods, providing vectors close in space when their related words are close in meaning.. These vectors are usually calculated

Our method jointly ranks items and knowledge graph entities using a Personalized PageRank procedure to produce recommendations together with their explanations..

The generalized difference explanation problem Dif Exp can be solved using the most specific similarity explanation problem SimExp msp : given I, P os and N eg, first construct a