• Aucun résultat trouvé

Table Retrieval and Generation

N/A
N/A
Protected

Academic year: 2022

Partager "Table Retrieval and Generation"

Copied!
1
0
0

Texte intégral

(1)

Keynote: Table Retrieval and Generation

Krisztian Balog University of Stavanger

krisztian.balog@uis.no

Abstract

Tables are powerful and versatile tools for organizing and presenting data. Tables may be viewed as complex information objects, which summarize existing information in a structured form. Therefore, for many information needs, returning tables as search results may be more helpful to the user than serving a ranked list of items (documents or entities). This talk is centered around utilizing (relational) tables as units of retrieval.

We introduce and address the problem of ad hoc table retrieval: answering a keyword query with a ranked list of tables. In another variant of this task, referred to as query-by-table, the input is not a keyword query, but an incomplete table. Tables can be ranked much like documents, by considering the words contained in them. Our main research objective is to move beyond lexical matching and improve table retrieval performance by incorporating semantic matching. We achieve that by representing tables and queries in multiple semantic spaces (employing both discrete sparse and continuous dense vector representations).

It may happen that the exact table the user is looking for does not exist. Our third task, termed on-the-fly table generation, addresses this very scenario: given a query, generate a relational table that contains relevant entities (as rows) along with their key properties (as columns). This problem is decomposed into three specific subtasks: (i) core column entity ranking, (ii) schema determination, and (iii) value lookup. We show that the first two subtasks are not independent of each other and can assist each other in an iterative manner.

Biography

Krisztian Balog is a full professor at the University of Stavanger where he leads the Information Access &

Interaction research group. He received his PhD from the University of Amsterdam, and worked as a postdoc at the Norwegian University of Science and Technology (NTNU), before joining the University of Stavanger. His general research interests lie in the use and development of information retrieval, information extraction, and machine learning techniques for intelligent information access tasks. His current research concerns entity-oriented and semantic search, and novel evaluation methodologies. He serves as a senior programme committee member at SIGIR, CIKM and ECIR, as an Associate Editor of the ACM Transactions on Information Systems, and as a current and former coordinator of information retrieval benchmarking efforts at TREC and CLEF.

Copyrightc by the paper’s authors. Copying permitted for private and academic purposes.

In: Joint Proceedings of the First International Workshop on Professional Search (ProfS2018); the Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR); and the International Workshop on Data Search (DATA:SEARCH18). Co-located with SIGIR 2018, Ann Arbor, Michigan, USA – 12 July 2018, published at http://ceur-ws.org

67

Références

Documents relatifs

Given a relational table R, our approach is based on what we call the query lattice of R; and our basic algorithm constructs the skyline set as the union of the answers to a subset

Furthermore, even when using a query analyser tool (MaltParser) to indentify phrases (here restricted to head and modifiers), the tensor product of all query terms (R) led to

Consistent query answering (CQA) is about formally char- acterizing and computing semantically correct answers to queries posed to a database that may fail to satisfy certain

The PR module of IDRAAQ is based on keyword-based and structure-based levels that respectively consist in: (i) a Query Expansion (QE) process relying on Arabic WordNet

For future experiments, we intend to exploit the knowledge on the impact of named entities on the retrieval process (Mandl & Womser-Hacker 2005) as well

Several experiments with different boost values and blind relevance feedback parameters were carried out for each stemmer.. The following tables 6, 7 and 8 show the results for

To avoid checking all images, we follow the annotation procedure suggested in [13]: first, CNN image classifiers are trained for each of the 16 place classes (using images from

Our key contribution is to induce a weighted sampling with replacement approach over relevance models com- bined with fusion in order to bypass the poor efficiency of running a