• Aucun résultat trouvé

Issues When Building CLIR Applications: Outline of Invited Talk at CLEF 2004 Workshop

N/A
N/A
Protected

Academic year: 2022

Partager "Issues When Building CLIR Applications: Outline of Invited Talk at CLEF 2004 Workshop"

Copied!
1
0
0

Texte intégral

(1)

Issues When Building CLIR Applications Outline of Invited Talk at CLEF 2004 Workshop

G. Thurmair, linguatec

The talk will discuss some design issues which influence the quality of crosslingual retrieval.

1. On the Analysis Side, index creation is a key point. The index should be based on base forms, not on text forms or stems; therefore state-of-the-art stemmers are not the best tools to use. Also, there should be one index per language, to avoid clashes of terms occurring in several languages; this implies to run a language-detection component before indexing.

2. On the Query side, the quality of search crucially depends on the resource used for translation. While several techniques have been proposed, preference should be given to a multilingual conceptual network, to be used for a term-to-term translation. The closer this network matches the domain to be searched the better the search quality will be.

3. Once the search result is available, the questions of ranking of multilingual documents, as well as retranslation into the query language need to be addressed. Technology

components to support these tasks must be provided.

4. The CLIR application requires a new kind of linguistic resource which merges

requirements of terminology, formal NLP type dictionaries, conceptual hierarchies, and multilinguality like in Machine Translation. Such a multifunctional resource must be designed and maintained, and the paper ends with a list of requirements for such a resource.

As a result, a good CLIR system shows significant differences as compared to a conventional IR implementation.

Références

Documents relatifs

In this paper, we described the participation of KDEIR at CLEF eHealth 2016 Patient-Centred Information Retrieval task, where we proposed our approaches to ad-hoc search and

We participated in CLEF 2006 Indian language to English CLIR task, which is to the best of our knowledge, the first evaluation of CLIR where queries were provided in Hindi and

st-efpg – Expansion using the original title field for web query generation and final terms selected from full web pages.. Results obtained are discouraging as all our expansions

For both English and Portuguese we used a similar set of indexes as for the monolingual runs described earlier (Words, Stems, 4-Grams, 4-Grams+start/end, 4-Grams+words); for all

Three languages were addressed in the monolingual tasks (Dutch, Italian and Spanish), while in the bilingual tasks questions were formulated in five source languages (Dutch,

Four streams generate candidate answers from the Dutch CLEF corpus: Lookup, Pattern Match, Ngrams, and Tequesta.. The Table Lookup stream uses specialized knowledge bases constructed

This talk will present the fourth NTCIR Workshop, which is the latest in a series of evaluation workshops designed to enhance research in information access (IA) technologies

With respect to the 2001 CLEF evaluation (Bertoldi and Federico, 2002), the Cross Language IR (CLIR) system was modified in order to work with multi- ple translations of queries,