• Aucun résultat trouvé

Multilingual Retrieval in Twente

N/A
N/A
Protected

Academic year: 2022

Partager "Multilingual Retrieval in Twente"

Copied!
2
0
0

Texte intégral

(1)

Multilingual Retrieval in Twente.

Franciska de Jong and Dennis Reidsma and Djoerd Hiemstra TKI research group, dept. of Computer Science, University of Twente

{fdejong@cs.utwente.nl}

Abstract

The participation of the University of Twente in CLEF 2003 was very low profile.

The eternal pressure on resources known to researchers world-wide caused the focus of activities to be on other projects for some time. Therefore this paper will not go into the techniques with which the results of Twente were produced but rather gives an overview of current work in retrieval at the University of Twente.

1 Current Work in Twente.

The TKI reserch group has participated and is participating in several past and ongoing Dutch and European retrieval projects concerned with the disclosure of multimedia archives. Besides tackling the problem of processing the multimedia content, many of these projects focus on using parallel textual content as an extra source of information. This content can help searching effectively on information that cannot be derived from the multimedia content in itself.

Within the Dutch project DRUID1a lot of work has been done on robust speech recognition for Dutch[2], video segmentation and information extraction and filtering[3]. The IST project ECHO2 aimed at disclosing historic film materials. The major research themes were content- based processing of the video material and speech processing. Both projects had a strong focus oncross lingual retrieval of the content, expanding the results of previous projects.

The “Waterland project” (an ongoing Dutch project) does not directly concern multilingual retrieval, but it investigates a flexible and generic architecture for distribution of multimedia content.

In Pidgin3 retrieval is not the major theme. However, crosslanguage aspects figure heavily in this project: the resulting demonstrator should be able to generate automatic “translations” of user utterances, to enable conversations between persons using different languages. The translations need not be grammatically perfect, but should be good enough to convey the correct meaning.

MUMIS4, a recently completed IST project, was aimed at disclosingvideo recordings of soccer matches. The use of parallel textual reports on those matches played a major role in this project.

Furthermore a first start with TKI research on Information Extraction was made in the MUMIS project. The most interesting development was automatic alignment and merging of the extracted information from separate sources, which should improve the quality of retrieval [1].

The above can be summarised as follows: the major part of the TKI research on retrieval focuses on multilingual access to multimedia content and in the future more work will be done on Information Extraction for Dutch and especially on the promising theme of cross document information extraction, using information from one source to improve the extraction for another source.

1http://dis.tpd.tno.nl/druid/

2http://pc-erato2.iei.pi.cnr.it/echo/

3http://www.pidgin.nl/

4http://parlevink.cs.utwente.nl/projects/mumis/

(2)

References

[1] Jan Kuper, Horacio Saggion, Hamish Cunningham, Thierry Declerck, Peter Wittenburg, and Dennis Reidsma. Intelligent multimedia indexing and retrieval through multi-source infor- mation extraction and merging. In Proceedings of the IJCAI’03, Acapulco, August 2003. to appear.

[2] Roeland Ordelman, Arjan van Hessen, Franciska de Jong, and David van Leeuwen. Speech recognition for dutch spoken document retrieval. In David Bearman and Franca Garzotto, editors,Proceedings from the ICHIM01 meeting, Volume 2: Short Papers/Posters and Demos, Milan, September 2001.

[3] Martijn Spitters and Wessel Kraaij. A language modeling approach to tracking news events.

InCBMI’01, Brescia, 2001.

Références

Documents relatifs

For the textual retrieval, the experiment where the title, description and location fields of the image caption have been indexed using Xapian and applying the proper noun

In these runs, topic field is composed by modified title like in ADJUSTEN and body field is composed by the original query text. Our system is not dependent of the query

Multilingual retrieval results are composed of documents from different languages. Therefore, it is helpful to first investigate the retrieval accuracies of results from single

The trie based indexing and retrieval engine developed by the MIRACLE team is now fully functional and has been adapted to the WebClef environment and employed in this campaign..

The medical version of the GIFT is called medGIFT 4 [4]. It is also accessible as open source and adaptations concern mainly the features used and the interface that shows the

In this paper we try to answer the following questions: (1) How does eliminating the documents of the authors written in languages other than the query language affect the

First edition limited to 600 numbered copies all accompanied by a 10 x 14,6 cm sticker... The Future

The  program of  the intern  is in  a  first step  to investigate some bandwidth allocation algorithms with cyclic scheduling and to investigate the main performance. issues