
Report from TREC-9


Donna Harman, Ellen Voorhees
Retrieval Group
Information Access Division
National Institute of Standards and Technology

TREC Tasks

[Figure: timeline of TREC tasks, 1992 to 2001. Task groups: Static text; Streamed text; Human-in-the-loop; Beyond just English; Beyond text; Web searching; Answers, not documents. Tracks shown: Ad Hoc, Routing, Filtering, Interactive, Spanish, Chinese, X→Euro., Chin., Arab., OCR, Speech, Video, Very large collection, Web, Q&A.]

TREC-9 Tracks

• Cross-language (English to Chinese)

• Filtering

• Interactive

• Query

• Question Answering

• Spoken Document Retrieval

• Web

Cross Language Track

• Task: ad hoc search for documents written in one language using topics in another language

– 25 topics in English created by bilingual assessors; Chinese version also available

– 126,937 documents; 188 MB in BIG5

– Hong Kong newspapers donated by Wiser Ltd.

• Hong Kong Commercial Daily (Aug 98-Jul 99)

• Hong Kong Daily News (Feb 99-Jul 99)

• Takungpao (Oct 98-Mar 99)
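
One common way to approach the English-to-Chinese task is dictionary-based topic translation: each English topic term is looked up in a bilingual word list and replaced by all of its listed Chinese translations. The sketch below is only an illustration of that idea, not the method of any participating system. The function name `translate_query` and the tiny `lexicon` are hypothetical; a real run would load a resource such as a bilingual word list.

```python
from typing import Dict, List, Tuple


def translate_query(
    terms: List[str], lexicon: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Replace each English term with every Chinese translation in the lexicon.

    Returns (translated_terms, untranslated_terms). Terms missing from the
    lexicon are reported separately so they can be handled some other way.
    """
    translated: List[str] = []
    untranslated: List[str] = []
    for term in terms:
        candidates = lexicon.get(term.lower())
        if candidates:
            translated.extend(candidates)
        else:
            untranslated.append(term)
    return translated, untranslated


# Hypothetical toy lexicon, for illustration only.
lexicon = {
    "economy": ["經濟"],
    "bank": ["銀行", "河岸"],  # ambiguous term: all senses are kept
}

chinese_terms, missing = translate_query(["Bank", "economy", "robot"], lexicon)
```

Keeping every translation of an ambiguous term is the simplest choice; it trades precision for recall, which is why systems often add disambiguation or term weighting on top.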

Relevance Judgments

• Judged the highest-priority mono- and cross-lingual run from each group

– 39 cross (75%) / 13 mono (25%)

– 51 auto / 1 manual (Thank you, Berkeley!)

• Added top 50 documents from each judged run to the pool

• Mean actual pool size = 598 (39% of max), within the expected range
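
The pooling step above can be sketched as follows: for one topic, the pool is the union of the top-ranked documents from every judged run, so overlap between runs makes the actual pool smaller than the theoretical maximum. This is a minimal sketch under stated assumptions. The names `build_pool` and `pool_fraction` are hypothetical, and the slide does not say how "max" was computed; here it is assumed to be runs × depth.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int = 50) -> Set[str]:
    """Union of the top-`depth` document IDs from each judged run (one topic)."""
    pool: Set[str] = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


def pool_fraction(pool: Set[str], n_runs: int, depth: int = 50) -> float:
    """Pool size as a fraction of the assumed maximum (n_runs * depth)."""
    return len(pool) / (n_runs * depth)


# Toy example with two runs and a pool depth of 2.
runs = {
    "runA": ["d1", "d2", "d3"],
    "runB": ["d2", "d4", "d5"],
}
pool = build_pool(runs, depth=2)               # {"d1", "d2", "d4"}
fraction = pool_fraction(pool, n_runs=2, depth=2)  # 3 / 4 = 0.75
```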

[Figure: % Contributions to Pool by Run Type (Relevant documents). Legend labels recovered: Monolingual, Crosslingual. Values shown: 13, 59, 28.]

Participants

BBN Technologies
Fudan University
IBM T.J. Watson Research Center
Johns Hopkins University
Korea Advanced Institute of Science and Technology
Microsoft Research, China
MNIS-TextWise Labs
National Taiwan University
Queens College, CUNY
RMIT University
Telcordia Technologies, Inc.
The Chinese University of Hong Kong
Trans-EZ Inc.
University of California at Berkeley
University of Maryland
University of Massachusetts

Resources: dictionaries/word lists

– LDC English-Mandarin word list (~120,000 pairs)

– Chinese-English Translation Assistance (CETA) dictionary

– KingSoft online bilingual dictionary

– WordNet

– other local (proprietary) dictionaries

Resources: software & services

MT

– HuaJian MT system

– IBM AlphaWorks translation server

– Alis Gist-in-Time MT system

English analysis

– InXight LinguistX (English linguistic analysis)

– Apple Pie parser

– Brill’s POS tagger

Chinese analysis/conversion

– Various Chinese segmenters (e.g., NMSU’s ch_seg)

– BIG5->GB converters (e.g., NJStar’s)

Miscellaneous

– CMU’s WEAVER translation-pair extraction

– Yahoo search

English to Chinese Results

[Figure: recall-precision curves (x-axis: Recall, y-axis: Precision, both 0 to 1) for runs BBN9XLA, msrcn1, fdut9xl2, CHUHK00XEC1, pir0XHxD, INQ7XL3, ibmcl9a, KAIST9xlqm.]

Cross-language vs. Monolingual

[Figure: recall-precision curves (x-axis: Recall, y-axis: Precision, both 0 to 1) for runs BBN9MONO, BBN9XLA, msrcn3, msrcn1, pir0Xori, pir0XHxD, ibmcl9m, ibmcl9a.]

Average Precision by Topic: Crosslingual

[Figure: average precision (0 to 1) for each topic, topics 55 to 79.]
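
For readers unfamiliar with the per-topic measure, the sketch below shows the usual (non-interpolated) definition of average precision for a single topic: the precision at the rank of each relevant document retrieved, summed and divided by the total number of relevant documents. The function name and the toy data are illustrative assumptions, not taken from the track's evaluation software.

```python
from typing import List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one topic.

    Relevant documents that are never retrieved contribute zero,
    because the sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


# Toy example: relevant documents appear at ranks 1 and 3; "d5" is never retrieved.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
# ap = (1/1 + 2/3) / 3
```

Averaging this value over all topics gives the mean average precision that is typically used to compare runs.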

What was learned from the Chinese CLIR track?

• Many approaches to English to Chinese topic translation, including use of various dictionaries, word lists, parallel text, and commercial MT systems

• Extensive set of Chinese retrieval experiments performed, ranging from various n-gram methods to word-based indexing to complete language modeling

• Because of the tight focus of this track, cross-system comparison is possible
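
As an illustration of the n-gram methods mentioned above, the sketch below indexes Chinese text by overlapping character bigrams, which avoids the need for a word segmenter. It is a generic example rather than the approach of any particular participant, and the function name `char_ngrams` is an assumption made for this sketch.

```python
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Split text into overlapping character n-grams, ignoring whitespace.

    Text shorter than n characters is returned as a single token.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    if len(chars) < n:
        return ["".join(chars)]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


tokens = char_ngrams("香港經濟")  # ["香港", "港經", "經濟"]
```

Note that the bigram "港經" spans a word boundary; such spurious tokens are the cost of skipping segmentation, and word-based indexing makes the opposite trade-off.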

TREC 2001

• Cross language

– Chinese → NTCIR workshop (NII, Japan)

– TREC task will be English, French → Arabic

• Filtering track using new Reuters corpus

• Interactive to investigate live web

• Expanded web and QA tracks

• New video track

trec.nist.gov
