
The University of Padova at CLEF 2003: Experiments to Evaluate Probabilistic Models for Automatic Stemmer Generation and Query Word Translation





!

!

"##$%##& #&' (

) #* "* ##

#+ **#,'

(##$*#

# % "% # $ #

" "* $**# $$

# * -#'

&*

.$# #' /#

*"###""#$# #

# *# "*$#' ( *

# *"##*&)" #

&'

(* "##& #

# % # ! 0 #

' 1# ## &# #$#*

# ## ' ($

##* #

*# '/+' '2

#***# $ # *

' ' * . $ #

.'

/ & **#

(####


' (Æ& *

# # % ' *#

%# ' # (5

% *# #

#&#'2'26

# **# $ # *

* '2'6

# #

" # **# $ "

*'2'7'

0 %## # *# 2*#

*8' #

*8 Æ ' 9

##*#%&Æ& #

' (**# $%&Æ&

)**#* **#

%&Æ&'

( # &#$# **#

*8' '' %2' * *8

Fig. 1. Diagram labels: Word, Stem + Derivation, Generic Prefix + Generic Suffix, ML.

# ## *#'

%& Æ& # * :'

;: : 5 :* ##*#%&Æ&

;::;5: ## %&Æ& # '

(

£

:* ##5

£

:&

¾

:&

¾

: &

¾

2

*#<=#*** :2

;# # ## >

&-'

9# #&*" #*


# #5 # %

6 *8

#

' 1 % %& Æ& 6

%& Æ&

' ?

# ! # ## *#

## #* !'

!$ ## # *#

* *##' ( ##

Fig. 2. Graph whose nodes are word fragments: read, accus, comput, compil, c, ation, er, ed, ability, omputation, omputer, ompilation, ompiled.
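The fragments shown in the figure above (comput, compil, ation, er, ed, and so on) suggest that each word is cut at every position into a prefix and a suffix, and that fragments sharing a word are linked. The following is a minimal Python sketch of that enumeration under this assumption. The function names `splits` and `build_split_graph` and the four-word vocabulary are illustrative and are not taken from the paper, and the scoring applied to the resulting graph is not shown.

```python
from collections import defaultdict


def splits(word, min_len=1):
    """Return every (prefix, suffix) pair whose parts have at least min_len characters."""
    return [(word[:i], word[i:]) for i in range(min_len, len(word) - min_len + 1)]


def build_split_graph(words):
    """Link each prefix to the suffixes it occurs with, and each suffix to its prefixes."""
    prefix_to_suffixes = defaultdict(set)
    suffix_to_prefixes = defaultdict(set)
    for word in words:
        for prefix, suffix in splits(word):
            prefix_to_suffixes[prefix].add(suffix)
            suffix_to_prefixes[suffix].add(prefix)
    return prefix_to_suffixes, suffix_to_prefixes


# Hypothetical vocabulary chosen to match the fragments visible in the figure.
words = ["computer", "computation", "compiled", "compilation"]
p2s, s2p = build_split_graph(words)

print(sorted(p2s["comput"]))  # suffixes observed after "comput"
print(sorted(s2p["ation"]))   # prefixes observed before "ation"
```

In this sketch a fragment that combines with many different partners, such as "comput" or "ation", ends up with many edges, which is the kind of evidence a stem/suffix estimator could exploit.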

**#* %&Æ&

**#* # **# $'

(# #*&#**#5

:

¾

:

¾

:

¾

:

¾

5

**# %& %& * '

## **# Æ& Æ& *

6

#**# %&Æ&' @"

6

##%& ##Æ&'

!7 (# * ##5

Æ #*# $# *

# * %& Æ& )'

#*# # #' (

## **#5

:

: 5 : :


Fig. 3. Block diagram labels: collection, Prefix/Suffix Estimator, (prefix, suffix), Stem/Derivation Estimator, (stem, derivation), SPLIT Stemmer, word.

:

: 5 : :

* *%&6

(##5

:

¾

 

:

¾

:2

**#'

## ##

# )2' ##

#' )2#5

&

¾

: &

¾

&

¾

2

&

¾

2 %&Æ&**#*)"

**#**$#"

*# #' ( *"

**#

#**#' # #"

# #* #

'

A## #$##

# %&*## Æ&*

'

/ ! ## #

# ## ' 9 ##

* #5

5 2:B :B6

!5 :2:76

05 2 :B:B6

#5 :2:76

5 2:7:7'

/ & # * . $


. ## * * # * * #

*#' ( ##% . 5

**# *#%#**#

**# *#'

! " #

.## ###$**

*) *#' ! ) #

# * ) *# *.'( .

# ) *

## %#' #)

#' *#**#

**#' (*###

Æ#&5 @*'

# . *# #$

( # ) # * #

*) #5 %&Æ&' 9#

. A5

% Æ *##

Æ&' ! "# *

#5

#*##" '' #6

Æ&" "###**#

'' *# Æ&6

%# *# *'' * + *

#Æ&'

9 #.# #%##!B'

Fig. 4. State diagram with a stem-set and a suffix-set separated by a split point; legend: initial state, final state.

1 #. #*# # * #

) *# * .' 9 %

**#*' (

# # "

Æ&" ## ' #"

Æ& ) # ** #" $

* $Æ&'

!

( # % * #*#


##*<"/#&"

&- #' 9 ( # # ##

)#$' (

#- ### ###*#'

/ .##

#' ( * $ *

#*# & ' *

#" **# #

Æ& ' / # A

$#*## >' ($

*# #* +Æ&#

* + ' Æ& # ) *# *

.'(##

Æ&"*** +#5

" ### % 6

&**#6%#' (&#

&# *#Æ&' 9###"#

*# ** + "-

# "**#' ( # # '

&# # Æ&"!D&#

Æ& ## #'

Fig. 5. Three state diagrams labelled a, b and c; legend: initial state, final state.

9 % Æ&" # . * #"

' 0 #$# ) #

Æ&##* )#' !

# ## Æ&" **# #

)Æ&' !&# Æ&" "

## **# # # "

## # )# # '

1#*

) ) # & **# * #

**# '


/ # # (1? # #

+ ' (1?+"# # "

* # ' 9 (1?

) # #"'

.(1? 5

!5

$ ##

#.%*

##*# (1?#5

£

:&

¾

(* # #*##

##'

5

##

(1?

**#

*@*"

5

£

:&

*# '

1 **#

£

#"

# * *#

Æ&"'

1*"### #

# +# $ & $ # *

*# #$ * * + ##

'2' (*- ##'

! *#*# /* * #' /

*#*# ' 1

**' *

#** ' ( ##*# --

'

' #

0 EFGGH 7'EH

! HG2H 'HF

2DEHF 7'D

# GHDG 7'H7

! "# ## % ) ## $

&8 )##$ )#

A * A $' ( *#

# )# ## # #'

## #*# * # #

## * ' * #

# * & # #


*#' 9&##

**# ' * *

* # * # ' % #

#* # %

::

::#* #

* # ' /%

& *

#

' /&

# &

' 1 * 2

# &

+' #

#

##

-

:

'

( **## # % * & ''

**# & # &' (*

#-*

#

&*#**#''

#

0

#

##

%

:

:

9 )##&

*## ' ( * # #&

# & # * ' (

* # & #

& * # #*$ &' (#

#*& * #

& # * * & ' ( ## #

%5

:

:

**#

# &

**#

# &'

(**#

*$# ) "

#

' (&##* ' (

, # #

# * # #&' !"

&+* &

#' 9 *2DII' C $

'

( & "##$ #+

# ## #"

* "# $# 8 ,*

# ' ( **#

+# , ' # #

=##


##&# #

## =1? =#1?* % !

* ## " ! 7' ( #

# # *#

+ = '' ## # #"

&' A #" = # *#

& ##' *#

2'7

=2#*" &#*# J'

=1?## 8*

* #*+ ## ##

#" = # #&# #-#& ' ! &# *#

##+&6

* +##

* ' *##

/* ' # Æ #& J!#&

2'7'D

#& J J' ( #& *# ##

' # ### *

B'2'2'

Fig. 6. Architecture diagram labels: LAN, Tomcat 4.1, WebIRON, Lucene 1.3 RC1, IRON, GUI.

9 % E =1? # * + 5 #

0 0# # =1? # * "#

#' ( ## J#*/* #

## /*=1?'/*=1?*(

B'2/*$ =1?/*

# - # ##

= ## =

&'

=1?/*=1?J*#"# 6

#)=. &

#)/K*

$# ##'


*> !7 & #'

J *#&##-' 9#&#-

"#$####$#&' ($

%*#&*#' ##

L#&,*'

# * '

(#" #&*# ##5

# ## +6

#+#*#"

##&*+# #6

# ## )Æ#&

##$ !7#& * ##

A#& ##*#

'

1%#5

2' / #&# #- *# ! ##

+#6

' $#&# **#-*#

####-6

7' $#& Æ# !'

9*## -## %# ##'

. ##&# %#5

! "#

$ ! "# $ %& ! "%&

%'$ ( %'$

$&$"$&$

)%")%

**+ ,-""**

&.& ))' / 0 1 (

2 0 / /

&.&

! "# !

$ ! "# !$

.#$%# 1

! "#

) %1?1 %& % #

*&#*L0.,) ##' #

) ##$ !75


!5 9( 1?

05 !=0 9

#5 90M 9(99

5 !

9*#%5 #&*#

- + ## ###' / ) #

& L, #& % *" *# ###

!F'

## # * *# # #&

# ##" * * #&

& &' (0# ! % ##

## # * #$

(K(N(K(## * & #'

Fig. 7. State diagram labels: YYINITIAL; Dutch: Algemeen Dagblad (AD), nrc handelsblad (NH); English: Glasgow Herald (GH), Los Angeles Times (LA); Glasgow Herald DocNo, Glasgow Herald Title, Glasgow Herald Text.

1 # $ * Æ #&' # J!#& #&

## &' / Æ

J#' 9*##

<+#' 9<+#*#) #$

**%' 9#) *

# )*##<+

*## 9 '

9 # #* 3B4'

"#

/ ####$5 !0 #'

!$ +#5

% 5 ##6

5 # +# ##*#

*##/**'6

5 ##* # "


9" & '' ##

"# #*# ' (

"#* ! !'

$ #

!##*-###)

##'

(*#2* ### +

#&'

#'

)) * +,

- # .

% 2B2HGH'HG GEHH2'GE 277F'GG BGGE'7 GBGG'2

2BH'B GGEH7'EE 27FEFD'B BHFE2'B7 2GH'E2

$% 27GEGF'77 GH2HB'2H 27GBFD'GB D7E'2G 2BGH'F2

2B2EGH'FH H22HE'7 2B7BFG'DG BHE'G H'HH

)- 2DFF HBE 2GD GH 7EG

(*#25 =#*## 7('

? # ## * #

# * # *

#' &*(1?

##'

(*# # *

#'

) *+,

- # .

% B'22 B'GE 7B'H 7B'FE 7H'F

B'GB BD'E 7F'22 7G'2F 7G'D

$% B'DF BD'EF 7E'EG 7B'EE B'DE

B7'BH BD'GF 7F'GG 7D'D7 B7'B

(*#5 9 7('

(*#7 #&="*

= * = * #

'

/( '*+,

- # .

% B'D2 7H'BD 7E'DH 7E'7 B'E

B2'DB B7' 7F'G 7G'7H 7H'GD

$% 7H'EE B' 7F'D7 77'E 7H'H

B'DD B2'EG 7G'F7 7B'FH B'F

(*#75 &=" 7(

? ! 0 # #+


((1? +#,' (&#

%*#**##

##$#*#' / # #

% +*## '

( &###

#)#' ( *

"##$ #* *"##$' (

#* )#*

**##' /**## %

#*## )# &

%'

324 ' 9 ' < ?' ! ' #' 9 =#

(& ' ' '<# J'0-# ' O#$

" # $ %& ' " #

( &"()**)&$'<#N.#*7'

34 ' < ?' ! ' #' ( + 0"* 9#

' '' '!''0'O.''9'!&'=''(

+ & , & %& - . "

/ & "/ )**) 22F82G' ?

?DDD<#N.#*'

374 ' =* <'.' J' ( .## #

#+?J7287GH2HH7'

3B4 0''?-' "()**0 - 9#*# 30'
