• Aucun résultat trouvé

ALEX: Auto-completion Learning for XML

N/A
N/A
Protected

Academic year: 2022

Partager "ALEX: Auto-completion Learning for XML"

Copied!
1
0
0

Texte intégral

(1)

Model

 A generative model based on Tree Automata

 DFAs describe possible children of a node

 Probabilities annotate transitions

System Architecture System Architecture

ALEX: Auto-completion Learning for XML

Serge

Serge Abiteboul Abiteboul Yael Amsterdamer Pierre Yael Amsterdamer Pierre Senellart Senellart Tova Tova Milo Milo

Challenges

 Intricate structure

 Integrity constraints

 Online analysis is required

Problem

XML documents are often edited manually

 Examples: XHTML, Docbook,…

 Many editors are available

Still, manually editing XML docs can be hard Auto-completion can help!

How to choose completion suggestions?

Main idea: learn probabilistically the completions from an XML corpus!

 Of the same user, or “similar” ones

 Example: XML homepage

Find the most likely XML document part given i. A model

ii. Integrity (key, inclusion, domain) constraints iii. A partial document

Do it repeatedly, and fast!

Problem statement

Complexity Results

The Editor is a plugin to Eclipse IDE based on Rinzo

 Demonstrated with real data about researchers Dedicated optimizations to allow interactivity

 Pruning options that lead to a "dead end"

 Separate structure and value generation

Configurable bound of continuation search depth

Implementation and Optimizations Techniques

Checking for the existence of a valid continuation is already NP-complete in the schema size

We can find most likely continuation in time exponential in the schema size,

polynomial in partial document and output size

Schema size is typically small!

 Algorithm to learn a model instance that is provably

the most likely estimator for a given corpus NP-complete in the schema size,

polynomial in the corpus size

A*-like algorithm to find most likely continuations

 Reducing the number of examined continuations

File system

Example documents

Load schema

Corpus Loader Corpus Loader

New document Get suggestion

Auto-completion suggestion

hasContinuation?(prefix, suffix, choice)

hasContinuation?(prefix, suffix, choice) Load example corpus

Probabilistic Generator

schema

Probabilistic Generator

schema

UI UI

Completion Finder

The Web

The user

Continuation-Test

q q 2 q

q 0 q q 1

publish ppublish

next 1- ppublish

present ppresent

$

1- ppresent

Références

Documents relatifs

Table 1: Illustrative example of the overall design of the augmented transaction table with amended learning subsystem performance scores, statistics and meta-learner information..

INALCO Institut National des Langues et Civilisations Orientales ISC Inscriptions sanscrites du Cambodge (Barth 1885) ISEAS The Institute of Southeast Asian Studies.. IT

We therefore took only the number of phonemes of the syllable into account, thus avoiding the bias of syllable weight: for example a syllable composed of five phones

How many annotations are needed? What type of corpus and which corpora have to be annotated? What quality of annotation is expected? Those are the typical questions that should

Concerning the compatibility of the video annotation tool with the other tools used for the annotation of the corpus, we had two initial requirements: the tool used would have to

Because some donor words are not adapted to conform to Japanese phonotactics, but rather include sounds and sound sequences which are absent from native vocabulary, innovative

Unit´e de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex Unit´e de recherche INRIA Rh ˆone-Alpes, 46 avenue F´elix Viallet, 38031 GRENOBLE Cedex

It is hoped that the constitution of the corpus #intermittent will enable researchers to work on this kind of discourse (tweets related to a controversial topic), to characterize it