• Aucun résultat trouvé

Form Understanding Program-analysis-based ProFoUnd:

N/A
N/A
Protected

Academic year: 2022

Partager "Form Understanding Program-analysis-based ProFoUnd:"

Copied!
7
0
0

Texte intégral

(1)

ProFoUnd:

Program-analysis-based Form Understanding

Andreas Savvides

(work done with M. Benedikt, T.Furche and P. Senellart)

(2)

Looking deep under the surface

• Surface Web:

– Bread & butter of traditional search engines – All about hyperlinks

• Deep Web:

– Search engines struggle at surfacing hidden content

– Plethora of valuable information behind Web services and HTML forms

– Valuable data for information extraction systems, e.g. DIADEM

2

(3)

Dealing with a deep Web search interface

1. Find relevant websites

2. Identify a search interface

3. Deduce input fields of the search interface and

corresponding meta-data

4. Query a hidden database 3

(4)

JavaScript and the Deep Web

• Client-side integrity constraints enforced using JavaScript

4

(5)

ProFoUnd

The first system to provide integrity

constraint identification for deep Web search interfaces based on JavaScript analysis.

ProFoUnd’s Interface

ProFoUnd’s Architecture 5

(6)

Evaluation & Moving Forward

• 70 randomly selected real-estate websites

– 100% precision and 63% recall

• Where did we fall short?

– eval, obfuscation, system limitations

• What about…

– Runtime execution?

– Ajax?

– Server-side constraints?

6

(7)

Demonstration?

Come find us at our demo booth all day long on Friday for a showcase of the

system or simply to learn more!

7

Références

Documents relatifs

In our work, we proposed to assess data completeness, which in UI labeling equates the number of identified UI elements, via predicting this expected number with metrics

A common strategy in deep convolutional neural network for semantic segmentation tasks requires the down-sampling of an image between convolutional and ReLU layers, and then

At the crossroad of CP and MCTS, this paper presents the Bandit Search for Constraint Program- ming ( BaSCoP ) algorithm, adapting MCTS to the specifics of the CP search.

Sentilo 2 is a semantic SA system introduced in [10] that analyses the senti- ment of a sentence: it identifies the holder of an opinion, the topics and sub-topics of that opinion,

To evaluate our method on the dataset, we used the network shown in Fig.1 and described in section 2, which consists of a band-pass filter, CSP spatial projection,

The current highlights comprise modules for RDF, SPARQL, data ac- cess, conversion of result sets to objects and faceted browsing, together with corresponding user interface

For this case, we give an efficient algorithm to compute all the values attained by a given quantity (e.g. execution time) over all possible paths — i.e., the support of

For this purpose, given an environment for which electronic forms are a privileged way to exchange information and stakeholders familiar with form-based (com- puter) interaction,