Understanding the Hidden Web

Pierre Senellart

17 June 2005

Outline: Introduction; Process description (Discovery, Wrappers, Semantic Analysis, Indexing and Querying); Summary

The Hidden Web

Definition (Hidden Web)

The set of webpages (which may or may not be dynamically generated) not accessible from the hyperlinked structure of the World Wide Web.

Size estimate (2001): 500 times larger than the surface Web.

How to understand it and benefit from its content?

Understanding the Hidden Web

Purpose

Intensional indexing of the Hidden Web
High-level queries

⇒ a semantic search engine over the Hidden Web

In a fully unsupervised way!

A difficult and broad problem. Possible restriction to some domain (e.g. publications).

Web Service Semantic Interpretation Process

[Figure: processing pipeline built up stage by stage from the World Wide Web. Stages and their labels: discovery (HTML form, WSDL, UDDI); wrappers (HTML form, WSDL); analysis (Analyzed Web Services); indexing (Web Services Index); querying (Queries, Results)]

1 Introduction

2 Process description
  Web Service Discovery
  Wrapping Web Service Descriptions
  Web Service Semantic Analysis
  Web Service Indexing and Querying

3 Summary

Web Service Discovery

Crawling the World Wide Web for:

HTML forms implementing a Web Service
UDDI registries
WSDL descriptions
Other resources (XML, HTML, Web as a full-text index...)

Only interested in Web Services with no side effects:

OK: Yellow Pages, publication databases, ...
Not OK: booking services, mailing list management, ...
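
A minimal sketch of the form-discovery step. It assumes that a form submitted with GET is a reasonable first guess for a service with no side effects, since HTTP defines GET as a safe method. That heuristic, the names `FormFinder` and `candidate_services`, and the sample page are illustrative assumptions, not something stated in the talk.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the action and method of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            a = dict(attrs)
            # HTML defaults to GET when no method attribute is given.
            method = (a.get("method") or "get").lower()
            self.forms.append({"action": a.get("action") or "", "method": method})


def candidate_services(html):
    """Return actions of forms that look side-effect free (assumed: GET only)."""
    finder = FormFinder()
    finder.feed(html)
    return [f["action"] for f in finder.forms if f["method"] == "get"]


# Hypothetical page with one search form and one booking form.
PAGE = """
<form action="/search"><input name="q"></form>
<form action="/book" method="POST"><input name="seat"></form>
"""
print(candidate_services(PAGE))
```

A real crawler would need more evidence than the HTTP method (a GET form can still trigger an action on the server), so this filter is best read as a cheap first pass.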

Analyzing HTML forms

Analyzing the structure of HTML forms.

Issues

What are the relevant form fields?
What is the concrete type of each field?
What is the label of each field?
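
The three issues above can be sketched with the standard-library HTML parser. This is a rough illustration under stated assumptions: hidden and submit-like controls are treated as irrelevant, the HTML control type stands in for the concrete type, and labels are found only through `<label for=...>`. The class name `FormFieldParser` and the sample form are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect named form fields, their HTML type, and <label for=...> text."""

    # Assumption: these control types carry no query semantics.
    IGNORED = {"hidden", "submit", "reset", "button", "image"}

    def __init__(self):
        super().__init__()
        self.fields = []        # dicts with keys: name, type, id
        self.labels = {}        # field id -> label text
        self._label_for = None  # id targeted by the <label> being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label_for = a.get("for")
            if self._label_for:
                self.labels[self._label_for] = ""
        elif tag in ("input", "select", "textarea") and a.get("name"):
            ftype = (a.get("type") or "text").lower() if tag == "input" else tag
            if ftype not in self.IGNORED:
                self.fields.append({"name": a["name"], "type": ftype, "id": a.get("id")})

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None

    def handle_data(self, data):
        if self._label_for:
            self.labels[self._label_for] += data.strip()


def analyze_form(html):
    """Return (name, type, label) for each relevant field; label may be None."""
    parser = FormFieldParser()
    parser.feed(html)
    return [(f["name"], f["type"], parser.labels.get(f["id"])) for f in parser.fields]


FORM = """
<form action="/search">
  <label for="t">Title</label> <input id="t" name="title" type="text">
  <label for="y">Year</label> <select id="y" name="year"></select>
  <input type="hidden" name="session" value="42">
  <input type="submit" value="Go">
</form>
"""
print(analyze_form(FORM))
```

Many real forms have no `<label>` elements at all, in which case the label has to be guessed from nearby text or from the field name.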

Probing HTML forms

Probing HTML forms to retrieve sample HTML answer pages:

With dictionary words
With nonsense words
With domain words
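
A small sketch of how the three kinds of probes could be generated for one text field of a GET form. The function names, the example URL, and the word lists are hypothetical. One plausible use of nonsense words is to obtain a "no result" page for comparison, but that reading is an assumption rather than something the slide states.

```python
import random
import string
from urllib.parse import urlencode


def nonsense_word(rng, length=10):
    """Random lowercase letters, very unlikely to match any record."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def probe_urls(action, field, dictionary, domain, n_nonsense=2, seed=0):
    """Yield (kind, url) pairs for probing one text field of a GET form."""
    rng = random.Random(seed)
    probes = [("dictionary", w) for w in dictionary]
    probes += [("nonsense", nonsense_word(rng)) for _ in range(n_nonsense)]
    probes += [("domain", w) for w in domain]
    for kind, word in probes:
        yield kind, f"{action}?{urlencode({field: word})}"


urls = list(probe_urls("http://example.org/search", "title",
                       dictionary=["house"], domain=["xml database"]))
for kind, url in urls:
    print(kind, url)
```

Each URL would then be fetched and the resulting answer page stored together with the kind of probe that produced it.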

Query-answer webpages

Extract data from query-answer webpages.

Issues

What part of the webpage contains the answer?
How to extract structured content?
How to label this structured content?
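
One crude way to approach the first issue, assuming two answer pages from the same form share their template: diff the pages line by line and keep what differs. This is a stand-in chosen for brevity, not the method used in the talk or by dedicated wrapper-induction systems. The function name and the sample pages are hypothetical.

```python
import difflib


def varying_lines(page_a, page_b):
    """Lines of page_a not shared with page_b (candidate answer region)."""
    a, b = page_a.splitlines(), page_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark lines of page_a absent from page_b.
        if op in ("replace", "delete"):
            out.extend(a[i1:i2])
    return out


PAGE_A = "<h1>Results</h1>\n<li>Alice: XML Basics</li>\n<p>Footer</p>"
PAGE_B = ("<h1>Results</h1>\n<li>Bob: Graph Theory</li>\n"
          "<li>Carol: Logic</li>\n<p>Footer</p>")
print(varying_lines(PAGE_A, PAGE_B))
```

The shared lines (heading and footer) are treated as template, and the remaining lines as the answer. Extracting and labeling structured content from that region is a separate problem that this sketch does not address.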

RoadRunner

Example: the RoadRunner information extraction engine

Conceptual Model

IsA ontology of concepts (simple DAG)

[Figure: IsA hierarchy over the concepts Thing, Person, Man, Woman, Publication, Proceedings, Article, Book]

n-ary typed roles

AuthorOf(Person, Publication)
HasName(Person, Name)
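
A minimal sketch of this conceptual model as plain Python data. The concept and role names come from the slide, but the IsA edges are an assumption read off those names (the figure itself did not survive extraction), and the helper names `is_a` and `well_typed` are hypothetical.

```python
# Assumed IsA edges (child -> parents). A DAG allows several parents per concept.
IS_A = {
    "Person": ["Thing"],
    "Publication": ["Thing"],
    "Man": ["Person"],
    "Woman": ["Person"],
    "Proceedings": ["Publication"],
    "Article": ["Publication"],
    "Book": ["Publication"],
}

# Typed roles from the slide: role name -> tuple of argument concepts.
ROLES = {
    "AuthorOf": ("Person", "Publication"),
    "HasName": ("Person", "Name"),
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it through IsA edges."""
    if concept == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in IS_A.get(concept, []))


def well_typed(role, *args):
    """Check that each argument concept is subsumed by the role's declared type."""
    signature = ROLES[role]
    return len(args) == len(signature) and all(
        is_a(arg, expected) for arg, expected in zip(args, signature)
    )


print(is_a("Woman", "Thing"))                       # Woman -> Person -> Thing
print(well_typed("AuthorOf", "Woman", "Article"))   # arguments in declared order
print(well_typed("AuthorOf", "Article", "Woman"))   # arguments swapped
```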

Semantic representation of a service

What is a service described by?

An n-tuple of typed input parameters
A complex (= nested) type of its output
Semantic relations between inputs and outputs

Definition (Complex types)

S: set of concepts

T ← S | <T, ..., T> | T*
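
The grammar of complex types maps directly onto a small algebraic data type. The sketch below is one possible encoding: the class names `Concept`, `Record`, and `Star` and the `show` helper are hypothetical, chosen only to mirror the three alternatives of the definition.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Concept:
    """Base case: an element of S."""
    name: str


@dataclass(frozen=True)
class Record:
    """Tuple type <T, ..., T>."""
    fields: Tuple["ComplexType", ...]


@dataclass(frozen=True)
class Star:
    """Repetition type T*."""
    item: "ComplexType"


ComplexType = Union[Concept, Record, Star]


def show(t):
    """Render a complex type in the slide's notation."""
    if isinstance(t, Concept):
        return t.name
    if isinstance(t, Record):
        return "<" + ", ".join(show(f) for f in t.fields) + ">"
    return show(t.item) + "*"


# A list of pairs, each made of an A and a list of T.
answer = Star(Record((Concept("A"), Star(Concept("T")))))
print(show(answer))
```

Because the types are immutable values, two structurally identical complex types compare equal, which is convenient when matching services against queries.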

Services and queries

Example

Service giving authors from publication titles:

A* ← WrittenBy(P, A), HasTitle(P, T), Input(T)

Example query:

<A, T*>* ← WrittenBy(P, A), Article(P), HasTitle(P, T), KeywordOf("xml", P)
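
To make the notation concrete, the service definition can be held as plain data: a head (the output type) and a body of atoms. This is only an illustrative encoding; the names `Atom`, `Rule`, `inputs`, and `concepts_and_roles` are hypothetical, and the head is kept as a string rather than parsed.

```python
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    predicate: str
    args: Tuple[str, ...]


class Rule(NamedTuple):
    head: str                 # output type in the slide's notation, e.g. "A*"
    body: Tuple[Atom, ...]


def inputs(rule):
    """Variables bound by Input(...) atoms, i.e. what the caller must supply."""
    return [a.args[0] for a in rule.body if a.predicate == "Input"]


def concepts_and_roles(rule):
    """Predicate names used in the body, excluding the Input marker."""
    return sorted({a.predicate for a in rule.body if a.predicate != "Input"})


# The service giving authors from publication titles.
authors_by_title = Rule(
    head="A*",
    body=(
        Atom("WrittenBy", ("P", "A")),
        Atom("HasTitle", ("P", "T")),
        Atom("Input", ("T",)),
    ),
)
print(inputs(authors_by_title), concepts_and_roles(authors_by_title))
```

A query has the same shape without `Input` atoms, which is what lets queries and services be compared in a single formalism.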

Managing extensional information

How to represent extensional information (i.e. documents) in this formalism?

Definition

A document is a service with no input.

Complex types: natural representation of a DTD.

(Disjunctions a|b simulated by (a?, b?).)

Semantic Interpretation of a Service

How to analyze a Web Service?

Field labels, variable names, tag names
Concrete type descriptions (e.g. \d{4}-\d{2}-\d{2} is a date)
Linguistic analysis of plain text descriptions and pages linking to the service

Note: Extracting concepts is easier than extracting relations between concepts.
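
The concrete-type idea can be sketched as a small table of regular expressions tried against sample values. Only the date pattern comes from the slide; the year and email patterns, the table name `CONCRETE_TYPES`, and the function `concrete_type` are assumed examples.

```python
import re

# Pattern -> concrete type. Only the first entry is taken from the slide.
CONCRETE_TYPES = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "date"),
    (re.compile(r"\d{4}"), "year"),                        # assumed example
    (re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"), "email"),    # assumed example
]


def concrete_type(samples):
    """Return the first type whose pattern fully matches every sample value."""
    for pattern, name in CONCRETE_TYPES:
        if samples and all(pattern.fullmatch(s) for s in samples):
            return name
    return None


print(concrete_type(["2005-06-17", "1999-12-31"]))
print(concrete_type(["2005-06-17", "hello"]))
```

Requiring every sample to match keeps the guess conservative: a single stray value leaves the field untyped rather than mislabeled.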

Web Service Indexing and Querying

Given a query, represented as an Analyzed Web Service, how do we know which known Web Services to query?

Issues

Subsumption of input/output parameters
Missing input parameters
Composition of Web Services
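
A rough sketch of the first two issues, subsumption and missing inputs, for services described only by flat lists of input and output concepts. The hierarchy, the dictionary layout of a service, and the function names are all assumptions made for illustration. Nested types and composition of services are deliberately left out.

```python
# Assumed single-parent IsA hierarchy (child -> parent).
IS_A = {"Article": "Publication", "Book": "Publication",
        "Woman": "Person", "Man": "Person"}


def subsumed(concept, ancestor):
    """True if `concept` is `ancestor` or one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def match(service, provided, wanted):
    """Return (usable, missing_inputs) for one service against a query.

    service:  {"inputs": [...], "outputs": [...]} lists of concept names.
    provided: concepts the query can supply as input values.
    wanted:   the concept the query asks for.
    """
    # A service input is covered if some provided value is at least as specific.
    missing = [i for i in service["inputs"]
               if not any(subsumed(p, i) for p in provided)]
    # An output is relevant if it is at least as specific as what is wanted.
    produces = any(subsumed(o, wanted) for o in service["outputs"])
    return produces and not missing, missing


svc = {"inputs": ["Person"], "outputs": ["Article"]}
print(match(svc, ["Woman"], "Publication"))
print(match(svc, [], "Publication"))
```

In the second call the service is relevant but cannot be invoked directly. The returned list of missing inputs is the natural starting point for composition, where another service would be asked to supply them.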

Differences with classical database querying

Three main differences:

Information can be queried only through views (Local As View)
Nested types
Incomplete information

Three sources of complexity!

Web Service Semantic Interpretation Process

[Figure: complete processing pipeline starting from the World Wide Web. Stages and their labels: discovery (HTML form, WSDL, UDDI); wrappers (HTML form, WSDL); analysis (Analyzed Web Services); indexing (Web Services Index); querying (Queries, Results)]
