Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Understanding the Hidden Web
Pierre Senellart
17 June 2005
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
The Hidden Web
Definition (Hidden Web)
The set of webpages (which may or may not be dynamically generated) not accessible from thehyperlinked structureof the World Wide Web.
Size estimate (2001) :500 times larger than thesurface Web.
How to understand it and benefit from its content?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
The Hidden Web
Definition (Hidden Web)
The set of webpages (which may or may not be dynamically generated) not accessible from thehyperlinked structureof the World Wide Web.
Size estimate (2001) :500 times larger than thesurface Web.
How to understand it and benefit from its content?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
The Hidden Web
Definition (Hidden Web)
The set of webpages (which may or may not be dynamically generated) not accessible from thehyperlinked structureof the World Wide Web.
Size estimate (2001) :500 times larger than thesurface Web.
How to understand it and benefit from its content?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Understanding the Hidden Web
Purpose
Intensionalindexing of the Hidden Web High-levelqueries
⇒asemanticsearch engine over the Hidden Web
In a fully, unsupervized, way!
Difficultandbroadproblem. Possible restriction to some domain (e.g. publications).
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Understanding the Hidden Web
Purpose
Intensionalindexing of the Hidden Web High-levelqueries
⇒asemanticsearch engine over the Hidden Web
In a fully, unsupervized, way!
Difficultandbroadproblem. Possible restriction to some domain (e.g. publications).
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Understanding the Hidden Web
Purpose
Intensionalindexing of the Hidden Web High-levelqueries
⇒asemanticsearch engine over the Hidden Web
In a fully, unsupervized, way!
Difficultandbroadproblem. Possible restriction to some domain (e.g. publications).
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
HTML form WSDL discovery UDDI
World Wide Web
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
HTML form WSDL discovery UDDI
World Wide Web
WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
HTML form WSDL discovery UDDI
World Wide Web
WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
HTML form WSDL discovery UDDI
World Wide Web
WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Web Services Index indexing
Web Services Index indexing
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
HTML form WSDL discovery UDDI
World Wide Web
WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Web Services Index indexing
Web Services Index indexing
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Results Queries querying
Web Services Index indexing
Web Services Index indexing
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
1 Introduction
2 Process description Web Service Discovery
Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3 Summary
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Discovery
Crawling the World Wide Web for:
HTML forms implementing a Web Service UDDI registries
WSDL descriptions
Other resources (XML, HTML, Web as a full-text index. . . )
Only interested in Web Services withno side effects:
Ok Yellow Pages
Publication databases . . .
Not Ok Booking services
Mailing list management . . .
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Discovery
Crawling the World Wide Web for:
HTML forms implementing a Web Service UDDI registries
WSDL descriptions
Other resources (XML, HTML, Web as a full-text index. . . )
Only interested in Web Services withno side effects:
Ok Yellow Pages
Publication databases . . .
Not Ok Booking services
Mailing list management . . .
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Discovery
Crawling the World Wide Web for:
HTML forms implementing a Web Service UDDI registries
WSDL descriptions
Other resources (XML, HTML, Web as a full-text index. . . )
Only interested in Web Services withno side effects:
Ok Yellow Pages
Publication databases . . .
Not Ok Booking services
Mailing list management . . .
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Discovery
Crawling the World Wide Web for:
HTML forms implementing a Web Service UDDI registries
WSDL descriptions
Other resources (XML, HTML, Web as a full-text index. . . )
Only interested in Web Services withno side effects:
Ok Yellow Pages
Publication databases . . .
Not Ok Booking services
Mailing list management . . .
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
1 Introduction
2 Process description Web Service Discovery
Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3 Summary
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Analyzing HTML forms
Analyzing thestructureof HTML forms.
Issues
What are therelevantform fields?
What is theconcretetype of each field?
What is thelabelof each field?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Analyzing HTML forms
Analyzing thestructureof HTML forms.
Issues
What are therelevantform fields?
What is theconcretetype of each field?
What is thelabelof each field?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Analyzing HTML forms
Analyzing thestructureof HTML forms.
Issues
What are therelevantform fields?
What is theconcretetype of each field?
What is thelabelof each field?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Probing HTML forms
ProbingHTML forms to retrieve sample HTML answer pages:
Withdictionarywords
Withnonsensewords
Withdomainwords
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Probing HTML forms
ProbingHTML forms to retrieve sample HTML answer pages:
Withdictionarywords
Withnonsensewords
Withdomainwords
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Probing HTML forms
ProbingHTML forms to retrieve sample HTML answer pages:
Withdictionarywords
Withnonsensewords
Withdomainwords
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Query-answer webpages
Extractdata from query-answer webpages.
Issues
What partof the webpage contains the answer?
How to extractstructured content?
How tolabelthis structured content?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Query-answer webpages
Extractdata from query-answer webpages.
Issues
What partof the webpage contains the answer?
How to extractstructured content?
How tolabelthis structured content?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Query-answer webpages
Extractdata from query-answer webpages.
Issues
What partof the webpage contains the answer?
How to extractstructured content?
How tolabelthis structured content?
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
R
OADR
UNNERExample: ROADRUNNERinformation extraction engine
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
1 Introduction
2 Process description Web Service Discovery
Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3 Summary
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Conceptual Model
IsAontologyofconcepts(simple DAG)
Person
Man Woman
Thing
Proceedings Article Book Publication
n-ary typedroles
AuthorOf(Person,Publication) HasName(Person,Name)
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Conceptual Model
IsAontologyofconcepts(simple DAG)
Person
Man Woman
Thing
Proceedings Article Book Publication
n-ary typedroles
AuthorOf(Person,Publication) HasName(Person,Name)
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic representation of a service
What is a service described by?
An-uple oftypedinput parameters Acomplex(= nested) type of its output
Semanticrelationsbetween inputs and outputs Definition (Complex types)
S: set of concepts
T ←−S|<T, . . . ,T>|T∗
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic representation of a service
What is a service described by?
An-uple oftypedinput parameters Acomplex(= nested) type of its output
Semanticrelationsbetween inputs and outputs Definition (Complex types)
S: set of concepts
T ←−S|<T, . . . ,T>|T∗
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic representation of a service
What is a service described by?
An-uple oftypedinput parameters Acomplex(= nested) type of its output
Semanticrelationsbetween inputs and outputs Definition (Complex types)
S: set of concepts
T ←−S|<T, . . . ,T>|T∗
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic representation of a service
What is a service described by?
An-uple oftypedinput parameters Acomplex(= nested) type of its output
Semanticrelationsbetween inputs and outputs Definition (Complex types)
S: set of concepts
T ←−S|<T, . . . ,T>|T∗
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Services and queries
Example
Servicegiving authors from publication titles
A*←WrittenBy(P,A),HasTitle(P,T),Input(T)
Example Query:
<A,T*>*←WrittenBy(P,A),Article(P),HasTitle(P,T), KeywordOf(“xml”,P)
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Services and queries
Example
Servicegiving authors from publication titles
A*←WrittenBy(P,A),HasTitle(P,T),Input(T)
Example Query:
<A,T*>*←WrittenBy(P,A),Article(P),HasTitle(P,T), KeywordOf(“xml”,P)
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Managing extensional information
How to representextensionalinformation (i.e. documents) in this formalism?
Definition
A document is a service with no input.
Complex types:naturalrepresentation of a DTD.
(Disjunctionsa|bsimulated by(a?,b?)).
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic Interpretation of a Service
How to analyze a Web Service?
Field labels, variable names, tag names Concrete type descriptions
(e.g. \d{4}-\d{2}-\d{2}is a date)
Linguistic analysis of plain text descriptions and pages linking to the service
Note:Extracting concepts is easier than extracting relations between concepts.
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic Interpretation of a Service
How to analyze a Web Service?
Field labels, variable names, tag names Concrete type descriptions
(e.g. \d{4}-\d{2}-\d{2}is a date)
Linguistic analysis of plain text descriptions and pages linking to the service
Note:Extracting concepts is easier than extracting relations between concepts.
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic Interpretation of a Service
How to analyze a Web Service?
Field labels, variable names, tag names Concrete type descriptions
(e.g. \d{4}-\d{2}-\d{2}is a date)
Linguistic analysis of plain text descriptions and pages linking to the service
Note:Extracting concepts is easier than extracting relations between concepts.
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic Interpretation of a Service
How to analyze a Web Service?
Field labels, variable names, tag names Concrete type descriptions
(e.g. \d{4}-\d{2}-\d{2}is a date)
Linguistic analysis of plain text descriptions and pages linking to the service
Note:Extracting concepts is easier than extracting relations between concepts.
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Semantic Interpretation of a Service
How to analyze a Web Service?
Field labels, variable names, tag names Concrete type descriptions
(e.g. \d{4}-\d{2}-\d{2}is a date)
Linguistic analysis of plain text descriptions and pages linking to the service
Note:Extracting concepts is easier than extracting relations between concepts.
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
1 Introduction
2 Process description Web Service Discovery
Wrapping Web Service Descriptions Web Service Semantic Analysis Web Service Indexing and Querying
3 Summary
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Indexing and Querying
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Issues
Subsumptionof input/output parameters Missinginput parameters
Compositionof webservices
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Indexing and Querying
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Issues
Subsumptionof input/output parameters Missinginput parameters
Compositionof webservices
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Indexing and Querying
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Issues
Subsumptionof input/output parameters Missinginput parameters
Compositionof webservices
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Indexing and Querying
Given a query, represented as an Analyzed Web Service, how to know which known web services to query?
Issues
Subsumptionof input/output parameters Missinginput parameters
Compositionof webservices
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Differences with classical database querying
Three main differences:
Information can be queried only throughviews (Local As View)
Nestedtypes
Incompleteinformation Three sources of complexity!
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Differences with classical database querying
Three main differences:
Information can be queried only throughviews (Local As View)
Nestedtypes
Incompleteinformation Three sources of complexity!
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Differences with classical database querying
Three main differences:
Information can be queried only throughviews (Local As View)
Nestedtypes
Incompleteinformation Three sources of complexity!
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Differences with classical database querying
Three main differences:
Information can be queried only throughviews (Local As View)
Nestedtypes
Incompleteinformation Three sources of complexity!
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Differences with classical database querying
Three main differences:
Information can be queried only throughviews (Local As View)
Nestedtypes
Incompleteinformation Three sources of complexity!
Understanding the Hidden
Web Pierre Senellart
Introduction Process description
Discovery Wrappers Semantic Analysis Indexing and Querying
Summary
Web Service Semantic Interpretation Process
World Wide Web
HTML form WSDL discovery UDDI
World Wide Web
WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Web Services Index indexing
Web Services Index indexing
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web
Results Queries querying
Web Services Index indexing
Web Services Index indexing
analysis
Analyzed Web Services WSDL wrappers HTML form
WSDL discovery UDDI
World Wide Web