• Aucun résultat trouvé

A Methodological Approach for Web Sites Reengineering

N/A
N/A
Protected

Academic year: 2021

Partager "A Methodological Approach for Web Sites Reengineering"

Copied!
15
0
0

Texte intégral

(1)

RESEARCH OUTPUTS / RÉSULTATS DE RECHERCHE

Author(s) - Auteur(s) :

Publication date - Date de publication :

Permanent link - Permalien :

Rights / License - Licence de droit d’auteur :

Institutional Repository - Research Portal

Dépôt Institutionnel - Portail de la Recherche

researchportal.unamur.be

University of Namur

A Methodological Approach for Web Sites Reengineering

Estiévenart, Fabrice; Francois, Aurore; Henrard, Jean; Hainaut, Jean-Luc

Publication date: 2003

Link to publication

Citation for pulished version (HARVARD):

Estiévenart, F, Francois, A, Henrard, J & Hainaut, J-L 2003, A Methodological Approach for Web Sites Reengineering..

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal ?

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

(2)

A Methodological

A Methodological

Approach for Web Sites

Approach for Web Sites

Reengineering

Reengineering

Estiévenart Fabrice

François Aurore

Henrard Jean

Hainaut Jean-Luc

(3)

Context

Context

Still many static web sites…

Advantage

• easy to create

• à for small web sites

Drawbacks

• data and layout are mixed up

• maintenance problems

• à out-of-date or redundant information

• à non-homogeneous design

Solution

(4)

Goals

Goals

To provide methods and tools for web sites

reengineering :

Web Page Web site

Web Site Reverse Engineering

Data Conceptual schema

Web Site Engineering

Web Page Web site

(5)

Method : overview

Method : overview

Web Page Web site Classification Web Page HTML Page Type Cleaning Web Page XHTML Page Type Semantic Enrichment META File Page Type XML Page Type XML schema Conceptual schema Conceptualisation Extraction

(6)

Method : step 1

Method : step 1

Pages classification

« Page type » = a set of pages relative to the same

concept, that display very similar information in a very

similar layout

(7)

Method : steps 2 and 3.1

Method : steps 2 and 3.1

HTML cleaning

Semantic enrichment

For each page type

• Concepts identification and description on a sample page

- « Concept » = a part of the HTML tree describing the layout, the structure and possibly the value of a certain reality

- Ex : the concept « Phone Number »

<tr>

<td align="middle" bgcolor="#FFFF66"> <b>Phone :</b>

</td >

<td bgcolor="#FFFF99">+32 71 72.23.49</td> </tr>

- Ex : the concept « Address » composed of « Street » and « City »

<table width="100%">

<tr><td><b>Address :<b></td></tr> <tr><td>Quality Street, 25</td></tr> <tr><td>London</td></tr>

(8)

Method : the META file

Method : the META file

<HTMLDescription xmlns:meta="http://www.cetic.be/FR/CRAQ-DB.htm"> <meta:element name="Department">

<html>

<head>...</head> <body>

<table>

<meta:element ref="DeptName"/> <meta:element ref="PhoneNumber"/> <meta:element ref="Address"/>

</table> </body> </html>

</meta:element>

<meta:element name="PhoneNumber"> <tr>

<td align="middle" bgcolor="#FFFF66"><b>Phone :</b></td> <td bgcolor="#FFFF99"><meta:value/></td>

</tr> </meta:element> …

(9)

• Application to other pages of the same type

- there may be layout and structure differences between pages of the same type àa same concept may have several descriptions

- Example : a layout difference

Method : step 3.2

Method : step 3.2

<tr> <td><b>Name :</b></td> </tr> <table width="100%"> <tr><td>Address :</td></tr> <tr><td>Quality Street, 25</td></tr> <tr><td>London</td></tr> </table>

- Example : a structure difference

<tr> <td><i>Name :</i></td> </tr> <table width="100%"> <tr><td>Address :</td></tr> <tr><td>New York</td></tr> <tr><td>Main Street, 110</td></tr> </table>

(10)

Method : step 4

Method : step 4

Data and schema extraction

Data extraction

• Web pages + META file à XML document

Data structure extraction

(11)

Method : step 5.1

Method : step 5.1

Schema integration

To discover relationships between concepts

To detect redundancy

1-1 f 1-1 1-1 f 1-1 1-1 f 1-1 f 1-1 1-1 1-1 f 1-1 1-1 f 1-1 f 1-1 0-N f 1-1 0-N f 1-1 0-N f 1-1 1-N f 0-1 1-1 1-1 f 1-N f 1-1 1-1 f 1-1 1-1 f 1-1 1-1 f 1-1 1-1 f 1-1 1-N f 0-1 1-N f 0-1 Staff seq: .Administrative[*] .Academic[*] .PHD[*] ProjectsList ProjectName «content» ProjectDate «content» Project seq: .ProjectName .ProjectDate P H D Person_1 key gid: key PersonName_1 «content» PersonName keyref «content» PersonFunction_1 «content» PersonFunction «content» Person exact-1: .f .f .f seq: .PersonName .PersonFunction Description_1 seq: .PersonName_1 .PersonFunction_1 .PersonAddress Description «content» DeptName «content» Department webFileName[0-1] seq: .DeptName .Description .Staff .ProjectsList Administrative Academic

(12)

Method : step 5.2

Method : step 5.2

Schema conceptualisation

1-1 staff 0-N 1-1 1-N in PROJECT Name Date PERSON Name Function Address Status id: Name DEPARTMENT Name Phone Fax Address Map Mail Description

(13)

Web Site Engineering

Web Site Engineering

PROJECT Name Date ID_DEP equ: ID_DEP acc PERSON Name Function Address Status ID_DEP id: Name acc ref: ID_DEP acc DEPARTMENT ID_DEP Name Phone Fax Address Map Mail Description id: ID_DEP acc

DB

DB

describes

describes

Migration

XML Document

Database engineering + Data migration

(14)

Tools

Tools

HTML cleaning

Tidy

Semantic enrichment

XML editor or the semantic browser (based on Mozilla) to

edit/generate the META file

Data and schema extraction

XML parsers (Java DOM)

pageType.extractSchema(METAfile)

à XMLSchema

pageType.extractData(HTML*, METAfile)

à XML

Schemas integration/conceptualisation and

database engineering

(15)

Conclusion and future work

Conclusion and future work

A method and tools to extract from a web site data

and their structure

Difficulty : enormous diversity of web pages

structures and layouts to represent the same reality

Future work

test on real-size web sites

refine the semantic enrichment step

• improve GUI

Références

Documents relatifs

are more ambitious and potentially wider than the strictly ontology- based ones. Establishing a link between semantic annotation and discourse annotation and text

Largeur 43 % Hauteur 320 pixels Marge intérieure 10 pixels Marge extérieure 5 pixels Couleur de fond: #DADE6E Bordure : groove 4 pixels HTML :. Division Partie

Largeur 43 % Hauteur 320 pixels Marge intérieure 10 pixels Marge extérieure 5 pixels Couleur de fond: #DADE6E Bordure : groove 4 pixels HTML :. Division Partie

(Modifiez quelques lignes pour comprendre quelques éléments de ce fichier, enregistrez puis appuyez sur F5 dans le navigateur pour rafraîchir la page).. II Feuille

Pour ajouter une WebView à l'application, il suffit d'inclure l'élément &lt;WebView&gt; dans le fichier de disposition des éléments (layout) qui se trouve dans: app &gt; res

L'utilisation d'un patron de conception permet de respecter un certain nombre de bonnes pratiques en développement logiciel ; Un patron de conception permet de répondre à

It differs from most other methods by its ability to generate summaries with tex- tual content as well as non textual content (images/graphics). Our method con- sists of

Inutile d’aller à la ligne dans votre éditeur (un retour à la ligne, deux espaces accolés ou un espace accolé à un retour à la ligne sont interprétés comme un unique espace