A Methodological Approach for Web Sites Reengineering

(1)

RESEARCH OUTPUTS / RÉSULTATS DE RECHERCHE

Author(s) - Auteur(s) :

Publication date - Date de publication :

Permanent link - Permalien :

Rights / License - Licence de droit d’auteur :

Institutional Repository - Research Portal

Dépôt Institutionnel - Portail de la Recherche

researchportal.unamur.be

University of Namur

A Methodological Approach for Web Sites Reengineering

Estiévenart, Fabrice; Francois, Aurore; Henrard, Jean; Hainaut, Jean-Luc

Publication date: 2003

Link to publication

Citation for pulished version (HARVARD):

Estiévenart, F, Francois, A, Henrard, J & Hainaut, J-L 2003, A Methodological Approach for Web Sites Reengineering..

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal ?

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

(2)

A Methodological

Approach for Web Sites

Reengineering

Estiévenart Fabrice

François Aurore

Henrard Jean

Hainaut Jean-Luc

(3)

Context

• Still many static web sites…

–

Advantage

• easy to create

• à for small web sites

–

Drawbacks

• data and layout are mixed up

• maintenance problems

• à out-of-date or redundant information

• à non-homogeneous design

–

Solution

(4)

Goals

• To provide methods and tools for web sites

reengineering :

Web Page Web site

Web Site Reverse Engineering

Data Conceptual schema

Web Site Engineering

Web Page Web site

(5)

Method : overview

Web Page Web site Classification Web Page HTML Page Type Cleaning Web Page XHTML Page Type Semantic Enrichment META File Page Type XML Page Type XML schema Conceptual schema Conceptualisation Extraction

(6)

Method : step 1

• Pages classification

–

« Page type » = a set of pages relative to the same

concept, that display very similar information in a very

similar layout

(7)

Method : steps 2 and 3.1

•

HTML cleaning

•

Semantic enrichment

–

For each page type

• Concepts identification and description on a sample page

- « Concept » = a part of the HTML tree describing the layout, the structure and possibly the value of a certain reality

- Ex : the concept « Phone Number »

<tr>

<td align="middle" bgcolor="#FFFF66"> <b>Phone :</b>

</td >

- Ex : the concept « Address » composed of « Street » and « City »

<tr><td><b>Address :<b></td></tr> <tr><td>Quality Street, 25</td></tr> <tr><td>London</td></tr>

(8)

Method : the META file

<html>

<table>

</table> </body> </html>

</meta:element>

<td align="middle" bgcolor="#FFFF66"><b>Phone :</b></td> <td bgcolor="#FFFF99"><meta:value/></td>

</tr> </meta:element> …

(9)

• Application to other pages of the same type

- there may be layout and structure differences between pages of the same type àa same concept may have several descriptions

- Example : a layout difference

Method : step 3.2

<tr> <td><b>Name :</b></td> </tr> <table width="100%"> <tr><td>Address :</td></tr> <tr><td>Quality Street, 25</td></tr> <tr><td>London</td></tr> </table>

- Example : a structure difference

<tr> <td><i>Name :</i></td> </tr> <table width="100%"> <tr><td>Address :</td></tr> <tr><td>New York</td></tr> <tr><td>Main Street, 110</td></tr> </table>

(10)

Method : step 4

• Data and schema extraction

–

Data extraction

• Web pages + META file à XML document

–

Data structure extraction

(11)

Method : step 5.1

• Schema integration

–

To discover relationships between concepts

–

To detect redundancy

1-1 f 1-1 1-1 f 1-1 1-1 f 1-1 f 1-1 1-1 1-1 f 1-1 1-1 f 1-1 f 1-1 0-N f 1-1 0-N f 1-1 0-N f 1-1 1-N f 0-1 1-1 1-1 f 1-N f 1-1 1-1 f 1-1 1-1 f 1-1 1-1 f 1-1 1-1 f 1-1 1-N f 0-1 1-N f 0-1 Staff seq: .Administrative[*] .Academic[*] .PHD[*] ProjectsList ProjectName «content» ProjectDate «content» Project seq: .ProjectName .ProjectDate P H D Person_1 key gid: key PersonName_1 «content» PersonName keyref «content» PersonFunction_1 «content» PersonFunction «content» Person exact-1: .f .f .f seq: .PersonName .PersonFunction Description_1 seq: .PersonName_1 .PersonFunction_1 .PersonAddress Description «content» DeptName «content» Department webFileName[0-1] seq: .DeptName .Description .Staff .ProjectsList Administrative Academic

(12)

Method : step 5.2

• Schema conceptualisation

1-1 staff 0-N 1-1 1-N in PROJECT Name Date PERSON Name Function Address Status id: Name DEPARTMENT Name Phone Fax Address Map Mail Description

(13)

Web Site Engineering

PROJECT Name Date ID_DEP equ: ID_DEP acc PERSON Name Function Address Status ID_DEP id: Name acc ref: ID_DEP acc DEPARTMENT ID_DEP Name Phone Fax Address Map Mail Description id: ID_DEP acc

A Methodological Approach for Web Sites Reengineering

RESEARCH OUTPUTS / RÉSULTATS DE RECHERCHE

Institutional Repository - Research Portal

Dépôt Institutionnel - Portail de la Recherche

researchportal.unamur.be

A Methodological

A Methodological

Approach for Web Sites

Approach for Web Sites

Reengineering

Reengineering

Estiévenart Fabrice

François Aurore

Henrard Jean

Hainaut Jean-Luc

Context

Context

•

Still many static web sites…

Advantage

• easy to create

• à for small web sites

Drawbacks

• data and layout are mixed up

• maintenance problems

• à out-of-date or redundant information

• à non-homogeneous design

Solution

Goals

Goals

•

To provide methods and tools for web sites

reengineering :

Method : overview

Method : overview

Method : step 1

Method : step 1

•

Pages classification

« Page type » = a set of pages relative to the same

concept, that display very similar information in a very

similar layout

Method : steps 2 and 3.1

Method : steps 2 and 3.1

HTML cleaning

Semantic enrichment

For each page type

• Concepts identification and description on a sample page

Method : the META file

Method : the META file

• Application to other pages of the same type

Method : step 3.2

Method : step 3.2

Method : step 4

Method : step 4

•

Data and schema extraction

Data extraction

• Web pages + META file à XML document

Data structure extraction

Method : step 5.1

Method : step 5.1

•

Schema integration

To discover relationships between concepts

To detect redundancy

Method : step 5.2

Method : step 5.2

•

Schema conceptualisation

Web Site Engineering

Web Site Engineering

DB

DB

describes

describes

Migration

XML Document

•

Database engineering + Data migration

pageType.extractData(HTML, METAfile)*