Reusing Dynamic Document Fragment through Virtual Documents: Key Issues In Document Engineering


FALQUET, Gilles, et al. Reusing Dynamic Document Fragment through Virtual Documents: Key Issues In Document Engineering. In: Betaille, H. & Nanard, J. & Nanard, M. 11ème conférence francophone sur l'Interaction Homme-Machine - Actes de l'atelier Documents virtuels personnalisables - IHM'99. Toulouse : Cépaduès, 1999. p. 32-36

Available at:

http://archive-ouverte.unige.ch/unige:48323


Yassine A. Rekik*, Christine Vanoirbeek*, Gilles Falquet**, Luka Nerima**

* MEDIA Research Group

LITH-DI-EPFL – IN Ecublens – CH 1015 Lausanne - Switzerland

*Yassine.Rekik@epfl.ch , Christine.Vanoirbeek@epfl.ch Tel +41 21 693 6782

** Centre Universitaire d'Informatique

Université de Genève - 24, rue Général-Dufour – CH 1211 Genève 4 – Switzerland

**Gilles.Falquet@cui.unige.ch, Luka.Nerima@cui.unige.ch Tel +41 22 705 77 72

Abstract

Electronic documents can obviously no longer be considered a simple representation of their paper counterparts. They become dynamic components whose content may be modified according to user interaction, but also in reaction to modifications in the user’s environment (such as an update operation in a database). They include not only data but also behavior (scripts, applets, etc.) that may serve different purposes (providing an appropriate rendering method, supplying interaction mechanisms to users, etc.). They also include the hypertext dimension and play an important role in the management of information as well as associated knowledge.

The extensive use of such hyperdocuments through the Internet, to disseminate and interact with pieces of information, is currently gaining in importance.

Designing information systems that rely on this emerging paradigm clearly requires new methods to produce, share and access distributed information. The paper focuses on the reusability of information in distributed environments. It presents our document-based approach to both model active document fragments and specify appropriate methods to build customized virtual documents on the basis of these fragments.

Introduction

The concept of structured document relies on the use of attribute grammars to formally describe document classes [Furuta87]. Such an approach allows the production of documents that conform to an abstract model and thus enhances the specification of appropriate processing operations on documents belonging to a given class. Initially used for the production and exchange of documents for publishing purposes, the concept of structured document appeared to be of much wider interest.

Documents progressively play an active role in distributed information systems; they become dynamic components whose content may be modified according to user interaction, but also in reaction to modifications in the user’s environment [Quint94][Quint95]. In this framework, the production of a document is not limited to the traditional authoring process that consists in providing the content of a document resulting from the work of one or several authors.

The need to build modular document description, taking benefit from existing document fragments, has been identified for a while. It is currently a critical issue to consider a document as the composition of several pieces of information that might be shared and reused by several applications (such as distributed collaborative environments including workflow processes) and users (such as document authoring systems facilitating the reuse of existing material).

In this paper, we first classify and comment on several approaches adopted to deal with the reusability of documents, in order to enhance or facilitate the specification of processing operations to be performed on them. Then, we present our approach to designing a dynamic document fragments server, and discuss the appropriate mechanisms to generate customized virtual documents according to a specification.

Document reuse

How to reuse documents, or pieces of documents, is currently addressed in many ways and for multiple purposes. We propose a classification based on approaches originating from different research areas.

We distinguish between three main problems to be faced in this respect:

- How to add structure to existing documents? This is a matter of analysis;

- How to transform existing documents to reuse them within other applications? It ranges from format transformations to more complex structure manipulations;

- How to design documents in a way that facilitates their reuse by existing or further applications? This is a “document engineering” problem, very similar to design problems addressed in the software engineering domain.

Structuring or re-structuring documents

It is commonly accepted that the availability of structured documents greatly facilitates the reuse of content. A number of research works are therefore dedicated to the analysis of raw or semi-structured documents in order to structure or re-structure them in a way that facilitates their reuse through existing applications.

A lot of effort is dedicated to the analysis of document images, either obtained by scanning or generated by applications, such as PostScript files [Bapst98]. Another approach is adopted in MarkItUp [Fankhauser94], a system designed to recognize the structure of untagged electronic documents; it is based on a learning-by-example approach to gradually build recognition grammars. Extracting the logical structure of library references has been addressed by a mixed strategy combining image analysis and structural information provided by the representation standard used (Unimarc); the system is based on a constraint propagation method [Belaïd97]. Finally, we may also cite work on interactively restructuring HTML documents, an approach based on the use of a transformation language [Bonhomme96].

Transforming documents

Despite the obvious advantages conveyed by the manipulation of documents in a structured way, reusing them within users’ environments raises a number of fundamental problems to transform or to adapt their intrinsic structure:

- How to provide the users with appropriate editing operations (such as cut and paste operations) that maintain the document consistency?

- How to merge documents conforming to different structures?

- How to guarantee the consistency of existing documents that relate to an evolving generic structure?

- How to transform existing structured documents to reuse them in the framework of applications relying on another document model?

The issue of structure transformations is complex. Depending on the context, several approaches have been proposed or are currently under investigation to address this problem. An interactive context (such as document editing) highly promotes the use of automatic transformations that have to be performed in an efficient way to accommodate a required response time [Akapotsui97]. Transforming existing structured documents in order to fit a different target structure may be performed on the basis of an explicit specification provided through descriptive rules that guide the transformation process [XSL99]. A combined approach has also been proposed to take benefit from the two mentioned approaches [Bonhomme97][Bonhomme98].

Modeling modular structured documents

The formal description of a document class aims at constraining the logical organisation of document instances and thus guaranteeing their conformance to a given model. Attribute grammars were originally used to define classes of documents for publishing purposes.

Initially, such formal descriptions consisted of “monolithic” descriptions, each of them representing the (logical) editorial structure of the document class.

The need to build such descriptions in a modular way, taking benefit from existing document fragment descriptions, has been rapidly identified and addressed in different ways. Some document management systems explicitly provide appropriate mechanisms to define reusable units of structured pieces of documents.

However, dealing with reusable fragments of documents raises problems that are very similar to those encountered in the software engineering domain and that are not currently solved. Ongoing research works have mostly adopted an object-oriented design for the appropriate representation of document models [SOX99][Abiteboul99]. This approach is relevant in many respects. It allows the precise definition of pieces of information as well as the processing operations to be performed on identified structured data. It aims at facilitating the exchange and reuse of document fragments between heterogeneous applications.

Reusable Dynamic Fragments and Virtual Documents

Reusing information implies that there exists a mechanism to combine (or compose) the available pieces of information into new documents derived according to the users’ needs. Two major issues have to be addressed to reach this objective: how to model pieces of information and how to provide methods for assembling such pieces in a consistent way?

Dynamic Document Fragment and Dynamic Fragments Server

Much work has been done to offer repositories of documents for reuse. However, most of this work has emphasized document collection, document identification and document access. The integration of those fragments when authoring new documents, especially in the case of structured documents, has not yet been addressed in a satisfying way. In addition, most work has focused on the reuse of document content. The combined reuse of document content, structure and behavior remains an open problem.

The first concept that we consider central to document reuse is what we call a "Dynamic Structured Document Fragment". We consider a document fragment to be an independent, self-described piece of information that can be reused and adapted to new documents. The definition of a dynamic structured fragment includes several aspects:

- the definition of the granularity of such reusable fragments;

- the definition of a fragment content model;

- the definition of fragment metadata, to characterize and identify it within an authoring process;

- the definition of a mechanism to associate methods to describe the fragment interface and behavior.
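As an illustration only, the aspects listed above can be sketched as a small data structure. The class name `Fragment`, its fields and the `to_km` method are hypothetical names chosen for this sketch; the paper does not prescribe any concrete representation, and the content model is reduced here to a plain string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Fragment:
    """Hypothetical dynamic structured fragment: content, metadata, methods."""
    content: str                   # structured content, simplified to plain text
    metadata: Dict[str, str]       # characterizes and identifies the fragment
    methods: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def interface(self) -> List[str]:
        """Names of the operations the fragment exposes."""
        return sorted(self.methods)

    def apply(self, name: str) -> str:
        """Run one associated method on the fragment content."""
        return self.methods[name](self.content)


# A formula fragment that carries its own unit-conversion behavior.
formula = Fragment(
    content="distance = 1500 m",
    metadata={"id": "f1", "type": "formula"},
    methods={"to_km": lambda text: text.replace("1500 m", "1.5 km")},
)
converted = formula.apply("to_km")
```

Keeping the methods next to the content is what would let a reusing document invoke behavior without knowing how the fragment is implemented.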

In our approach, “reusing” consists in integrating both the structured content of the fragment and the associated methods, in order to enhance processing operations on the fragment. The fragment can be manipulated separately with appropriate authoring tools, or within a more general context by applications that manipulate the newly created document.

In order to access and reuse document fragments, our approach is based on a “Fragment Server”. The design of such a server must take into account three major points. The first is how to manage and store such dynamic structured fragments. Two promising approaches may be adopted for this task. The first is the use of multimedia object databases, which are appropriate for handling dynamic fragments. The second is the use of structured document databases, which are currently emerging and seem appropriate for managing document-based data. The second point is to define the access functions and mechanisms that a fragment server has to offer. Here too, we will investigate two directions. The first is the query-based approach, which relies on recently proposed document query languages. The second is based on navigational models and will build on recent work in the domains of hypertext navigation and information retrieval. Finally, the third point to be addressed when designing a fragment server is the exchange protocol and format.

A model to exchange dynamic fragments with authoring tools and/or document processing applications is necessary to ensure the import/export function between authors and the fragment server.

In proposing this exchange model, special attention will be paid to the use of standards and interoperable technologies such as XML, Java, CORBA, etc.
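The three design points (storage, access, exchange) can be illustrated with a toy in-memory server. This is a minimal sketch under strong assumptions: the class `FragmentServer`, its method names and the XML layout are all hypothetical, storage is a plain dictionary rather than a database, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET


class FragmentServer:
    """Hypothetical fragment server: storage, two access modes, XML export."""

    def __init__(self):
        self._store = {}   # fragment id -> (metadata dict, content string)
        self._links = {}   # fragment id -> list of related fragment ids

    def put(self, frag_id, metadata, content, links=()):
        """Storage: keep the fragment and its outgoing links."""
        self._store[frag_id] = (dict(metadata), content)
        self._links[frag_id] = list(links)

    def query(self, **criteria):
        """Query-based access: ids whose metadata match every criterion."""
        return sorted(
            fid for fid, (meta, _) in self._store.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def neighbours(self, frag_id):
        """Navigational access: follow the links leaving one fragment."""
        return list(self._links.get(frag_id, []))

    def export(self, frag_id):
        """Exchange format: serialize one fragment as an XML string."""
        meta, content = self._store[frag_id]
        elem = ET.Element("fragment", {"id": frag_id, **meta})
        elem.text = content
        return ET.tostring(elem, encoding="unicode")


server = FragmentServer()
server.put("f1", {"type": "text"}, "Hello", links=["f2"])
server.put("f2", {"type": "image"}, "logo.png")
exported = server.export("f1")
```

An authoring tool could import such a fragment by parsing the exported string with `ET.fromstring`; a real server would also have to exchange the associated methods, which this sketch leaves out.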

Database Views and Virtual Document

The notion of view has been extensively studied in the field of databases. Views were first introduced within the relational data model as derived relations. Theoretical and applied works have dealt with problems such as updating databases through views, computing and updating materialized (stored) views, etc. More recently, the notion of view has been (partially) adapted to other models such as object-oriented data models, semi-structured models and document models.
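For readers less familiar with relational views, the following minimal sketch shows a view as a derived relation, using SQLite through Python's standard `sqlite3` module. The table and view names (`fragment`, `text_fragment`) and the sample rows are assumptions made for the example, not part of the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fragment (id TEXT, kind TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO fragment VALUES (?, ?, ?)",
    [("f1", "text", "Hello"), ("f2", "image", "logo.png")],
)

# The view stores no data of its own: it is a named query over the base table.
conn.execute(
    "CREATE VIEW text_fragment AS "
    "SELECT id, body FROM fragment WHERE kind = 'text'"
)
rows_before = conn.execute(
    "SELECT id, body FROM text_fragment ORDER BY id"
).fetchall()

# An update of the base relation is reflected the next time the view is read.
conn.execute("INSERT INTO fragment VALUES ('f3', 'text', 'World')")
rows_after = conn.execute(
    "SELECT id, body FROM text_fragment ORDER BY id"
).fetchall()
```

A virtual document plays a comparable role with respect to a fragment repository: it is derived on demand from stored fragments rather than stored itself.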

In recent years there has been a strong interest in virtual documents and virtual hypertexts. For instance, the development of the World Wide Web has prompted research on the generation of virtual hypertexts (made of Web pages) to represent the content of databases. Work on this theme is being presented in the WebDB workshops and SIGMOD conferences. In the proposed research we intend to adapt recent results on hypertext views on databases to the generation of virtual documents on a repository of document fragments. Recently, there has been a strong interest in personalized virtual documents, and a community of researchers is gathering around this theme.

The virtual document model will precisely define how to derive each virtual document from the stored document fragments. The specification of a virtual document will contain at least:

- a selection specification: determines which fragments to use; it is a set of queries on the fragment repository (based either on fragment content or meta-content);

- a global document structure, which can be either static (fixed in the specification) or dynamic (computed on the document fragments, e.g. links obtained by computing a semantic proximity between two fragments); global structures may take various forms, such as sequences, trees, graphs, etc.

- a presentation specification which indicates how to represent the global structure (for instance, a tree can be represented graphically, with embedded boxes, linearly with parentheses, etc.)

- fragment operations: specify which operation (method) to apply on each fragment (e.g. a fragment representing a formula may be expressed in a given system of units, a picture may be resized or rotated, etc.)

- inheritance relationships: a specification may be derived from one or more other specifications (this is particularly useful to adapt existing specifications to new needs)

We will develop a language to express such specifications. This language will be an extension of a document specification language. Its formal semantics will be expressed in terms of the virtual document model.
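To make the components of such a specification more concrete, here is a minimal sketch in Python rather than in the specification language announced above, which the paper does not define. Everything named here (`Spec`, `derive`, the field names, the dictionary layout of fragments) is hypothetical; the global structure is restricted to a sequence sorted on one metadata key, and the presentation to a separator string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Spec:
    """Hypothetical virtual document specification."""
    select: Optional[Callable[[dict], bool]] = None   # selection (query on metadata)
    order_key: Optional[str] = None                   # global structure: a sorted sequence
    operations: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    separator: Optional[str] = None                   # presentation of the sequence
    parent: Optional["Spec"] = None                   # inheritance

    def resolve(self, name):
        """Return a setting, falling back on the parent specification."""
        value = getattr(self, name)
        if value is None and self.parent is not None:
            return self.parent.resolve(name)
        return value

    def all_operations(self):
        """Inherited fragment operations, overridden by local ones."""
        inherited = self.parent.all_operations() if self.parent is not None else {}
        return {**inherited, **self.operations}


def derive(spec, repository):
    """Build the virtual document text from the stored fragments."""
    select = spec.resolve("select")
    key = spec.resolve("order_key")
    ops = spec.all_operations()
    chosen = sorted(
        (f for f in repository if select(f["meta"])),
        key=lambda f: f["meta"][key],
    )
    parts = [ops.get(f["meta"]["type"], lambda text: text)(f["content"]) for f in chosen]
    return spec.resolve("separator").join(parts)


repository = [
    {"meta": {"type": "text", "rank": 2, "topic": "reuse"}, "content": "second"},
    {"meta": {"type": "title", "rank": 1, "topic": "reuse"}, "content": "first"},
    {"meta": {"type": "text", "rank": 3, "topic": "other"}, "content": "ignored"},
]
base = Spec(select=lambda m: m["topic"] == "reuse", order_key="rank", separator="\n")
variant = Spec(operations={"title": str.upper}, separator=" | ", parent=base)
```

Here `variant` inherits the selection and ordering of `base` and only changes the presentation and one fragment operation, which is the kind of adaptation the inheritance relationship is meant to support.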


Bibliography

[Abiteboul 99] S. Abiteboul, R. Goldman, J. McHugh, V. Vassalos, Y. Zhuge. “Views for Semistructured Data”, Tech. Report, Dept of Computer Science, Stanford University, 1999.

[Ahonen 98] H. Ahonen, B. Heikkinen, O. Heinonen, J. Jaakola, P. Kilpelainen, G. Lindén. Design and Implementation of a Document Assembly Workbench. In Electronic Publishing, Artistic Imaging and Digital Typography, St. Malo, France, March/April 1998.

[Akapotsui 97] E. Akpotsui, V. Quint, C. Roisin. Type Modelling for Document Transformation in Structured Editing Systems. Mathematical and Computer Modelling, vol. 25, num. 4, pp. 1-19, 1997.

[Bapst 98] F. Bapst, R. Bruegger, A. Zramdini, R. Ingold. Integrated Multi-Agent Architecture for Assisted Document Recognition. In Document Analysis Systems II, Series in Machine Perception and Artificial Intelligence, vol. 29, World Scientific, 1998, pp. 301-317.

[Belaïd 97] A. Belaïd, Y. Chenevoy. Constraint Propagation vs Syntactical Analysis for the Logical Structure Recognition of Library References. N. A. Murshed and F. Bortolozzi (Eds.), Lecture Notes in Computer Science 1339, BSDIA'97, Springer, pp. 153-164, Curitiba, Brazil, November 2-5, 1997.

[Bergamaschi 99] S. Bergamaschi, S. Castano, M. Vincini, D. Beneventano. Intelligent Techniques for the Extraction and Integration of Heterogeneous Information. In the Workshop on Intelligent Information Integration, Sixteenth International Joint Conference on Artificial Intelligence, IJCAI'99, Stockholm, Sweden, July-August, 1999.

[Bonhomme 96] S. Bonhomme, C. Roisin. Interactively Restructuring HTML Documents. Computer Networks and ISDN Systems, vol. 28, num. 7-11, pp. 1075-1084, May 1996.

[Bonhomme 97] S. Bonhomme, C. Roisin. Transformations de documents électroniques. Document Numérique, vol. 1, num. 3, 1997.

[Bonhomme 98] S. Bonhomme. Transformations de documents structurés : une combinaison des approches déclaratives et automatiques, Doctorat d'informatique, Université Joseph Fourier, décembre 1998.

[Caumanns 99] J. Caumanns. A Modular Framework for the Creation of Dynamic Documents. In Proceeding of the Workshop on Virtual Documents, Hypertext Functionality and the Web, Eighth International World Wide Web Conference, Toronto, Canada, 1999.

[Furuta 87] R. Furuta. Concepts and Models for Structured Documents. In the proceeding of the workshop "structure for documents", Savoie, France, January, 1987.

[Green 99] S. J. Green, M. Milosavljevic, R. Dale, C. Paris. When Virtual Documents Meet the Real World. Proceedings of the Workshop on Virtual Documents, Hypertext Functionality and the Web, Eighth International World Wide Web Conference, Toronto, Canada, 1999.

[Landow 97] G. P. Landow. Hypertext 2.0. John Hopkins Press, London, 1997.

[Levy 93] D. M. Levy. Document reuse and document systems. Electronic Publishing, vol. 6(4), pp. 339-348, December, 1993.

[Marchionini 97] G. Marchionini, V. Nolet, H. Williams, W. Ding, J. Beale, E. Enomoto, A. Rose. Content + Connectivity = Community: Digital Resources for a Learning Community. The 2nd International Conference on Digital Libraries (ACM DL'97). Philadelphia, USA, July 1997.

[Nielsen 90] J. Nielsen. "The Art of Navigating Through Hypertext". CACM, Vol. 33, No. 3, 1990.

[Paradis 97] F. Paradis, A.M. Vercoustre, B. Hills. A Virtual Document Approach for Reusing SGML/XML Information Objects. SGML/XML Asia Pacific Conference, Sydney, September 1997.

[Paradis 98] F. Paradis, A.M. Vercoustre. A Language for Publishing Virtual Documents on the Web. WebDB'98 International Workshop on the Web and Databases, in conjunction with EDBT98, Valencia, Spain, March 1998.

[Quint 94] V. Quint, I. Vatton. Making structured documents active. Electronic Publishing - Origination, Dissemination and Design, 7(2): 55-74, June 1994.

[Quint 95] V. Quint, I. Vatton, J. Paoli. Active Structured Documents as User Interfaces, vol. User Interfaces for Symbolic Computations, Springer Verlag, September, 1995.

[Ranwez 99] S. Ranwez, M. Crampes, T. Leidig. Description and Construction of Pedagogical Material using an Ontology based DTD. Workshop on Ontologies for Intelligent Educational Systems, Ninth International Conference on Artificial Intelligence in Education, AI-ED'99, Le Mans, France, July 19-23, 1999.

[Ranwez 99] S. Ranwez, M. Crampes. Conceptual Documents and Hypertext Documents are two Different Forms of Virtual Documents. Proceedings of the Workshop on Virtual Documents, Hypertext Functionality and the Web, Eighth International World Wide Web Conference, Toronto, Canada, 1999.

[Rutledge 97] L. Rutledge, J. van Ossenbruggen, L. Hardman, D. C. A. Bulterman. A Framework for Generating Hypermedia Documents. In Proceedings of ACM Multimedia 97, November 1997.

[Santos 94] C. Santos, S. Abiteboul, C. Delobel. "Virtual Schemas and Bases". In proc. Extending Database Technology Conf. ?, 1994.

[SIGMOD 92] Special Issue: Advanced User Interfaces for Database Systems. SIGMOD Record, Vol. 21, No. 1, 1992.

[SOX 99] W3C, W3C note, Schema for Object-Oriented XML 2.0. 30 Jul 1999. http://www.w3.org/TR/NOTE-SOX/

[Vercoustre 97] A.M. Vercoustre, F. Paradis. A Descriptive Language for Information Object Reuse through Virtual Documents. In 4th International Conference on Object-Oriented Information Systems (OOIS'97), Brisbane, Australia, 10-12 November, 1997.
