(1)

Syllable-based compression for XML

Katsiaryna Chernik, Jan Lánský, Leo Galamboš

Dept. of Software Engineering

Faculty of Mathematics and Physics

Charles University

(2)

Content

| Motivation

| Syllable-based compression

| XMLSyl

| XMillSyl

| Results

| Conclusion

(3)

Motivation

| XML

z Simple text format for structured text documents

z Data exchange standard

z High redundancy

(4)

Compression Methods for XML

| Character-based

z XMill

z XMLPPM

z XGrind, ...

| Word-based ?

| Syllable-based ?

(5)

Syllable-based compression

| LZWL

z Dictionary-based method

z Syllable-based version of LZW

| HufSyl

z Statistical method

z Adaptive Huffman coding

z Inspired by HuffWord
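To illustrate what "syllable-based version of LZW" can mean, here is a minimal sketch of LZW running over a sequence of tokens instead of characters. It is not the LZWL implementation: the function names `lzw_encode` and `lzw_decode` are hypothetical, the syllable split in the usage note is made up, and the sketch assumes the encoder and decoder share the initial token alphabet, whereas a real syllable-based coder also has to handle syllables it has not seen before.

```python
def lzw_encode(tokens):
    """LZW over a list of string tokens (e.g. syllables) instead of characters."""
    # Assumption: the dictionary is initialised with every distinct token of
    # the input, so the decoder must be given the same alphabet.
    alphabet = sorted(set(tokens))
    table = {(tok,): i for i, tok in enumerate(alphabet)}
    codes = []
    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if candidate in table:
            phrase = candidate              # keep extending the current phrase
        else:
            codes.append(table[phrase])     # emit the longest known phrase
            table[candidate] = len(table)   # learn the new, longer phrase
            phrase = (tok,)
    if phrase:
        codes.append(table[phrase])
    return alphabet, codes


def lzw_decode(alphabet, codes):
    """Inverse of lzw_encode; rebuilds the phrase table while decoding."""
    if not codes:
        return []
    table = {i: (tok,) for i, tok in enumerate(alphabet)}
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the phrase that is being defined right now.
            entry = prev + (prev[0],)
        out.extend(entry)
        table[len(table)] = prev + (entry[0],)
        prev = entry
    return out
```

For example, with the hypothetical split `["syl", "la", "ble", "syl", "la", "ble", "syl", "la"]`, `lzw_encode` returns six codes for eight tokens, and `lzw_decode` restores the original list.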

(6)

Syllable-based compression

| Syllable-based compression is suitable for languages with rich morphology (e.g. Czech)

| Syllable-based compression is suitable for small or medium-sized files

(7)

Syllable-based compression of XML

| The majority of XML documents are small or medium-sized

| Many text-like XML documents

z news in RSS format

z documentation or books in DocBook format

Syllable-based compression and XML?

(8)

XMLSyl

Idea

| Syllable-based compressor

z XML tokens are split into many syllables

| XMLSyl

z XML tokens are treated as single syllables

(9)

XMLSyl

Architecture

[Architecture diagram: XML Document → SAX Parser → Structure Encoder → Element Container, Attribute Container, Data and Structure Container → Syllable Compressor (shown twice) → Compressed XML document]

(10)

XMLSyl

Example

XML doc:

<book>

<title lang="en">XML</title>

</book>

SAX events:

startElement("book")

startElement("title", ("lang", "en"))

characters("XML")

endElement("title")

endElement("book")

(11)

XMLSyl

Example – Encoding process

SAX events:

startElement("book")

startElement("title", ("lang", "en"))

characters("XML")

endElement("title")

endElement("book")

Element Container:

book E0

title E1

Attribute Container:

lang A0

Data and Structure Container:

E0 E1 A0 en END_ATT CHAR XML END_CHAR END_TAG END_TAG
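The encoding step of this example can be sketched with Python's standard `xml.sax` module, which is Expat-based by default. This is only an illustration of the three containers on the slide, not the XMLSyl code: the class name `ContainerEncoder` and the helper `encode` are hypothetical, and the sketch assumes that `END_ATT` follows every attribute value and that whitespace-only text is skipped, neither of which the slides specify.

```python
import xml.sax


class ContainerEncoder(xml.sax.ContentHandler):
    """Fills the three containers from the slide while SAX events arrive."""

    def __init__(self):
        super().__init__()
        self.elements = {}    # Element Container: element name -> "E<n>"
        self.attributes = {}  # Attribute Container: attribute name -> "A<n>"
        self.stream = []      # Data and Structure Container

    @staticmethod
    def _code(table, name, prefix):
        # Give each new name the next free code, e.g. E0, E1, ...
        if name not in table:
            table[name] = f"{prefix}{len(table)}"
        return table[name]

    def startElement(self, name, attrs):
        self.stream.append(self._code(self.elements, name, "E"))
        for attr_name, value in attrs.items():
            # Assumption: END_ATT closes each attribute value.
            self.stream += [self._code(self.attributes, attr_name, "A"),
                            value, "END_ATT"]

    def characters(self, content):
        # Assumption: whitespace-only text is dropped. A parser may also
        # deliver long text in several chunks; each chunk is wrapped here.
        if content.strip():
            self.stream += ["CHAR", content, "END_CHAR"]

    def endElement(self, name):
        self.stream.append("END_TAG")


def encode(xml_text):
    """Parse xml_text and return the handler holding the filled containers."""
    handler = ContainerEncoder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler
```

Calling `encode('<book><title lang="en">XML</title></book>')` yields the element table `book → E0`, `title → E1`, the attribute table `lang → A0`, and a stream that starts with `E0 E1 A0 en END_ATT`. In XMLSyl the resulting containers are then passed to the syllable compressor, which this sketch leaves out.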

(12)

XMLSyl

Implementation details

| SAX parser – EXPAT

| Syllable Compressor – LZWL and HufSyl

| Encoding was inspired by existing XML compression methods

z XMLPPM, XGrind, XPress, XMill

(13)

XMillSyl

| Based on XMill

| Main principles of XMill

z Separating structure from data

z Grouping data values with related meaning
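The two principles can be illustrated with a short sketch that keeps the structure in one container and groups text and attribute values into separate data containers before compressing each one on its own. This is not XMill: the names `PathGrouper` and `compress_grouped` are hypothetical, grouping by the full element path is an assumption made here for simplicity, the structure tokens (`<name`, `@attr`, `#`, `>`) are invented for the example, and `zlib` merely stands in for the per-container compressor.

```python
import xml.sax
import zlib
from collections import defaultdict


class PathGrouper(xml.sax.ContentHandler):
    """Separates structure from data and groups data values by element path."""

    def __init__(self):
        super().__init__()
        self.path = []                       # stack of open element names
        self.structure = []                  # structure container
        self.containers = defaultdict(list)  # path -> list of data values

    def startElement(self, name, attrs):
        self.path.append(name)
        self.structure.append(f"<{name}")
        for attr_name, value in attrs.items():
            key = "/".join(self.path) + "/@" + attr_name
            self.containers[key].append(value)
            self.structure.append("@" + attr_name)

    def characters(self, content):
        if content.strip():                  # assumption: skip pure whitespace
            self.containers["/".join(self.path)].append(content)
            self.structure.append("#")       # placeholder for a data value

    def endElement(self, name):
        self.path.pop()
        self.structure.append(">")


def compress_grouped(xml_text):
    """Return the handler, the compressed structure, and one blob per path."""
    handler = PathGrouper()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    structure_blob = zlib.compress(" ".join(handler.structure).encode("utf-8"))
    data_blobs = {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in handler.containers.items()
    }
    return handler, structure_blob, data_blobs
```

With two `book` elements that each contain a `title` and a `year`, all titles land in one container and all years in another, so similar values sit next to each other when they are compressed.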

(14)

Architecture of XMill

[Architecture diagram: Input XML file → SAX Parser → Path Processor → Structure Container, Data Container 1, Data Container 2, ..., Data Container k → gzip (one per container) → Compressed XML file]

(15)

XMill – one container

[Architecture diagram: Input XML file → SAX Parser → Path Processor → Structure Container, Data Container → gzip (one per container) → Compressed XML file]

(16)

XMillSyl

Architecture

[Architecture diagram: Input XML file → SAX Parser → Path Processor → Structure Container, Data Container 1, Data Container 2, ..., Data Container k → gzip (one box) and LZWL / HufSyl (three boxes) → Compressed XML file]

(17)

Syllable-based compression of XML

Experimental results

XMLSyl & XMillSyl vs. LZWL & HufSyl

| Non-textual XML data

z 50-60% better

| Textual XML data

z 10-20% better

(18)

Syllable-based compression of XML

Experimental results

[Chart: XMillSyl (more containers), XMillSyl (one container) and XMLSyl compared on text-like XML documents]

(19)

Syllable-based compression of XML

Experimental results

XMLSyl

| XMLhuf is suitable for small files

| XMLzwl is suitable for large files

XMLSyl vs. XMill

| On average 10-15% worse than XMill

| On some documents the same performance or better

Text-like XML documents

(20)

Conclusion

| New syllable-based compression methods for XML

z XMLSyl (versions: XMLzwl, XMLhuf)

z XMillSyl (versions: XMillzwl, XMillhuf)

| One of our methods outperforms XMill on some documents

(21)

Conclusion

| Future work

z extract and utilize the information in the DTD section

z create a special syllable dictionary for elements and attributes

z compress HTML data
