• Aucun résultat trouvé

A Theoretical Framework for Data-Hiding in Digital and Printed Text Documents

N/A
N/A
Protected

Academic year: 2022

Partager "A Theoretical Framework for Data-Hiding in Digital and Printed Text Documents"

Copied!
3
0
0

Texte intégral

(1)

Proceedings Chapter

Reference

A Theoretical Framework for Data-Hiding in Digital and Printed Text Documents

VILLAN SEBASTIAN, Renato Fisher, et al.

VILLAN SEBASTIAN, Renato Fisher, et al . A Theoretical Framework for Data-Hiding in Digital and Printed Text Documents. In: Proceedings of 9th IFIP TC-6 TC-11 International

Conference on Communications and Multimedia Security . 2005. p. 280-281

Available at:

http://archive-ouverte.unige.ch/unige:47949

Disclaimer: layout of this document may differ from the published version.

1 / 1

(2)

A Theoretical Framework for Data-Hiding in Digital and Printed Text Documents

R. Vill´an, S. Voloshynovskiy, F. Deguillaume, Y. Rytsar, O. Koval, E. Topak, E. Rivera, and T. Pun

Computer Vision and Multimedia Laboratory - University of Geneva 24, rue du G´en´eral-Dufour - 1211 Geneva 4, Switzerland

Contact author:[email protected]

In this work, we consider the text data-hiding problem as a particular in- stance of the well-known Gel’fand-Pinsker problem [1]. The text, where some messagem∈ Mis to be hidden, is represented byxand called cover text. Each componentxi,i= 1,2, . . . , N, ofxrepresents one character from this text. Here, we define a character as an element from a given language alphabet (e.g. the latin alphabet{A, B, . . . , Z}). To be more precise, we conceive each characterxi as a data structure consisting of multiple component fields (features): name, shape, position,orientation,size,color, etc.

Assuming the knowledge of the conditional probability distributionp(u|x),

|M||J | codewords u are generated independently at random and located into

|M| bins, each of them with |J | codewords. Once generated, the codebook is revealed to both the encoder and the decoder. Givenmto be communicated, the encoder produces the watermarkwby finding first a jointly strongly typical pair (x,u(m, j)), whereu(m, j) is thej-th codeword inside the bin corresponding to m, and then, by using a deterministic mapping w=ϕN(x,u). The influence of the channel p(v|w, x) is divided in two stages. In the first stage, w and x are combined via a deterministic mappingψN(w,x) to give the stego texty. In the second stage, y may suffer from some intentional or unintentional distortions.

We denote by v the resulting distorted version of y. Finally, v is fed to the decoder, which tries to obtain an estimate ˆmof messagemby using the jointly strongly typical decoding rule.

As a particular example of the Gel’fand-Pinsker scheme, let us consider the Scalar Costa Scheme (SCS) [2] where the stego text Y is obtained as Y = W+X =αQm(X) + (1−α)X, whereQm(·) is a scalar quantizer corresponding tomandα is a compensation parameter. For a practical implementation based on the SCS, we only need to select a character feature (e.g. color), and use it as the cover character X. We show in Fig. 1 the resulting SCS codebook and an illustration of how to use it for text data-hiding.

Based on the above framework, we propose two new methods for text data- hiding: color quantization and halftone quantization. The exploited character features are, respectively,colorandhalftone pattern(see Fig. 2). The main idea of these methods is to quantize the character feature in such a manner that the human visual system is not able to distinguish between the original and quantized characters, but it is still possible to do it by a specialized reader, e.g. a high dynamic range and/or high resolution scanner in the case of printed

(3)



















 1





A→Q1(A) B→Q1(B)

... Z→Q1(Z)





... ...

|M|





A→Q|M|(A) B→Q|M|(B)

... Z→Q|M|(Z)











































 1





A→Q1(A) B→Q1(B)

... Z→Q1(Z)





... ...

|M|





A→Q|M|(A) B→Q|M|(B)

... Z→Q|M|(Z)

























x=B x=B

Read Feature

Update Feature

× +

×

α α 1−α 1−α

y=B y=B TRABAJAR MUCHO TRABAJAR MUCHO

m=|M|

m=|M|

Fig. 1.SCS text data-hiding.

(a) (b) (c)

Fig. 2.Halftone quantization: (a) original character; (b) marked character form= 0, screen angle = 0; (c) marked character form= 1, screen angle = 45.

documents. In particular, we show that the color quantization method works both for digital and printed documents, has high information embedding rate, is perceptually invisible, and is fully automatable.

Acknowledgment

This work was partially supported by the SNSF professorship grant no. PP002-68653/1, the IM2 project, and the European Commission through the programs IST-2002-507932 ECRYPT and FP6-507609 SIMILAR. The information in this document reflects only the author’s views, is provided as is and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.

References

1. Gel’fand, S., Pinsker, M.: Coding for channel with random parameters. Problems of Control and Information Theory 9(1980) 19–31

2. Eggers, J., Su, J., Girod, B.: A blind watermarking scheme based on structured codebooks. In: Secure Images and Image Authentication, IEE Colloquium, London, UK (2000) 4/1–4/6

Références

Documents relatifs

support overlay support overlay 2 support overlay support overlay support overlay 5 support overlay 6 support overlay 7 support overlay 8 support overlay 9 support overlay 10

Whenever possible, links to identifiers.org (http://identifiers.org) –a resolvable persistent system for biological related infor- mation, and Bio2RDF are also

into blocks of text, if the document is composed of text blocks, each block is then segmented into a set of lines then each line will be segmented into

We define a partition of the set of integers k in the range [1, m−1] prime to m into two or three subsets, where one subset consists of those integers k which are < m/2,

• Want to view a page of a document but see context of surrounding text and

− Understand the contents of a document or collection of documents without reading them.?. Spring 2004 CS

Compressed with JPEG, a color image of a typical magazine page scanned at 100dpi (dots per inch) would be around 100 KB to 200 KB, and would be barely readable.. The same page at

The “Bibliotheques françoises” database provides historians and literary scholars with a useful tool not only to find relevant information from the original text of