Probabilistic Modelling of Printed Dots at the Microscopic Scale


HAL Id: hal-01686159

https://hal.archives-ouvertes.fr/hal-01686159v2

Submitted on 24 Mar 2021


Quoc Thong Nguyen, Yves Delignon, François Septier, Anh Thu Phan Ho

To cite this version:

Quoc Thong Nguyen, Yves Delignon, François Septier, Anh Thu Phan Ho. Probabilistic Modelling of Printed Dots at the Microscopic Scale. Signal Processing: Image Communication, Elsevier, 2018, 62, pp.129-138. ⟨10.1016/j.image.2018.01.003⟩. ⟨hal-01686159v2⟩


Quoc Thong Nguyen (a), Yves Delignon (b), François Septier (b), and Anh Thu Phan Ho (c)

(a) Université de Bretagne Sud, Laboratoire de Mathématiques de Bretagne Atlantique, UMR CNRS 6205, Campus de Tohannic, Vannes, France

(b) IMT Lille Douai, Université Lille, CNRS UMR 9189 - CRIStAL, F-59000 Lille, France

(c) Université de La Rochelle, Laboratoire L3i, La Rochelle, France

Abstract

Microscopic analysis of paper printing shows regularly spaced dots whose random shape depends on the printing technology, the configuration of the printer and the paper properties. Modelling and identifying the interactions between paper and ink is required to qualify the printing quality, to control the printing process and for applications in authentication. This paper proposes an approach to identify the authentic printer source using micro-tags consisting of microscopic printed dots embedded in the documents. The random shape features are modelled and extracted as a signature of a particular printer. We propose a probabilistic model, parameterised by a vector of parameters, that uses a spatial interaction binary model with an inhomogeneous Markov chain. These parameters determine the location and describe the diverse random micro-structures of microscopic printed dots. A Markov chain Monte Carlo (MCMC) algorithm is then developed to approximate the Minimum Mean Squared Error estimator. The performance is assessed through numerical simulations. Real printed dots from common printing technologies (conventional offset, waterless offset, inkjet, laser) are used to assess the effectiveness of the model.

Keywords: Probabilistic model; Bernoulli process; Metropolis Hastings within Gibbs; Microscopic printing; Markov chain.

∗ Corresponding author

1. Introduction

Authentication methods aim to prevent and discourage the production of unauthorised printed material. The literature on this subject follows two strategies. The first embeds an extrinsic signature, such as secure tags, in the print [1, 2]. The second, which motivates this paper, characterises the intrinsic features of the printing and scanning process [3]. Examples of intrinsic features include the banding artefact caused by fluctuations of the angular velocity of the optical photoconductor in laser printers [4], the dimple effect specific to inkjet printers, texture features [5], and the analysis of the quality signature of a unique print to differentiate one printer technology or supplier from another [6]. Authentication can also be based on paper statistics: for example, microstructure images are captured from different carton packages under different light conditions and with different cameras, and these images are treated as digital fingerprints from which authentication frameworks are derived [7]. In [8], the authors proposed an intrinsic feature of paper based on a texture speckle pattern, a random formation of bright and dark regions at the microscopic level when light falls onto the paper. In [9], the extracted printer signature is the unique pattern of misplaced toner powder on each paper. For electrophotographic printers, the intrinsic signals extracted from the banding artefact are used to identify the device [10]. Printer source identification is an important forensic tool. Techniques that use printed characters to identify the source scanner and printer have been proposed [11], and source identification using printed characters in different languages has been studied recently [12, 13, 14, 15, 16].

Besides print authentication, printing models serve a number of other applications, depending on the problem at hand. The model is chosen according to the scale of the experiment or application. For instance, at the scale of human vision, halftoning takes advantage of an optical illusion of human vision to display a continuous-tone grayscale image with only black and/or white dots [17, 18, 19]; here the scale of the model is the resolution of the printer. An image of a print not only contains the properties of the printing process but also includes image degradation; Baird proposed a physics-based model in which the parameters defining imaging defects are described [20]. At the microscopic scale, Norris et al. introduced a probabilistic model for the average coverage of toner particles in the electrophotographic printing process [21]. However, that model has had limited validation, and no estimation procedure was considered. For authentication purposes, we propose a probabilistic model of microscopic printed dots.

The shapes of microscopic printed dots can be considered an intrinsic feature of the printing process. At the microscopic scale, a dot is a random pattern whose shape depends on the technology, the setting of the printer, the ink quality and/or the paper properties. From a statistical point of view, the digital acquisition of these random dots can be modelled as a spatial interaction binary model based on an exponential power kernel that depends on a location parameter and on shape parameters. An accurate and consistent model can help to create an authentic signature and so improve the authentication procedure. The model proposed in our previous works [22, 23] is a spatial distribution model that simulates the randomness of the shape of printed dots; the random shape is controlled by the parameters of this probabilistic model. In [24], a numerical estimation procedure using an MCMC algorithm was developed. However, to cover the diverse random shapes of printed dots, the surrounding pixels, that is, the conditional probabilities, have to be taken into account.

In this paper, we propose a spatial interaction binary model that describes the random shape of microscopic dots more precisely by taking the neighbourhood of each pixel into account. In particular, the Markov property between pixels in the image is considered. This property helps to describe the intrinsic features of the print more accurately, which can in turn improve the discrimination power. An MCMC algorithm is developed to estimate the model parameters. Finally, we analyse the performance of the proposed unsupervised identification algorithm. The rest of this paper is organised as follows. Section 2 describes visually the randomness of the shape of microscopic dots. The formulation of the spatial stochastic model is discussed in Section 3. The Metropolis-Hastings within Gibbs algorithm for the estimation is developed in Section 4, and the posterior Cramér-Rao bound used to assess the algorithm is derived in Section 5. Section 6 is devoted to the performance of the new algorithm, which is analysed through simulations, and illustrates the benefit of the model and of the estimation algorithm on real printed dots at the microscopic scale. A basic application of the model to printer identification is described in Section 7. Concluding remarks are given in Section 8.

2. Printed dots at the microscopic scale

As mentioned above, we consider the shape of microscopic printed dots as an intrinsic feature for printer identification. This section describes the random shape of microscopic printed dots. We then show intuitively that the shapes differ between printers, which benefits printer discrimination. Let us start with the observation of printed dots at the microscopic scale. A single printed dot is created by printing an image containing one black pixel. In our study, the microscopic printed dots are captured using a Zeiss optical microscope with an AxioCam camera. The camera has a maximum resolution of 2464 × 2056 (5 Mp), a sensor size of 8.5 mm × 7.1 mm and a pixel size of 3.45 µm × 3.45 µm. Since the study focuses on the physical shape of the area covered by the ink, the collected images are in black and white only.

The images in the left column of Fig. 1 show dots printed by offset, laser and inkjet printers respectively. Random shapes of these dots are observed at the microscopic scale. The physical size of the dots is very small. For example, the theoretical size of a dot from the offset printer (1200 dpi) is 21.16 µm. The printing resolution is an important element in this study. If the resolution is too high, the printed dots become similar to the dust noise from the printer. For instance, in our study, the dots printed by a laser printer at 1200 dpi are not distinguishable from the noise. Nor can the resolution be too low, since the shape of the dots is then no longer random except at the edge. Therefore, the resolution has to be chosen manually for each specific printer.
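The theoretical dot size quoted above is one inch divided by the printing resolution. The following minimal sketch reproduces that arithmetic for the resolutions used in this study; the function name `dot_pitch_um` is an illustrative assumption and not part of the original work.

```python
MICRONS_PER_INCH = 25_400.0  # 1 inch = 25.4 mm


def dot_pitch_um(dpi: float) -> float:
    """Nominal width, in micrometres, of one printed pixel at `dpi` dots per inch."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return MICRONS_PER_INCH / dpi


if __name__ == "__main__":
    # Resolutions used in the paper: offset 1200 dpi, inkjet 720 dpi, laser 600 dpi.
    for dpi in (1200, 720, 600):
        print(f"{dpi} dpi: {dot_pitch_um(dpi):.2f} um")
```

At 1200 dpi this gives a nominal size of about 21.17 µm, in line with the value stated in the text.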

Visually, the left column of Fig. 1 shows that the shapes of printed dots from different printers are dissimilar. Fig. 2 illustrates the shape of dots printed by two different laser printers at the same resolution of 600 dpi. In the right column of Fig. 1, each image shows the frequency of appearance of black pixels at each position (the profile), computed from 100 captured samples. As observed, the probability that a pixel is black decreases as the pixel lies further from the centre. The shape of the profile contains the characteristics of the printed dots, which means that the shape of the dots carries, in some sense, the identity of the source printer. The model in [22, 23] introduced parameters controlling the width of the droplets, the density of black pixels and the average number of black pixels. However, the interaction with the surrounding pixels was not taken into account. In particular, when a pixel is black, its neighbour is more likely to be black. In addition, the further the pixels are from the centre of the dot, the smaller the influence of a black pixel on its vicinity. Including these properties could make the simulation more precise and thus describe the identity of the source printer in more detail. These properties are detailed in the next section.

Figure 1: Binary image of 4 single dots printed by an offset printer at 1200 dpi (a) and the empirical probability of a black pixel at each position (profile), computed from 100 samples of one printed dot (b). Dots from a laser printer at 600 dpi (c) and their profile (d). Dots from an inkjet printer at 720 dpi (e) and their profile (f). The profile scale runs from 0 to 1.

Figure 2: Dots printed by two laser printers, a Dell laser printer (a) and an HP laser printer (b), under identical conditions at a resolution of 600 dpi.
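The profile described in this section is the per-pixel frequency of black pixels over a set of aligned binary samples. A minimal sketch of that computation is given below, assuming that images are nested lists of equal size and that black is coded as 0 and white as 1 (the convention used in Section 3); the name `empirical_profile` is hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # assumed coding: 0 = black, 1 = white


def empirical_profile(samples: Sequence[Image]) -> List[List[float]]:
    """Fraction of samples in which each pixel is black."""
    if not samples:
        raise ValueError("at least one sample is required")
    rows, cols = len(samples[0]), len(samples[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for image in samples:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all samples must have the same size")
        for r in range(rows):
            for c in range(cols):
                if image[r][c] == 0:  # count black pixels only
                    counts[r][c] += 1
    n = len(samples)
    return [[count / n for count in row] for row in counts]
```

With 100 captured samples of one dot, the returned grid corresponds to the kind of profile shown in the right column of Fig. 1.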

3. Spatial interaction binary model with Markov chain

Let $S$ be the finite set of the $N$ pixels of a binary image $U$ that displays $K$ dots. We consider the random process $U = (U_s)_{s \in S}$, where each $U_s$ is a Bernoulli random variable taking its value in $\{0, 1\}$, denoting the black state and the white state respectively. Let $V_s$ be a neighbourhood of the pixel $s$ whose geometric shape is independent of $s \in S$. $U$ is modelled by a Markov random field

$$P(U_s = u_s \mid (U_t = u_t)_{t \neq s}) = P(U_s = u_s \mid (U_t = u_t)_{t \in V_s}), \tag{1}$$

i.e. the probability that the pixel $s$ takes the value $u_s$ conditional on all other pixels of the image equals the probability of $u_s$ conditional on the values of the pixels in the neighbourhood $V_s$. In addition, the probability of a black pixel depends on the position of the pixel relative to the centre $\mu$, which means that $P(U_s = 0 \mid (U_t = u_t)_{t \neq s})$ decreases when the pixel $s$ lies further from the centre, i.e. when the Euclidean distance $\|s - \mu\|$ increases.

A two-dimensional image can be transformed into a one-dimensional chain by a Hilbert curve [25] (see Fig. 3). Because the Hilbert curve preserves well the targeted interactions (statistical dependencies) between pixels of the spatial structure within the microscopic dot, Markov chains along a Hilbert path are used in many image processing applications [26, 27, 28]. Moreover, the Pseudo-Hilbert Scan algorithm can be applied to arbitrarily sized images [29]. For convenience of indexing, let $U = (U_1, U_2, \ldots, U_N)$ be the vector of random variables ordered according to the Hilbert curve transformation of the image. When the neighbourhood $V_s$ is restricted to one pixel, the image $U$ becomes a Markov chain

$$P(U = u) = P(U_1 = u_1) \prod_{i=2}^{N} P(U_i = u_i \mid U_{i-1} = u_{i-1}). \tag{2}$$

Figure 3: Scanning of 8 × 8 image
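The scan order illustrated in Fig. 3 can be sketched with the classical index-to-coordinate conversion of the Hilbert curve. The code below is an illustration under the assumption of a square image whose side is a power of two; it is not the Pseudo-Hilbert Scan of [29], and the function names are hypothetical.

```python
from typing import List, Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map the position d along the Hilbert curve to pixel coordinates (x, y).

    The image is assumed to be square with side 2**order.
    """
    n = 1 << order
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_path(order: int) -> List[Tuple[int, int]]:
    """All pixel coordinates of a 2**order square image in Hilbert-scan order."""
    n = 1 << order
    return [hilbert_d2xy(order, d) for d in range(n * n)]


if __name__ == "__main__":
    print(hilbert_path(3)[:8])  # first pixels of the scan of an 8 x 8 image
```

Consecutive positions along the path are always adjacent pixels, which is the property that makes a first-order Markov chain along the scan a reasonable stand-in for a local neighbourhood.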

The Markov chain (2) is parameterised by the initial probability

$$P(U_1 = 0), \tag{3}$$

and the transition probabilities

$$P(U_i = 0 \mid U_{i-1} = u), \tag{4}$$

for $1 < i \le N$. The parametric forms of (3) and (4) are

$$P(U_1 = 0) = 1 - \prod_{k=1}^{K} \left(1 - p_k(s_1)\right), \tag{5}$$

$$P(U_i = 0 \mid U_{i-1} = u) = 1 - \prod_{k=1}^{K} \left(1 - p_k(s_i)\right)^{\lambda^{(1-u)}}. \tag{6}$$

The function $p_k(\cdot)$, called the kernel, measures the ability of the $k$-th dot to blacken the surrounding pixels. The kernel $p_k(s)$ must decrease with the distance $\|s - \mu_k\|$ so that, as required, the initial and transition probabilities also decrease with the distance. The range of $p_k(\cdot)$ is a subset of $(0, 1]$.

The positive parameter $\lambda$ characterises the interaction between pixels when the neighbouring pixel is black, $U_{i-1} = 0$. With $\lambda = 1$, the probabilities (6) depend only on the position relative to the centres, $P(U_i \mid U_{i-1}) = P(U_i)$, which is the approach proposed in [22, 23]. When $\lambda < 1$, a smaller $\lambda$ gives the current pixel $U_i$ a higher chance of being black, since $P(U_i = 0 \mid U_{i-1} = 0)$ is greater with smaller $\lambda$ (see Eq. (6)). When the neighbouring pixel is white, $U_{i-1} = 1$, the probabilities depend only on the distances from the current pixel to the dot centres. With these properties, the dots are more compact, and the density of black pixels is higher in the areas closer to the centres. The new parameter $\lambda$ helps to describe a wider variety of dot shapes.

The transition probabilities depend on $p_k(s)$, which is related to the position of the pixels; this implies that $U$ is a non-stationary Markov chain. The expected number of black pixels of the image, denoted $n_b$, is calculated as

$$n_b = \sum_{i=1}^{N} E(\delta(U_i)) = \sum_{i=1}^{N} P(U_i = 0), \tag{7}$$

where $\delta(U_i)$ equals 1 when $U_i = 0$ and 0 otherwise. The first term, $P(U_1 = 0)$, is given by (5), while the other terms are calculated from (6) with the recurrence relation given by the law of total probability

$$P(U_i = 0) = P(U_i = 0 \mid U_{i-1} = 0)\, P(U_{i-1} = 0) + P(U_i = 0 \mid U_{i-1} = 1)\, P(U_{i-1} = 1), \tag{8}$$

with $2 \le i \le N$. Note that with smaller $\lambda$, $P(U_i = 0 \mid U_{i-1} = 0)$ is greater, which makes $P(U_i = 0)$ greater as well. This implies that a smaller $\lambda$ statistically increases the number of black pixels. In the following, the kernel $p_k$ is chosen as a Gaussian power kernel

$$p_k(s) = \eta \exp\left\{ -\left( \frac{\|s - \mu_k\|^2}{\sigma^2} \right)^{\beta} \right\}. \tag{9}$$
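The recurrence (8) can be evaluated numerically once the kernel (9) is fixed. The sketch below is one possible reading, under these assumptions: `path` is the list of pixel coordinates in scan order, the exponent in the transition probability is placed as in Eq. (6) as printed in this text and should be checked against the authors' implementation, and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def kernel(s: Point, mu: Point, sigma2: float, beta: float, eta: float) -> float:
    """Gaussian power kernel of Eq. (9)."""
    d2 = (s[0] - mu[0]) ** 2 + (s[1] - mu[1]) ** 2
    return eta * math.exp(-((d2 / sigma2) ** beta))


def prob_black(s: Point, centres: Sequence[Point], sigma2: float, beta: float,
               eta: float, lam: float, prev: Optional[int]) -> float:
    """Eq. (5) when prev is None, otherwise Eq. (6) with prev = u in {0, 1}.

    Assumption: the exponent lam ** (1 - u) applies to (1 - p_k), as printed.
    """
    exponent = 1.0 if prev is None else lam ** (1 - prev)
    product = 1.0
    for mu in centres:
        product *= (1.0 - kernel(s, mu, sigma2, beta, eta)) ** exponent
    return 1.0 - product


def expected_black(path: Sequence[Point], centres: Sequence[Point], sigma2: float,
                   beta: float, eta: float, lam: float) -> float:
    """Expected number of black pixels n_b, Eqs. (7) and (8)."""
    p = prob_black(path[0], centres, sigma2, beta, eta, lam, prev=None)
    total = p
    for s in path[1:]:
        after_black = prob_black(s, centres, sigma2, beta, eta, lam, prev=0)
        after_white = prob_black(s, centres, sigma2, beta, eta, lam, prev=1)
        p = after_black * p + after_white * (1.0 - p)  # law of total probability
        total += p
    return total
```

With `lam = 1` the two conditional probabilities coincide, so the recurrence reduces to summing the marginal probabilities of the independent model of [22, 23].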

Fig. 4 illustrates the simulation of one dot with $\sigma^2 = 70$ and $\beta = 5$ and various values of $\lambda$ and $\eta$. The dots with small $\lambda$ are more continuous; in other words, black pixels make their neighbourhood more likely to be black. The parameter $\eta$ controls the amplitude of the kernel, and the density of black pixels increases with $\eta$. When $\eta$ is large, the density of black pixels near the centre is almost independent of $\lambda$.
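A realisation of a single dot can be drawn by walking along the scan path and sampling each pixel from Eqs. (5) and (6). The following is a minimal sketch under several assumptions: a square image of side `2**order` scanned by a standard Hilbert curve, one dot centre, black coded as 0, and the exponent of Eq. (6) placed as printed in this text. The function names and the seeding interface are illustrative only.

```python
import math
import random
from typing import List, Optional, Tuple


def hilbert_path(order: int) -> List[Tuple[int, int]]:
    """Pixel coordinates (x, y) of a 2**order square image in Hilbert-scan order."""
    n = 1 << order
    path = []
    for d in range(n * n):
        x = y = 0
        t, s = d, 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        path.append((x, y))
    return path


def simulate_dot(order: int, centre: Tuple[float, float], sigma2: float, beta: float,
                 eta: float, lam: float, seed: Optional[int] = None) -> List[List[int]]:
    """Draw one binary image (0 = black, 1 = white) of a single dot."""
    rng = random.Random(seed)
    n = 1 << order
    image = [[1] * n for _ in range(n)]
    prev = None  # state of the previous pixel along the scan
    for x, y in hilbert_path(order):
        d2 = (x - centre[0]) ** 2 + (y - centre[1]) ** 2
        p = eta * math.exp(-((d2 / sigma2) ** beta))          # kernel, Eq. (9)
        exponent = 1.0 if prev is None else lam ** (1 - prev)  # Eq. (5) or Eq. (6)
        p_black = 1.0 - (1.0 - p) ** exponent
        prev = 0 if rng.random() < p_black else 1
        image[y][x] = prev
    return image


if __name__ == "__main__":
    dot = simulate_dot(order=6, centre=(32, 32), sigma2=70.0, beta=5.0, eta=0.7, lam=0.6, seed=0)
    print(sum(row.count(0) for row in dot), "black pixels")
```

The parameter values in the usage example follow the setting of Fig. 4 ($\sigma^2 = 70$, $\beta = 5$); fixing the seed makes a realisation reproducible.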

4. Estimation

A Bayesian framework is employed to estimate the model parameters. Recall that the parameters of interest in (5) and (6) are $\theta_U = \{\{\mu_k\}, \sigma^2, \beta, \eta, \lambda\}$, which belong to a $(2K + 4)$-dimensional space. The Minimum Mean Squared Error (MMSE) estimator $\hat{\theta}$ of $\theta_U$ given the observed data $u$ is

$$\hat{\theta}(u) = \int \theta\, p(\theta \mid u)\, d\theta, \tag{10}$$

where $p(\theta \mid u)$ is the posterior distribution. To approximate the integral (10), an MCMC algorithm is applied [30]. Since the vector of parameters is high-dimensional, it is more efficient to use a Gibbs sampling procedure, which samples each parameter successively from its conditional posterior distribution. Unfortunately, no closed-form expression is available for these distributions, so we propose to use a Metropolis-Hastings within Gibbs algorithm [30]. The chosen strategy is to use a random walk as the proposal distribution:

$$\theta^{*} = \theta^{(t)} + \epsilon_t, \tag{11}$$

where $\epsilon_t$ is a zero-mean normal random vector with a diagonal covariance matrix.

Figure 4: Simulation of the model (5), (6) with the Gaussian power kernel, $\sigma^2 = 70$, $\beta = 5$, for $\eta \in \{0.3, 0.7, 1\}$ and $\lambda \in \{1, 0.6, 0.2\}$.

We generate $\theta_2 = \log(\sigma^2)$ and $\theta_3 = \log(\beta)$ instead of $\sigma^2$ and $\beta$ to ensure their positivity; a transformation is also applied to $\eta$ and $\lambda$ to generate $\theta_4$ and $\theta_5$ respectively, i.e. $\eta = \frac{1}{1 + e^{\theta_4}}$ and $\lambda = \frac{1}{1 + e^{\theta_5}}$. For convenience, we rewrite the vector of parameters of interest as $\theta = [\theta_1, \theta_2, \theta_3, \theta_4, \theta_5]$, where $\theta_1$ collects the centres $\{\mu_k\}$. The Metropolis-Hastings within Gibbs algorithm using the random walk is summarised in Algo. 1.
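The reparameterisation above maps the constrained parameters to the real line, so that a Gaussian random walk can never propose an invalid value. A small sketch of the two directions of the mapping follows, assuming the logistic form as reconstructed from the text, that is $\eta = 1/(1 + e^{\theta_4})$ and $\lambda = 1/(1 + e^{\theta_5})$, with $\eta$ and $\lambda$ strictly inside $(0, 1)$. The function names are hypothetical.

```python
import math
from typing import Tuple


def to_unconstrained(sigma2: float, beta: float, eta: float, lam: float) -> Tuple[float, float, float, float]:
    """Map (sigma2, beta, eta, lam) to (theta2, theta3, theta4, theta5)."""
    if sigma2 <= 0 or beta <= 0 or not (0 < eta < 1) or not (0 < lam < 1):
        raise ValueError("sigma2, beta must be positive; eta, lam must lie in (0, 1)")
    return (math.log(sigma2), math.log(beta),
            math.log(1.0 / eta - 1.0), math.log(1.0 / lam - 1.0))


def to_model(theta2: float, theta3: float, theta4: float, theta5: float) -> Tuple[float, float, float, float]:
    """Inverse map back to (sigma2, beta, eta, lam)."""
    return (math.exp(theta2), math.exp(theta3),
            1.0 / (1.0 + math.exp(theta4)), 1.0 / (1.0 + math.exp(theta5)))
```

Any real-valued proposal for $\theta_2, \ldots, \theta_5$ maps back to a positive $\sigma^2$ and $\beta$ and to $\eta$, $\lambda$ in $(0, 1)$, provided the values are moderate enough for `math.exp` not to overflow.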

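Algorithm 1 Sampling algorithm with random walk

1: Initialise $\theta^{(0)} = [\theta_1^{(0)}, \theta_2^{(0)}, \theta_3^{(0)}, \theta_4^{(0)}, \theta_5^{(0)}]$

2: for $i = 1$ to $N_{\mathrm{iter}}$ do

3: for $b = 1$ to 5 do

4: Generate $\theta_b^{*} \sim \mathcal{N}(\theta_b^{(i-1)}, \sigma_b^2)$

5: Compute the acceptance ratio $\alpha(\theta_b^{(i-1)}, \theta_b^{*}) = \min\left\{ \frac{p(\theta_b^{*} \mid \theta_{-b}^{(i-1)}, u)}{p(\theta_b^{(i-1)} \mid \theta_{-b}^{(i-1)}, u)}, 1 \right\}$

6: Decide $\theta_b^{(i)} = \theta_b^{*}$ with probability $\alpha(\theta_b^{(i-1)}, \theta_b^{*})$, and $\theta_b^{(i)} = \theta_b^{(i-1)}$ otherwise

7: end for

8: end for

where $\theta_{-b}^{(i-1)} = [\theta_1^{(i)}, \ldots, \theta_{b-1}^{(i)}, \theta_{b+1}^{(i-1)}, \ldots, \theta_5^{(i-1)}]$.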
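The structure of Algo. 1 can be sketched generically as follows. This is an illustration on a toy target rather than the authors' implementation: `log_post` is any user-supplied unnormalised log-posterior, each scalar component plays the role of one block $\theta_b$, and the toy target `toy_log_post` is purely hypothetical. Because the other components are held fixed during an update, the ratio of full conditionals in step 5 equals the ratio of joint posteriors, which is what the code evaluates.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def mh_within_gibbs(log_post: Callable[[Sequence[float]], float],
                    theta0: Sequence[float], step_sd: Sequence[float],
                    n_iter: int, seed: Optional[int] = None) -> List[List[float]]:
    """Random-walk Metropolis-Hastings within Gibbs; returns one sample per sweep."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    chain = []
    for _ in range(n_iter):
        for b in range(len(theta)):
            proposal = list(theta)
            proposal[b] = rng.gauss(theta[b], step_sd[b])  # step 4: random walk
            candidate = log_post(proposal)
            # steps 5-6: accept with probability min(1, posterior ratio)
            if candidate >= current or rng.random() < math.exp(candidate - current):
                theta, current = proposal, candidate
        chain.append(list(theta))
    return chain


def toy_log_post(theta: Sequence[float]) -> float:
    """Toy target: independent normals with means (1, -2) and standard deviations (1, 0.5)."""
    return -0.5 * (theta[0] - 1.0) ** 2 - 0.5 * ((theta[1] + 2.0) / 0.5) ** 2


if __name__ == "__main__":
    samples = mh_within_gibbs(toy_log_post, [0.0, 0.0], [1.0, 0.5], n_iter=6000, seed=0)
    kept = samples[1000:]  # discard burn-in
    print([sum(s[j] for s in kept) / len(kept) for j in range(2)])
```

Working with log-densities avoids numerical underflow when the likelihood is a product over many pixels, as it is in (2).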

The initialisation of $\theta$ proceeds as follows. The centres are initialised by careful seeding [31]. The other parameters are initialised as $\sigma^2 = 0.6 \times \frac{N}{\pi K}$ and $\beta = 2$. Since $\eta$ and $\lambda$ lie in $(0, 1]$, we initialise both of them to 0.5. From Bayes' law, we have

$$p(\theta_b \mid \theta_{-b}, u) = \frac{p(u \mid \theta_b, \theta_{-b})\, p(\theta_b, \theta_{-b})}{p(\theta_{-b}, u)}, \tag{12}$$

where $p(u \mid \theta_b, \theta_{-b})$ is the likelihood and $p(\theta_b, \theta_{-b})$ is the prior. Since all parameters are a priori independent, the acceptance ratio in Algo. 1 is rewritten as

$$\alpha(\theta_b^{(i-1)}, \theta_b^{*}) = \min\left\{ \frac{p(u \mid \theta_b^{*}, \theta_{-b})\, p_b(\theta_b^{*})}{p(u \mid \theta_b^{(i-1)}, \theta_{-b})\, p_b(\theta_b^{(i-1)})}, 1 \right\}, \tag{13}$$

where $p_b$ denotes the prior of $\theta_b$. The priors of the parameters are chosen by experience. Each centre $\mu_k$ has a multivariate normal distribution, $\sigma^2$ is log-normally distributed, $\beta$ follows a gamma law, and both $\eta$ and $\lambda$ are beta-distributed. Note that when the priors are close to uniform, the acceptance ratio depends mainly on the likelihood ratio. The choice of the variances of the priors depends on the confidence in the initialisation and on experience.

5. Posterior Cramér-Rao bound

Let $u$ be the observation vector and $\theta$ be the $r$-dimensional vector of random parameters (with $r = 2K + 4$). The joint probability density of the pair $(U, \theta)$ is $p(U, \theta)$. Let $\hat{\theta}(u)$ denote an estimate of $\theta$. The Posterior Cramér-Rao Bound (PCRB) states that [32, 33]

$$E[(\hat{\theta}(U) - \theta)(\hat{\theta}(U) - \theta)^T] \ge I^{-1}, \tag{14}$$

where $I$ is the $r \times r$ Fisher information matrix (FIM)

$$I = E[\nabla_\theta \log p(U, \theta)\, \nabla_\theta^T \log p(U, \theta)] = -E[\Delta_\theta^\theta \log p(U, \theta)], \tag{15}$$

where $\Delta_\theta^\theta := \nabla_\theta \nabla_\theta^T$ is the second derivative operator and $\nabla_\theta$ is the gradient operator with respect to $\theta$.

Using the fact that $p(U, \theta) = p(U \mid \theta)\, p(\theta)$, the FIM in (15) can be expressed as

$$I = -E[\Delta_\theta^\theta \log p(U \mid \theta)] - E[\Delta_\theta^\theta \log p(\theta)] = I_d + I_p, \tag{16}$$

where $I_p$ is the a priori information matrix and $I_d$ is the "standard" FIM $I_d(\theta)$ (the one used in the derivation of the CRB) averaged over the a priori distribution of $\theta$:

$$I_d = \int_\Theta I_d(\theta)\, p(\theta)\, d\theta. \tag{17}$$

In many cases, when only the $i$-th component of $\theta$, say $\theta_i$, is of interest, the PCRB is expressed as

$$E[(\hat{\theta}_i(U) - \theta_i)^2] \ge [I^{-1}]_{ii}. \tag{18}$$

6. Experimental results and performances

In this section, we assess both the performance of the estimators and the adequacy of the model. The Metropolis-Hastings within Gibbs algorithm is analysed, and its performance is evaluated on images generated by the parametric model. An experiment on real captured dots is also conducted.

6.1. Synthetic Data

The simulated images are generated with a size of 64 × 64. As mentioned previously, the priors are fixed: the expected values of the priors are set to the initial values, and the variances are chosen large to reflect highly uncertain prior knowledge. The estimation is run with 20000 MCMC iterations and a burn-in period of 5000, which means that the last 15000 of the 20000 generated samples are used to approximate the parameters.
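Given a stored chain, the MMSE estimate (10) is approximated by the average of the samples that remain after the burn-in period. A minimal sketch, assuming the chain is a list of parameter vectors (one per iteration); the function name `mmse_estimate` is hypothetical:

```python
from typing import List, Sequence


def mmse_estimate(chain: Sequence[Sequence[float]], burn_in: int) -> List[float]:
    """Posterior-mean approximation: average of the samples after the burn-in period."""
    kept = chain[burn_in:]
    if not kept:
        raise ValueError("burn_in must be smaller than the chain length")
    dim = len(kept[0])
    return [sum(sample[j] for sample in kept) / len(kept) for j in range(dim)]
```

With the setting of this section (20000 iterations, burn-in of 5000), the average runs over the last 15000 samples.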

6.1.1. Analysis of the estimation with respect to β

In this analysis, only $\beta$ varies. Simulated images of a single dot are generated. The true centre $\mu$ of the dot is $(32, 32)$. With $\sigma^2 = 50$, $\eta = 0.8$, $\lambda = 0.6$ and three values of $\beta$, $\beta \in \{0.8, 2, 5\}$, Tab. 1 shows the mean squared errors of the estimators. Larger values of $\beta$ lead to smaller errors for $\hat{\sigma}^2$ and $\hat{\eta}$. In contrast, the errors of $\hat{\beta}$ and $\hat{\lambda}$ are larger with large $\beta$. With a large value of $\beta$, the simulated dots differ mainly near the edges. This means that the information about $\lambda$ is less significant, which makes the estimation of $\lambda$ less consistent, whereas with smaller $\beta$ the contribution of $\lambda$ to the connected black pixels is more visible (see Fig. 5). On the other hand, the simulated dots look similar for large values of $\beta$ (see Fig. 6), which also makes the estimation of $\beta$ less consistent.

Figure 5: The role of $\lambda$ with small $\beta = 0.8$: (a) $\lambda = 1$, (b) $\lambda = 0.6$, (c) $\lambda = 0.2$.

Figure 6: Realisations with three different values of $\beta$: (a) $\beta = 7$, (b) $\beta = 7.5$, (c) $\beta = 8$.

6.1.2. Analysis of the estimation with respect to η

The MSE of the estimators are given in Tab. 2. The simulations of a single dot were conducted with $\beta = 2$, $\sigma^2 = 50$, $\lambda = 0.6$ and three values of $\eta$, $\eta \in \{0.5, 0.8, 0.97\}$. The estimators of $\sigma^2$ have a smaller MSE with large $\eta$, which means that the black pixels give significant information to the estimation procedure. The MSE of $\hat{\lambda}$ increases because, when $\eta$ is large, the role of $\lambda$ is visible only around the edge of the dots. This phenomenon can be observed in Fig. 4.

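β        σ²        β        η        λ

0.8      218.51    0.029    0.022    0.0025

2        45.12     0.23     0.011    0.0068

5        13.20     2.011    0.0037   0.0025

Table 1: The mean squared errors of the estimators with various values of β (first column), σ² = 50, η = 0.8, λ = 0.6. The remaining columns give the MSE of the estimator of each parameter.

η        σ²        β        η        λ

0.5      100.9     0.2041   0.0065   0.0058

0.8      45.12     0.23     0.011    0.0068

0.97     49.85     0.1845   0.032    0.01

Table 2: The mean squared errors of the estimators with various values of η (first column), σ² = 50, β = 2, λ = 0.6. The remaining columns give the MSE of the estimator of each parameter.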

6.1.3. Analysis of the estimation with respect to λ

The simulations of one dot were conducted with $\beta = 2$, $\sigma^2 = 50$, $\eta = 0.8$ and three values of $\lambda$, $\lambda \in \{0.2, 0.6, 0.9\}$. Tab. 3 shows the MSE of the estimators. The estimators of $\sigma^2$, $\beta$, $\eta$ have a decreasing MSE when $\lambda$ increases. Since the number of black pixels decreases when $\lambda$ increases, the information for the estimation of $\sigma^2$, $\beta$, $\eta$ appears more clearly in the realisations. On the other hand, the estimation of $\lambda$ is better for small values, because the connection between black pixels emerges strongly.

λ        σ²        β        η        λ

0.2      66.06     0.247    0.026    0.0017

0.6      45.12     0.23     0.011    0.0068

0.9      36.57     0.199    0.0113   0.0181

Table 3: The mean squared errors of the estimators with various values of λ (first column), β = 2, σ² = 50, η = 0.8. The remaining columns give the MSE of the estimator of each parameter.

6.1.4. Posterior Cramér-Rao bound for the single-dot case

The PCRB is used to assess the precision of the estimators. Since the posterior information matrix $I$ in (16) cannot be calculated analytically, it is approximated by Monte Carlo simulation. The vector of parameters $\theta$ is drawn $N$ times from the priors. For each generated vector of parameters, $M$ images are simulated to approximate the standard Fisher information matrix $I_d(\theta)$ in (17).

         µx       µy       σ²       β        η        λ

PCRB     0.111    0.109    9.621    0.002    0.002    0.003

MSE      5.180    5.605    33.146   2.843    0.044    0.054

Table 4: The approximated PCRB and the Mean Squared Error (MSE) of each parameter in the case of a single dot.

Tab. 4 reports the approximated Posterior Cramér-Rao Bound in the case of a single dot with $N = 100$ generated random values of the vector $\theta$. The priors are as follows: the centre $(\mu_x, \mu_y)$ has a multivariate normal distribution with mean $(32, 32)$ and covariance matrix $\mathrm{diag}(5, 5)$; $\sigma^2$ is log-normally distributed with mean 50 and variance 100; $\beta$ follows a gamma law with mean 2 and variance 1.5; $\eta$ and $\lambda$ are beta-distributed, $B(3, 1.5)$. For each value of $\theta$, $M = 40$ images of size 64 × 64 are generated to approximate the matrix $I_d(\theta)$. For each generated image, the parameters are estimated using Algo. 1. In total, 40 × 100 images are used to approximate the Mean Squared Error. As observed in this example, $\sigma^2$ is the least accurately estimated parameter.
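The Monte Carlo procedure described above (draw $\theta$ from the prior, simulate $M$ data sets per draw, average the squared score) can be illustrated on a toy model where the exact bound is known. The sketch below does not use the printed-dot model: it assumes a scalar parameter with a Gaussian prior and Gaussian observations, for which $I_d = n / s^2$ and $I_p = 1 / \tau^2$. All names and the toy model itself are assumptions made for illustration.

```python
import random
from typing import Optional


def approx_pcrb(n_theta: int, m_images: int, n_obs: int, noise_sd: float,
                prior_mean: float, prior_sd: float, seed: Optional[int] = None) -> float:
    """Monte Carlo PCRB for theta ~ N(prior_mean, prior_sd^2), x_j | theta ~ N(theta, noise_sd^2)."""
    rng = random.Random(seed)
    info_sum = 0.0
    for _ in range(n_theta):                       # draws from the prior
        theta = rng.gauss(prior_mean, prior_sd)
        squared_scores = 0.0
        for _ in range(m_images):                  # simulated data sets for this theta
            # score = d/dtheta log p(x | theta) = sum_j (x_j - theta) / noise_sd^2
            score = sum(rng.gauss(theta, noise_sd) - theta for _ in range(n_obs)) / noise_sd ** 2
            squared_scores += score * score
        info_sum += squared_scores / m_images      # estimate of I_d(theta)
    i_d = info_sum / n_theta                       # average over the prior, as in Eq. (17)
    i_p = 1.0 / prior_sd ** 2                      # prior information of a Gaussian prior
    return 1.0 / (i_d + i_p)


if __name__ == "__main__":
    print(approx_pcrb(n_theta=100, m_images=40, n_obs=10, noise_sd=2.0,
                      prior_mean=0.0, prior_sd=1.0, seed=0))
```

For this toy setting the exact bound is $1 / (10/4 + 1) = 1/3.5$, so the Monte Carlo value can be checked directly; for the dot model no such closed form is available, which is why the approximation is needed.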

6.1.5. The performance with many dots

The impact of the number of dots, $K$, is analysed. Instead of one dot per image as in the analyses above, each image contains multiple simulated dots. The experiment is set up with $\sigma^2 = 50$, $\beta = 2$, $\eta = 0.8$, $\lambda = 0.6$, and all generated dots are completely separated from each other. As shown in Tab. 5, the estimators are clearly more consistent for images with more dots: more dots give more information about the shape, which is determined by the parameters of the model. The accuracy and consistency of $\hat{\beta}$, $\hat{\eta}$, $\hat{\lambda}$ are crucial in authentication. Therefore, analysing many dots is a good means of obtaining a more consistent estimation of the parameters.

         1 dot    2 dots   4 dots   7 dots

σ²       6.03     3.08     2.42     1.92

β        0.25     0.19     0.18     0.15

η        0.08     0.04     0.04     0.03

λ        0.08     0.06     0.05     0.03

Table 5: The standard deviation of the estimators for different numbers of dots K, σ² = 50, β = 2, η = 0.8, λ = 0.6.

6.2. Estimation results from printed dots

The dots from four printers of four popular technologies are taken: conventional offset, waterless offset, laser and inkjet printing. The resolutions are 1200 dpi for the offset printers, 600 dpi for the laser printer, and 720 dpi for the inkjet printer. For each printer, 10 images of size 64 × 64, each containing one dot, are collected for the experiment. Fig. 7 shows the images of the real dots and the realisations generated from the parameters estimated by the developed MCMC method. The Metropolis-Hastings within Gibbs algorithm is run with 30000 MCMC iterations, the first 10000 of which form the burn-in period. A realisation from the spatial binary model [23] is also shown for comparison.

         Conventional offset     Waterless offset

         Avg.     Std.           Avg.     Std.

β        3.71     1.4            6.55     1.74

η        0.96     0.05           0.99     0.0004

λ        0.67     0.15           0.52     0.13

         Laser                   Inkjet

         Avg.     Std.           Avg.     Std.

β        1.21     0.92           2.28     0.51

η        0.59     0.32           0.99     0.01

λ        0.30     0.06           0.20     0.04

Table 6: The estimators of the real dots by MCMC.

Tab. 6 gives the numerical values of the estimates obtained by the MCMC method together with their standard deviations. Fig. 7 visually compares the printed dots with realisations from the proposed Markov spatial model and from the spatial binary model developed in [22, 23], which does not take the neighbour dependence into account. The results are very promising when the interaction is taken into account: the realisation from the Markov spatial model is closer to the real dot than the one from the spatial binary model. For the offset and inkjet images, the fragments near the border are reproduced better, which makes the realisations more similar to the real dots. The shape of the dot from the inkjet printer is more complicated to simulate because of its concave parts. For the print from the laser printer, the Markov spatial model brings a substantial improvement over the spatial binary model: the simulated dot is much more realistic and continuous.

Fig. 8 and Tab. 7 give another estimation, for four printed dots at a resolution of 600 dpi from an HP-600 M620 laser printer. The visual comparison again shows the improved simulation obtained with the Markov spatial model. The capability of the proposed model to capture more details can clearly increase the power of discrimination between printers. Moreover, since the characteristics of printed dots are represented by the parameters, the proposed model may need significantly less training data than machine learning generative models to characterise the printed dots.

         Estimator    Error with 99% confidence

σ²       66.82        ±0.019

β        1.36         ±3.6 × 10⁻⁴

η        0.76         ±2.6 × 10⁻⁴

λ        0.35         ±1.2 × 10⁻⁴

Table 7: The estimators of σ², β, η, and λ from four printed dots.

7. Application in printer source identification

The capability of the model to simulate the random shape of printed dots and the performance of the estimation were demonstrated in the previous section. In this part, we use the typical random shape, characterised by the parameters, as a "signature" to identify the printer source. The vector of three parameters $(\beta, \eta, \lambda)$ can be used as a characteristic of a printer with a specific setting. These three parameters are dimensionless, which means that the size of the image of the dots does not affect their values. With the model, we are able to set up an authentic tag from an exact printer by setting specific values for the parameters of the printer (ink toner, paper, ...). Under a specific setting, a particular printer statistically creates the same shape of microscopic dots. In this experiment, we applied the maximum likelihood classification method to discriminate the printed dots from different sources [34, 35]. Fig. 9 summarises the procedure of the proposed printer identification method.

The estimates of the shape parameters are displayed in Fig. 10, which shows respectively $\beta$ vs. $\lambda$, $\beta$ vs. $\eta$ and $\lambda$ vs. $\eta$. Each printer forms a 3D Gaussian cluster in the parameter space. The location and shape of the clusters are given by their means (Tab. 6) and their covariance matrices (Tab. 8) respectively. For each printer (conventional offset, waterless offset, laser and inkjet), we take 10 other images for the discrimination experiment. In total, 20 images are collected for each printer, 10 of which are used to estimate the parameters of the 3D Gaussian cluster. In this experiment, all the dots are printed on uncoated paper, and the other conditions are kept identical. In this case, the normal distribution of the shape parameters makes it possible to use the maximum likelihood classification method. All the samples are correctly classified (100%) with respect to their printer sources. The confusion matrix is presented in Tab. 9.

(a) Conventional offset

1.910    0.011    0.095

0.011    6e-4     0.001

0.095    0.001    0.033

(b) Waterless offset

3.034    6e-4     0.193

6e-4     1e-7     4e-5

0.193    4e-5     0.018

(c) Laser

1.199    -0.190   0.030

-0.190   0.096    0.006

0.030    0.006    0.007

(d) Inkjet

0.253    0.001    0.015

0.001    1.2e-5   1e-4

0.015    1e-4     0.002

Table 8: The covariance matrices calculated from the estimators of four printers.

Actual/Predicted    Con. offset    Wat. offset    Laser    Inkjet

Con. offset         20             0              0        0

Wat. offset         0              20             0        0

Laser               0              0              20       0

Inkjet              0              0              0        20

Table 9: Confusion matrix for ML classification of four different types of printer: conventional offset, waterless offset, laser and inkjet.

Another experiment is conducted on two laser printers, an HP laser printer and a Dell laser printer. Both printers are set to a resolution of 600 dpi, and the samples are printed on uncoated paper. The relation between the estimated parameters of each printer is given in Fig. 11, which clearly separates the two printers. The locations and shapes of the 3D Gaussian clusters of the two printers are given respectively by the means in Tab. 10 and the covariance matrices in Tab. 11. As in the previous experiment, 20 images are captured from each printer, 10 of which are used to approximate the parameters of the 3D Gaussian cluster. The maximum likelihood classification method again classifies every sample correctly (Tab. 12). These results point out the relevance of the model for printed dots at the microscopic scale.

section, the estimation of parameters is more consistent with many dots. Therefore, the analysis of many dots will help to improve the accuracy of discriminating the exact printer source of a document from other printers. In practice, a micro-tag consisting of printed dots can be embedded in an authentic document.

        β       η       λ
HP      0.668   0.225   0.129
Dell    2.218   0.947   0.116

Table 10: The mean values of the estimators of HP laser printer and Dell laser printer.

(a) HP
 0.0147   0.001   -0.0003
 0.001    0.009    0.0009
-0.0003   0.0009   0.0007

(b) Dell
 0.243    0.054   -0.003
 0.054    0.028    0.001
-0.003    0.001    0.0007

Table 11: The covariance matrices calculated from the estimators of HP and Dell laser printers.

Actual/Predicted   HP laser   Dell laser
HP laser           20         0
Dell laser         0          20

Table 12: Confusion matrix for ML classification of two laser printers.
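As a worked illustration of the classification step, the sketch below plugs the means of Tab. 10 and the covariance matrices of Tab. 11 into a Gaussian log-likelihood and picks the printer with the larger value. It is a minimal sketch, not the authors' implementation. Two assumptions are made: the rows and columns of the covariance matrices are taken to follow the (β, η, λ) order of Tab. 10, which the tables do not state explicitly, and the query estimate at the end is a hypothetical value chosen for illustration. The names `MEANS`, `COVS`, `gaussian_log_likelihood` and `classify` are introduced here.

```python
import numpy as np

# Cluster means from Tab. 10, ordered (beta, eta, lambda).
MEANS = {
    "HP": np.array([0.668, 0.225, 0.129]),
    "Dell": np.array([2.218, 0.947, 0.116]),
}

# Covariance matrices from Tab. 11.
# ASSUMPTION: rows/columns follow the same (beta, eta, lambda) order as Tab. 10.
COVS = {
    "HP": np.array([[0.0147, 0.001, -0.0003],
                    [0.001, 0.009, 0.0009],
                    [-0.0003, 0.0009, 0.0007]]),
    "Dell": np.array([[0.243, 0.054, -0.003],
                      [0.054, 0.028, 0.001],
                      [-0.003, 0.001, 0.0007]]),
}


def gaussian_log_likelihood(theta, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) evaluated at theta."""
    diff = theta - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Squared Mahalanobis distance, computed without forming the inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + len(theta) * np.log(2.0 * np.pi))


def classify(theta):
    """Return the most likely printer and the per-printer log-likelihoods."""
    theta = np.asarray(theta, dtype=float)
    scores = {
        name: gaussian_log_likelihood(theta, MEANS[name], COVS[name])
        for name in MEANS
    }
    return max(scores, key=scores.get), scores


# Hypothetical estimate (beta, eta, lambda) obtained from a captured dot image.
label, scores = classify([0.70, 0.25, 0.13])
print(label, scores)
```

Solving the linear system instead of inverting the covariance matrix is a numerical convenience; with only three parameters either choice would work.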

8. Concluding remarks

In this paper, we have proposed a novel statistical model that considers the spatial interaction of printed dots at the microscopic scale. This model describes the formation of the shape of the binary dot more accurately. The spreading of the ink on the surface is described as the emergence of the black pixels and the interaction between pixels that depends on the distance between the centres of the dots. All these features are modelled by a non-stationary Markov chain.

An estimation method for the parameters based on the Metropolis-Hastings within Gibbs algorithm is proposed. The performance of this approach is assessed not only with simulated data but also with real dots coming from four different printing technologies. Thanks to the proposed model and inference algorithm, the realisations are very close to the real images. The fragmentation on the border of the dot is better treated, and the density of the black pixels is more realistically simulated with the newly introduced interaction. Authentication of printers from micro-prints has shown promising results owing to the accuracy of both the Markov spatial model and the estimation algorithm. Future research directions consist in extending this model to grayscale and colour images.

Acknowledgments

Q-T. Nguyen dedicates this work to Prof. Y. Delignon, second author of the paper, who passed away before this work could be published.

References

[1] J. Picard, C. Vielhauer, N. Thorwirth, Towards fraud-proof id documents using multiple data hiding technologies and biometrics, in: Electronic Imaging 2004, International Society for Optics and Photonics, 2004, pp. 416–427. doi:10.1117/12.525446.

[2] R. Villán, S. Voloshynovskiy, O. Koval, T. Pun, Multilevel 2-d bar codes: toward high-capacity storage modules for multimedia security and management, Information Forensics and Security, IEEE Transactions on 1 (4) (2006) 405–420. doi:10.1109/TIFS.2006.885022.

[3] P.-J. Chiang, N. Khanna, A. K. Mikkilineni, M. V. O. Segovia, S. Suh, J. P. Allebach, G. T.-C. Chiu, E. J. Delp, Printer and scanner forensics, Signal Processing Magazine, IEEE 26 (2) (2009) 72–83. doi:10.1109/MSP.2008.931082.

[4] G. N. Ali, A. K. Mikkilineni, J. P. Allebach, E. J. Delp, P.-J. Chiang, G. T. Chiu, Intrinsic and extrinsic signatures for information hiding and secure printing with electrophotographic devices, in: NIP & Digital Fabrication Conference, Vol. 2003, Society for Imaging Science and Technology, 2003, pp. 511–515.

[5] S.-J. Ryu, H.-Y. Lee, D.-H. Im, J.-H. Choi, H.-K. Lee, Electrophotographic printer identification by halftone texture analysis, in: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, IEEE, 2010, pp. 1846–1849. doi:10.1109/ICASSP.2010.5495377.

[6] J. Oliver, J. Chen, Use of signature analysis to discriminate digital printing technologies, in: NIP & Digital Fabrication Conference, Vol. 2002, Society for Imaging Science and Technology, 2002, pp. 218–222.

[7] S. Voloshynovskiy, M. Diephuis, F. Beekhof, O. Koval, B. Keel, Towards reproducible results in authentication based on physical non-cloneable functions: The forensic authentication microstructure optical set (famos), in: Information Forensics and Security (WIFS), 2012 IEEE International Workshop on, IEEE, 2012, pp. 43–48. doi:10.1109/WIFS.2012.6412623.

[8] A. Sharma, L. Subramanian, E. A. Brewer, Paperspeckle: microscopic fingerprinting of paper, in: Proceedings of the 18th ACM conference on Computer and communications security, ACM, 2011, pp. 99–110. doi:10.1145/2046707.2046721.

[9] B. Zhu, J. Wu, M. S. Kankanhalli, Print signatures for document authentication, in: Proceedings of the 10th ACM conference on Computer and communications security, ACM, 2003, pp. 145–154. doi:10.1145/948109.948131.

[10] A. K. Mikkilineni, G. N. Ali, P.-J. Chiang, G. T. Chiu, J. P. Allebach, E. J. Delp, Signature-embedding in printed documents for security and forensic applications, in: Electronic Imaging 2004, International Society for Optics and Photonics, 2004, pp. 455–466. doi:10.1117/12.531944.

[11] N. Khanna, A. K. Mikkilineni, G. T. C. Chiu, J. P. Allebach, E. J. Delp, Survey of Scanner and Printer Forensics at Purdue University, in: Computational Forensics: Second International Workshop, IWCF 2008, Washington, DC, USA, August 7-8, 2008. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 22–34. doi:10.1007/978-3-540-85303-9_3.

[12] M.-J. Tsai, J.-S. Yin, I. Yuadi, J. Liu, Digital forensics of printed source identification for chinese characters, Multimedia tools and applications 73 (3) (2014) 2129–2155. doi:10.1007/s11042-013-1642-2.

[13] M.-J. Tsai, C.-L. Hsu, J.-S. Yin, I. Yuadi, Japanese character based printed source identification, in: Circuits and Systems (ISCAS), 2015 IEEE International Symposium on, IEEE, 2015, pp. 2800–2803. doi:10.1109/ISCAS.2015.7169268.

[14] M.-J. Tsai, C.-L. Hsu, J.-S. Yin, I. Yuadi, Digital forensics for printed character source identification, in: Multimedia and Expo (ICME), 2016 IEEE International Conference on, IEEE, 2016, pp. 1–6. doi:10.1109/ICME.2016.7552892.

[15] M.-J. Tsai, I. Yuadi, Source identification for printed arabic characters, in: Proceedings of the 9th IEEE International Conference on Ubi-Media Computing "UMEDIA-2016", 2016, pp. 49–53.

[16] M.-J. Tsai, I. Yuadi, Digital forensics of microscopic images for printed source identification, Multimedia Tools and Applications (2017) 1–30. doi:10.1007/s11042-017-4771-1.

[17] R. A. Ulichney, Review of halftoning techniques, in: Electronic Imaging, International Society for Optics and Photonics, 1999, pp. 378–391. doi:10.1117/12.373419.

[18] L. Velho, J. d. M. Gomes, Digital halftoning with space filling curves, in: ACM SIGGRAPH Computer Graphics, Vol. 25, ACM, 1991, pp. 81–90. doi:10.1145/127719.122727.

[19] T. N. Pappas, D. L. Neuhoff, Model-based halftoning, in: Electronic Imaging’91, San Jose, CA, International Society for Optics and Photonics, 1991, pp. 244–255. doi:10.1117/12.44360.

[20] H. S. Baird, Document image defect models, in: Structured Document Image Analysis, Springer, 1992, pp. 546–556. doi:10.1007/978-3-642-77281-8_26.

[21] M. Norris, E. H. B. Smith, Printer modeling for document imaging, in: Proceedings of the 2004 International Conference on Imaging Science, Systems, and Technology (CISST’04), 2004.

[22] Q.-T. Nguyen, Y. Delignon, L. Chagas, F. Septier, Printer technology authentication from micrometric scan of a single printed dot, in: Proc. SPIE, Vol. 9028, 2014, pp. 9028 – 9028 – 7. doi:10.1117/12.2039989.

URL https://doi.org/10.1117/12.2039989

[23] T. Q. Nguyen, Y. Delignon, L. Chagas, F. Septier, Printer identification from micro-metric scale printing, in: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, IEEE, 2014, pp. 6236–6239. doi:10.1109/ICASSP.2014.6854803.

[24] Q. T. Nguyen, Y. Delignon, L. Chagas, F. Septier, Modélisation de points imprimés à l’échelle micro-métrique, in: XXVème Colloque GRETSI, 2015.

[25] D. Hilbert, Ueber die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen 38 (3) (1891) 459–460. doi:10.1007/978-3-662-25726-5_1.

[26] S.-i. Kamata, M. Niimi, E. Kawaguchi, Interactive analysis of multi-spectral images using a hilbert curve, in: Pattern Recognition, 1994. Vol. 1-Conference A: Computer Vision & Image Processing., Proceedings of the 12th IAPR International Conference on, Vol. 1, IEEE, 1994, pp. 93–97. doi:10.1109/ICPR.1994.576234.

[27] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, F. Tupin, Unsupervised classification of radar images using hidden markov chains and hidden markov random fields, Geoscience and Remote Sensing, IEEE Transactions on 41 (3) (2003)

[28] N. Dridi, Y. Delignon, W. Sawaya, C. Garnier, Em-based joint symbol and blur estimation for 2d barcode, in: Image and Signal Processing and Analysis (ISPA), 2011 7th International Symposium on, IEEE, 2011, pp. 32–36.

[29] J. Zhang, S.-i. Kamata, Y. Ueshige, A pseudo-hilbert scan algorithm for arbitrarily-sized rectangle region, in: Advances in Machine Vision, Image Processing, and Pattern Analysis, Springer, 2006, pp. 290–299. doi:10.1007/11821045_31.

[30] C. Robert, G. Casella, Monte Carlo Statistical Methods, Springer Science & Business Media, 2013. doi:10.1007/978-1-4757-4145-2.

[31] D. Arthur, S. Vassilvitskii, k-means++: The advantages of careful seeding, in: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, 2007, pp. 1027–1035.

[32] P. Tichavský, C. H. Muravchik, A. Nehorai, Posterior Cramér-Rao bounds for discrete-time nonlinear filtering, Signal Processing, IEEE Transactions on 46 (5) (1998) 1386–1396. doi:10.1109/78.668800.

[33] H. L. Van Trees, Detection, estimation, and modulation theory, John Wiley & Sons, 2004.

[34] G. M. Foody, N. Campbell, N. Trodd, T. Wood, Derivation and applications of probabilistic measures of class membership from the maximum-likelihood classification, Photogrammetric engineering and remote sensing 58 (9) (1992) 1335–1341.

[35] A. Asmala, Analysis of maximum likelihood classification on multispectral data, Applied Mathematical Sciences 6 (129-132) (2012) 6425–6436.

[Image grid. Columns: real dot, Markov spatial model, spatial binary model. Rows: conventional offset, waterless offset, laser, inkjet.]

Figure 7: The estimation from four printing processes and the realisations from the models.

(a) Real image (b) Regenerated image (c) Approximated centres

[Fig. 9 flowchart: Printed image → Captured image → Binary quantization → MCMC using Algo. 1 → Estimate of θ̂ = (β̂, η̂, λ̂) → Printer source identification using maximum likelihood]

[Three scatter plots of the estimated parameters: λ against β, η against β, and η against λ. Legend: conventional offset, waterless offset, laser, inkjet.]

Figure 10: Relation between the estimated parameters of the samples from four printers with four printing technologies.

[Three scatter plots of the estimated parameters: λ against β, η against β, and η against λ. Legend: HP, Dell.]

Figure 11: Relation between the estimated parameters of the samples from an HP laser printer and a Dell laser printer.