
Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch


(1)

Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch

Lionel PIBRE (2,3), Jérôme PASQUET (2,3), Dino IENCO (2,3), Marc CHAUMONT (1,2,3)

(1) University of Nîmes, France
(2) University of Montpellier, France
(3) CNRS, Montpellier, France

March 3, 2016

Media Watermarking, Security, and Forensics,

IS&T Int. Symp. on Electronic Imaging, SF, California, USA, 14-18 Feb. 2016.

(2)

The big promise of CNN...

Superlatives: lots of enthusiasm, fresh ideas, amazing results,...

           RM+EC    CNN     FNN
Max        24.93%   7.94%   8.92%
Min        24.21%   7.01%   8.44%
Variance   0.14     0.12    0.16
Average    24.67%   7.40%   8.66%

Table: Steganalysis results (P_E) with S-UNIWARD, 0.4 bpp, clairvoyant scenario, for RM+EC, CNN, and FNN.

But... the experimental setup was artificial...

(3)

Outline

1 CNN

2 Story

3 Experiments

4 Conclusion

(4)

An example of Convolutional Neural Network

Figure: Qian et al. Convolutional Neural Network. [Diagram: a 256×256 input image is filtered by the kernel F(0) into a 252×252 image; five convolutional layers (16 kernels each, of sizes 5×5, 3×3, 3×3, 3×3, 5×5) with Gaussian activations and pooling produce feature maps of 124×124, 61×61, 29×29, 13×13, and 4×4; two fully-connected layers of 128 neurons (ReLU) and a softmax deliver the label 0/1.]

Inspired by the Krizhevsky et al. 2012 network; detection percentage only 3% to 4% lower than RM+EC.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, NIPS 2012.

Yinlong Qian, Jing Dong, Wei Wang, and Tieniu Tan, ”Deep Learning for Steganalysis via Convolutional Neural Networks,” in Proceedings of SPIE Media Watermarking, Security, and Forensics 2015.

(5)

Convolutional Neural Network: Preliminary filter

F^{(0)} = \frac{1}{12}
\begin{pmatrix}
-1 &  2 &  -2 &  2 & -1 \\
 2 & -6 &   8 & -6 &  2 \\
-2 &  8 & -12 &  8 & -2 \\
 2 & -6 &   8 & -6 &  2 \\
-1 &  2 &  -2 &  2 & -1
\end{pmatrix}

CNNs converge much more slowly without this preliminary high-pass filtering.
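To make the role of F(0) concrete, here is a minimal numpy sketch (the function name is ours) that applies the kernel with a "valid" 2-D correlation. Since every row of the kernel sums to zero, a flat image yields a zero response: the filter suppresses image content and keeps high-frequency residuals, where the embedding signal lives.

```python
import numpy as np

# The 5x5 high-pass kernel F(0), transcribed from the slide (scaled by 1/12).
F0 = (1.0 / 12.0) * np.array([
    [-1.0,  2.0,  -2.0,  2.0, -1.0],
    [ 2.0, -6.0,   8.0, -6.0,  2.0],
    [-2.0,  8.0, -12.0,  8.0, -2.0],
    [ 2.0, -6.0,   8.0, -6.0,  2.0],
    [-1.0,  2.0,  -2.0,  2.0, -1.0],
])

def filter_valid(image, kernel):
    """'Valid' 2-D correlation: a 256x256 input yields a 252x252 output."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.random.default_rng(0).random((256, 256))  # stand-in for a cover image
filtered = filter_valid(image, F0)
print(filtered.shape)  # (252, 252)
```

Because F(0) is symmetric, correlation and convolution coincide here.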

(6)

Convolutional Neural Network: Layers

Figure: Qian et al. Convolutional Neural Network.

Inside one layer, the successive steps are:

a convolution step,
the application of an activation function,
a pooling step,
a normalization step.
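These four steps can be sketched on a single toy layer (numpy only; all names are ours, and the final normalization is a crude rescaling stand-in, not the across-maps formula of the Qian et al. network):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv_valid(img, k):
    """'Valid' 2-D correlation of one map with one kernel."""
    kh, kw = k.shape
    h, w = img.shape
    return np.array([[np.sum(img[y:y + kh, x:x + kw] * k)
                      for x in range(w - kw + 1)]
                     for y in range(h - kh + 1)])

def avg_pool(img, s=2):
    """Local average over s x s blocks, with sub-sampling of stride s."""
    h, w = img.shape[0] // s * s, img.shape[1] // s * s
    return img[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def layer(feature_map, kernel):
    """One CNN layer: convolution, activation, pooling, normalization."""
    x = conv_valid(feature_map, kernel)  # 1) convolution
    x = relu(x)                          # 2) activation
    x = avg_pool(x, 2)                   # 3) pooling + sub-sampling
    return x / (1e-8 + np.abs(x).max())  # 4) crude normalization stand-in

rng = np.random.default_rng(0)
out = layer(rng.standard_normal((32, 32)), rng.standard_normal((3, 3)))
print(out.shape)  # (15, 15): 32 -> 30 after the 3x3 convolution, -> 15 after pooling
```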

(7)

Convolutional Neural Network: Convolutions

Figure: Qian et al. Convolutional Neural Network.

First layer:

\tilde{I}_k^{(1)} = I^{(0)} \star F_k^{(1)}. \quad (1)

Other layers:

\tilde{I}_k^{(l)} = \sum_{i=1}^{K^{(l-1)}} I_i^{(l-1)} \star F_{k,i}^{(l)}. \quad (2)
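Equation (2) says that the k-th output map of layer l is the sum, over the K^(l-1) input maps, of each input map convolved with its own kernel. A minimal numpy sketch (names are ours):

```python
import numpy as np

def conv_valid(img, k):
    """'Valid' 2-D correlation of one map with one kernel."""
    kh, kw = k.shape
    h, w = img.shape
    return np.array([[np.sum(img[y:y + kh, x:x + kw] * k)
                      for x in range(w - kw + 1)]
                     for y in range(h - kh + 1)])

def layer_convolution(prev_maps, kernels_k):
    """Eq. (2): the k-th output map sums, over all K^(l-1) input maps,
    each map convolved with its own kernel F_{k,i}^{(l)}."""
    return sum(conv_valid(prev_maps[i], kernels_k[i])
               for i in range(len(prev_maps)))

rng = np.random.default_rng(1)
prev = rng.standard_normal((4, 16, 16))   # K^(l-1) = 4 input maps
kernels = rng.standard_normal((4, 3, 3))  # one 3x3 kernel per input map
out_map = layer_convolution(prev, kernels)
print(out_map.shape)  # (14, 14)
```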

(8)

Convolutional Neural Network: Activation

Figure: Qian et al. Convolutional Neural Network.

Possible activation functions:

the absolute value: f(x) = |x|,
the sine: f(x) = sin(x),
a Gaussian, as in the Qian et al. network: f(x) = e^{-x^2/\sigma^2},
the ReLU (Rectified Linear Unit), as in our work: f(x) = max(0, x),
the hyperbolic tangent: f(x) = tanh(x).

(9)

Convolutional Neural Network: Pooling

Figure: Qian et al. Convolutional Neural Network.

Pooling is a local operation computed on a neighborhood:

a local average (preserves the signal),
or a local maximum (translation invariance),
followed by a sub-sampling operation.

For our "artificial" experiments, the pooling was not necessary.
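A minimal numpy sketch of both pooling variants, with the sub-sampling built in (the function name is ours):

```python
import numpy as np

def pool(feature_map, size=2, mode="avg"):
    """Local average or local maximum over size x size neighborhoods,
    followed by sub-sampling with the same stride."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "avg":
        return blocks.mean(axis=(1, 3))  # preserves the local signal level
    return blocks.max(axis=(1, 3))       # gives some translation invariance

fm = np.arange(16.0).reshape(4, 4)
print(pool(fm, 2, "avg"))  # 2x2 output of block averages
print(pool(fm, 2, "max"))  # 2x2 output of block maxima
```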

(10)

Convolutional Neural Network: Normalization

Figure: Qian et al. Convolutional Neural Network.

Case where normalization is done across the maps:

norm\left(I_k^{(1)}(x,y)\right) = \frac{I_k^{(1)}(x,y)}{\left(1 + \frac{\alpha}{size} \sum\limits_{k' = \max(0,\, k - \lfloor size/2 \rfloor)}^{\min(K,\, k - \lfloor size/2 \rfloor + size)} \left(I_{k'}^{(1)}(x,y)\right)^2\right)^{\beta}}
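The across-maps normalization can be sketched as follows (numpy; the α, β, and size defaults are typical AlexNet-style values, not necessarily those used by the Qian et al. network):

```python
import numpy as np

def lrn_across_maps(maps, size=5, alpha=1e-4, beta=0.75):
    """Local response normalization across feature maps, following the
    formula above. `maps` has shape (K, H, W)."""
    K = maps.shape[0]
    out = np.empty_like(maps)
    for k in range(K):
        lo = max(0, k - size // 2)           # max(0, k - floor(size/2))
        hi = min(K, k - size // 2 + size)    # min(K, k - floor(size/2) + size)
        energy = np.sum(maps[lo:hi] ** 2, axis=0)
        out[k] = maps[k] / (1.0 + (alpha / size) * energy) ** beta
    return out

rng = np.random.default_rng(2)
maps = rng.standard_normal((16, 8, 8))
normed = lrn_across_maps(maps)
print(normed.shape)  # (16, 8, 8)
```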

(11)

Convolutional Neural Network: Fully Connected Network

Figure: Qian et al. Convolutional Neural Network.

Three layers.

A softmax function normalizes the outputs to values in [0, 1].

The network delivers one value for cover (resp. one for stego).
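The softmax step can be sketched in a few lines (the score values are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Normalizes a score vector into [0, 1] values summing to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([1.3, -0.7])  # hypothetical cover/stego scores
p_cover, p_stego = softmax(scores)
print(round(p_cover, 3), round(p_stego, 3))
```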

(12)

Our CNN

Figure: Qian et al. Convolutional Neural Network.

Figure: Our Convolutional Neural Network. [Diagram: a 256×256 input image is filtered by the kernel F(0) into a 252×252 image; Layer 1 applies 64 kernels of 7×7 with stride 2 (64 feature maps of 127×127, ReLU); Layer 2 applies 16 kernels of 5×5 (16 feature maps of 127×127, ReLU); two fully-connected layers of 1000 neurons and a softmax deliver the label 0/1.]

(13)

Outline

1 CNN

2 Story

3 Experiments

4 Conclusion

(14)

The story of that paper...

(15)

Outline

1 CNN

2 Story

3 Experiments

4 Conclusion

(16)

Experiments

40 000 images of size 256×256 from BOSSBase, S-UNIWARD at 0.4 bit per pixel (bpp),

the same embedding key and use of the simulator, learning on 60 000 images.

Why use the same key? We did not want to do that...

A documentation error in the C++ S-UNIWARD software; Qian et al. 2015 were also misled.

⇒ We discovered this key problem on the 23rd of December 2015...

(17)

About the simulator

Probabilities for modifying a pixel xi with i ∈ {1...n} are:

p(−)i = exp(−λρ

(−)

i )

Z , for a−1 modification, p(0)i = exp(−λρ

(0)

i )

Z , for no modification, p(+)i = exp(−λρ

(+)

i )

Z , for a +1 modification, where

(−)i },{ρ(0)i }, and{ρ(+)i }are the changing costs, λis obtained in order to respect the payload constraint, Z =exp(−λρ(−)i ) +exp(−λρ(0)i ) +exp(−λρ(+)i ).
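The sampling step of the simulator can be sketched as follows (numpy; λ is given directly instead of being tuned to a payload, and the costs are toy values; names are ours):

```python
import numpy as np

def simulate_embedding(costs_minus, costs_zero, costs_plus, lam, seed):
    """Toy ternary embedding simulator: for each pixel, sample a change in
    {-1, 0, +1} from the distribution defined by the costs and lambda.
    The seed plays the role of the embedding key (it drives the PRNG)."""
    e_m = np.exp(-lam * costs_minus)
    e_0 = np.exp(-lam * costs_zero)
    e_p = np.exp(-lam * costs_plus)
    Z = e_m + e_0 + e_p
    p_minus, p_zero = e_m / Z, e_0 / Z
    u = np.random.default_rng(seed).random(costs_zero.shape)  # PRNG sequence in [0,1]^n
    # Invert the CDF: u < p(-) -> -1; u < p(-)+p(0) -> 0; otherwise -> +1.
    return np.where(u < p_minus, -1, np.where(u < p_minus + p_zero, 0, 1))

n = 8
rho_change = np.full(n, 1.0)  # toy cost for a -1 or +1 modification
rho_zero = np.zeros(n)        # no cost for keeping the pixel unchanged
changes = simulate_embedding(rho_change, rho_zero, rho_change, lam=2.0, seed=42)
print(changes)  # deterministic: the same seed (key) gives the same sequence
```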

(18)

Using the same key...

Probabilities for modifying pixel x_i: p_i^{(-)} = \exp(-\lambda \rho_i^{(-)})/Z, \; p_i^{(0)} = \exp(-\lambda \rho_i^{(0)})/Z, and p_i^{(+)} = \exp(-\lambda \rho_i^{(+)})/Z.

What happens when using the same key?

The embedding key initializes the pseudo-random number sequence generator,

whatever the image, the pseudo-random number sequence ∈ [0, 1]^n is the same,

the sequence is used to sample the distribution (see the probabilities above),

whatever the image, some positions i will most of the time be modified, and always with the same "polarity" (−1 or +1)...

This situation is artificial!
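The following toy experiment illustrates the point (the cost model and all names are hypothetical; only the seeding behavior matters): with the same key, two different images share far more modified positions than with different keys.

```python
import numpy as np

def modifications(cover, lam, seed):
    """Boolean map of modified positions in a toy simulator: a made-up cost
    map is derived from the cover, and the 'key' (seed) drives the PRNG."""
    rho = 1.0 + np.abs(cover - cover.mean())        # hypothetical cost map
    e_change = np.exp(-lam * rho)
    p_change = 2 * e_change / (1.0 + 2 * e_change)  # P(-1) + P(+1), rho_0 = 0
    u = np.random.default_rng(seed).random(cover.shape)
    return u < p_change                             # True where modified

rng = np.random.default_rng(0)
img_a, img_b = rng.random((64, 64)), rng.random((64, 64))

# Positions modified in BOTH images, same key vs. different keys.
same_key = modifications(img_a, 1.0, seed=7) & modifications(img_b, 1.0, seed=7)
diff_key = modifications(img_a, 1.0, seed=7) & modifications(img_b, 1.0, seed=8)
print(same_key.sum(), diff_key.sum())  # same key -> many more shared positions
```

This shared-position pattern is exactly the artificial spatial signature that the CNN ends up detecting.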

(19)

Illustration on the cropped BOSSBase database.

Figure: Probability of change. In white, the most probable sites; in black, the least probable ones.

(20)

Our best CNN in that ”artificial scenario”

40 000 images of size 256×256 from BOSSBase, S-UNIWARD at 0.4 bpp,

the same embedding key and use of the simulator, learning on 60 000 images.

           RM+EC    CNN     FNN
Max        24.93%   7.94%   8.92%
Min        24.21%   7.01%   8.44%
Variance   0.14     0.12    0.16
Average    24.67%   7.40%   8.66%

Table: Steganalysis results (P_E) with S-UNIWARD, 0.4 bpp, clairvoyant scenario, for RM+EC, CNN, and FNN.

But... the experimental setup was artificial...

Note that with different embedding keys, the same CNN structure reduced to 2 layers, and more neurons, the probability of error is 38.1%... There is still hope!

(21)

Conclusion on that story

Be careful with software implementations!

Be careful to use different embedding keys!

Be careful: the simulator only performs a simulation (different from STC embedding),

Rich Models are not efficient enough to detect these spatial phenomena.

You will also find in the paper:

an explanation/discussion of CNNs and of the behavior of a CNN,
a discussion on embedding keys,
the presentation of the LIRMMBase.

(22)

End of talk

CNN is not dead...

... there are still things to do...

(23)

About LIRMMBase

"LIRMMBase: a database built from a mix of the Columbia, Dresden, Photex, and Raise databases, whose images do not come from the same cameras as the BOSSBase database."

L. Pibre, J. Pasquet, D. Ienco, and M. Chaumont, LIRMM Laboratory, Montpellier, France, June 2015. Website: www.lirmm.fr/~chaumont/LIRMMBase.html.
