• Aucun résultat trouvé

Video waterscrambling: towards a video protection scheme based on the disturbance of motion vectors

N/A
N/A
Protected

Academic year: 2022

Partager "Video waterscrambling: towards a video protection scheme based on the disturbance of motion vectors"

Copied!
14
0
0

Texte intégral

(1)

Video Waterscrambling: Towards a Video Protection Scheme Based on the Disturbance of Motion Vectors

Yann Bodo

TECH/IRIS/CIM, France Telecom R&D, 4 rue du Clos Courtel, 35512 Cesson S´evign´e Cedex, France Email:yann.bodo@wanadoo.fr

Nathalie Laurent

TECH/IRIS/CIM, France Telecom R&D, 4 rue du Clos Courtel, 35512 Cesson S´evign´e Cedex, France Email:nathalie.laurent@francetelecom.com

Christophe Laurent

TECH/IRIS/CIM, France Telecom R&D, 4 rue du Clos Courtel, 35512 Cesson S´evign´e Cedex, France Email:christophe2.laurent@francetelecom.com

Jean-Luc Dugelay

Multimedia Communication Department, Institut EURECOM, 2229 Route des Cretes, BP 193, 06904 Sophia-Antipolis Cedex, France

Email:jean-luc.dugelay@eurecom.fr

Received 31 March 2003; Revised 19 December 2003

With the popularity of high-bandwidth modems and peer-to-peer networks, the contents of videos must be highly protected from piracy. Traditionally, the models utilized to protect this kind of content are scrambling and watermarking. While the former protects the content against eavesdropping (a priori protection), the latter aims at providing a protection against illegal mass distribution (a posteriori protection). Today, researchers agree that both models must be used conjointly to reach a sufficient level of security. However, scrambling works generally by encryption resulting in an unintelligible content for the end-user. At the moment, some applications (such as e-commerce) may require a slight degradation of content so that the user has an idea of the content before buying it. In this paper, we propose a new video protection model, calledwaterscrambling, whose aim is to give such a quality degradation-based security model. This model works in the compressed domain and disturbs the motion vectors, degrading the video quality. It also allows embedding of a classical invisible watermark enabling protection against mass distribution. In fact, our model can be seen as an intermediary solution to scrambling and watermarking.

Keywords and phrases:content protection, video scrambling, watermarking, motion estimation.

1. INTRODUCTION

With the fast proliferation of high-bandwidth personal modems (especially ADSL and cable modems), the exchange of digital multimedia contents has drastically increased. This exchange is also greatly facilitated by the emergence of digital communities that share many files across peer-to-peer net- works. Among these shared files, many are copyrighted, and in this context, it is necessary to control their distribution in an open network such as the Internet.

This occurred recently with the MP3 revolution in digital audio contents. From this time, many MP3 processing soft- wares including CD rippers, MP3 encoders, and MP3 play- ers have been posted for free on the Web allowing end-users

to build their own MP3 record collections from their own CDs. Inevitably, this situation has caused an incredible piracy activity and Web sites have begun to stream and provide copyrighted MP3 music for free. In response to this piracy situation, the Recording Industry Association of America (RIAA) created the Secure Digital Music Initiative (SDMI, http://www.sdmi.org) working group to explore technologi- cally secure alternatives to the MP3 format. This group aimed at protecting online music from illegal duplication and mass distribution. To test the proposed solutions, on September 6th 2000, it issued the SDMI challenge to the digital com- munity inviting people to crack their system. Unfortunately, students from the Princeton University successfully hacked the SDMI technology.

(2)

This digital audio situation shows that with the digital era comes the need for the adaption of business practices.

Traditional methods are often not successful when imple- mented online. However, technology can evolve drastically faster than the business world and it has become increas- ingly difficult for the entertainment industry to adapt at the same rate as the fast changing world of digital innovations.

The proposal of digital rights management (DRM) technol- ogy has been initiated in an attempt to overcome these prob- lems and to initiate new working practices. The DRM sys- tems generally provide two essential functions: management of digital rights by identifying, describing, and setting the rules of content usage, and digital management of rights by securing the content and enforcing usage rules. However, a recent report from the Commission of European Commu- nities [1] shows that today, DRM systems are neither widely deployed nor widely accepted, mainly due to the reduction of ease of use, the prevention of generally accepted uses (e.g., private copy of content), and a lack of flexibility and inter- operability between existing systems. Therefore, while piracy practices are very active, new digital businesses are not able to take place and peer-to-peer communities can still oc- cur.

In fact, two different problems arise: the content protec- tion and copy protection. While content protection aims at protecting the content itself against eavesdropping, the ob- jective of copy protection is to avoid the illegal mass distri- bution of copyrighted contents.

Content protection is an old issue in the digital TV envi- ronment and new cryptographic tools have been proposed [2], namely conditional access, in this context. Conditional access systems work by scrambling the content, that is, en- crypting the content with keys that change frequently. This kind of protection has been adopted by all digital TV broad- casting standards, such as digital video broadcasting (DVB) in European countries.

The problem of copy protection has been tackled in the analogue TV world by Macrovision with the proposal of the APS copy protection scheme based on the differences in the way VCRs and TVs operate. However, copy protection in the analogue world is of limited importance due to the degrada- tion of video quality along with the copy generations. Con- versely, this issue is crucial in the digital environment in which digital content can be cloned without loss of quality.

In this way, CD technology has been the first victim with the advent of CD writers. Consequently, in 1996, the Mo- tion Picture Association of America (MPAA), the Consumers Electronics Manufacturers Association (CEMA), and mem- bers of the computer industry put together anad hocgroup called Copy Protection Technical Working Group (CPTWG) to discuss the technical problems of protecting digital video from piracy, particularly in the domain of digital versatile disk (DVD) [3]. This working group has addressed four key problems: content protection, analogue copy protection, digital copy generation management, and exchange of con- tents across digital networks. Unfortunately, the content- scrambling-system (CSS), chosen to encrypt the DVD con- tent, used a weak 40-bit key and the algorithm was quickly

hacked into by Stevenson [4] in 1999, making it possible to extract the contents of DVDs in unscrambled form.

All these aforementioned facts show that the content protection against eavesdropping and illegal copy is a chal- lenging task and we cannot always be sure that a proposed method will be totally secure. On the other hand, there is a difficult tradeoffbetween system complexity and cost. In fact, manufacturers often accept a limited amount of piracy by adopting the well-known mantra “keeping honest people honest.”

Among methods proposed in literature to protect video contents, two approaches are classically utilized: scrambling and watermarking. As scrambling is generally based on old and proven cryptographic tools [5], it efficiently ensures con- fidentiality, authenticity, and integrity of messages when they are transmitted over an open network. However, it does not protect against unauthorized copying after the message has been successfully transmitted and decrypted [6]. This kind of protection can be handled by watermarking [7], which is a more recent topic that has attracted a large amount of re- search and is perceived as a complementary aid in encryp- tion. A digital watermark is a piece of information inserted and hidden in the media content. This information is im- perceptible to a human observer but can be easily detected by a computer. Moreover, the main advantage of this tech- nique concerns the nonseparability of the information to be hidden and the content. A watermark system consists of an embedding algorithm and a detector function. The embed- ding algorithm inserts a message inside media, the detector function is then used to verify the authenticity of the media by detecting the mark. The most important properties of a watermarking scheme include [8] robustness, fidelity, tam- per resistance, and payload. More details regarding the com- mon watermarking properties can be found in various pa- pers, such as [8,9]. Finally, a wide number of watermarking technologies have been developed and deployed today for a wide variety of applications as discussed in [8].

In this paper, we present an alternative video protection model that we callwaterscrambling. This new model is moti- vated by the following observations:

(i) a scrambling-based protection scheme totally prevents the end-user from seeing the content. However, it can be useful, for some applications such as e-commerce, to show the content under a degraded form in order to provoke an impulsive buying action;

(ii) a video protection solution based solely on a water- marking approach does not prevent the propagation of the content. The watermark must be coupled with another secure scheme to prevent illegal copy. In fact, a watermark-based video protection scheme needs a watermark compliant video player in order to be effec- tive.

Our waterscrambling solution can be seen as an intermediary solution between scrambling and watermarking. By disturb- ing the video sequence motion vectors in compressed form, our approach degrades the video quality, but still enabling video content to be perceived by an end-user, giving him an

(3)

idea of the original content. In this sense, our approach is a scrambling variant. By also being able to embed invisible in- formation in the motion vectors, our approach satisfies the previously recalled watermarking requirements.

After presenting an overview of the classical video pro- tection schemes in Section 2, our waterscrambling process will be detailed inSection 3. Finally, our conclusions and per- spectives will be discussed inSection 4.

2. AN OVERVIEW OF VIDEO CONTENT PROTECTION SCHEMES BASED ON WATERMARKING

TECHNIQUES

2.1. Video content protection problem statement As underlined in the previous section, two different prob- lems have to be considered when protecting video content:

the protection of the content itself and the prevention of ille- gal copy.

Content protection is an old issue in the digital TV area and works by scrambling (i.e., encrypting) video content [2].

To achieve a sufficient security level, due to the huge amount of data giving rise to specific attacks, the heart of the scram- bling security is a combination of a proven encryption algo- rithm with a frequent change of keys.

Obviously, the topic of content protection has also been discussed in the CPTWG with the aim of protecting the con- tent of DVDs [3]. For this purpose, the CSS algorithm devel- oped by Matsushita has been adopted.

The CPTWG has also considered the copy protection problem by embedding a pair of bits in the header of the MPEG stream. This protection scheme, called copy genera- tion management system (CGMS), encodes one of the three possible rules for copying: “copy freely” (i.e., the video may be freely copied), “copy never” (i.e., the video may never be copied), and “copy once” (i.e., only a first generation copy is authorized).

The arrival of new digital networks, thanks to powerful high-bandwidth digital buses such as IEEE1394, also needs new security specifications. In these networks, all devices are connected through digital links and it must be ensured that video content is not transported in clear text, nor can be illegally copied during its transfer between devices. This problem is generally resolved by providing a mechanism that strongly authenticates all network devices and that al- lows the content encryption key exchange between authen- ticated devices. Today, two competing solutions tackle this kind of problem: the oldest one is digital transmission con- tent protection (DTCP) that was developed by 5C (a consor- tium of five companies, including Hitachi, Intel, Matsushita, Sony, and Toshiba). The second solution, SmartRight, was designed by Thomson Multimedia and is today supported by eight other companies (Canal+ Technologies, Nagravi- sion, Gemplus, SchlumbergerSema, ST, Pioneer, Micronas, and SCM Microsystems). The protection of video content over digital networks is of prime importance and for this rea- son, the CPTWG has created the Digital Transmission Dis- cussion Group (DTDG) to explore the issue.

2.2. Watermarking protection schemes

A video watermark technique consists of the hiding of in- formation into a video sequence to protect the video con- tent as a whole. One way of embedding a watermark into a video is to independently mark all the video frames by us- ing techniques from the still image watermarking area. An- other way is to use the temporal information of the video.

Consequently, we can classify video watermarking schemes into two main categories: still image-based techniques and video-adapted techniques. Today, most of the video water- marking approaches rely on the extension of still image algo- rithms. However, these algorithms generally lack robustness since they do not fully consider the video temporal axis. Lit- erature has provided few watermarking algorithms that con- sider temporal information as a key advantage to propose a more robust solution. Effectively, it seems natural to consider that the robustness of a watermark can be greatly improved by considering the following two video properties.

(i) Information amount. A video sequence represents a larger amount of information than a still image.

Therefore, the insertion space of the watermark is in- creased and can be exploited to insert a more robust mark.

(ii) Motion information. The object motion increases the visibility of the mark.

However, the insertion of the watermark is also constrained by the following.

(i) Runtime complexity. The complexity of the mark inser- tion scheme should be small, and ideally, the algorithm should run in real time.

(ii) Compression constraint. The mark embedding process should not produce a compressed marked bitstream larger than the unmarked one.

(iii) New class of attacks. Video watermarking leads to somewhat different attacks than those used in im- age watermarking. Moreover, the mark should be de- tectable even after a loss of synchronization due to temporal subsampling or to the selection of a subse- quence.

2.2.1. Still image-based techniques

Primarily, watermarking algorithms for video were simply an adaptation of still image techniques. Langelaar et al. [10], Nikolaidis and Pitas [11], and O’Ruanaidh et al. [12] have each proposed good overviews of still image watermarking techniques that can be used to mark video content if we con- sider the video sequence as a succession of independent still images. To embed a watermark, we can work in the spa- tial domain or in a transform domain. In the same way, we can work with compressed or original uncompressed data.

Finally, to increase the invisibility of the inserted mark, re- searchers often use a psychovisual mask. Effectively, regard- ing the properties of the human visual system (HVS) in- crease the energy of the watermark without generating ad- ditional visual artifacts. Naturally, these possibilities are also

(4)

valid when marking video, explaining why many still image watermarking concepts are directly used in video.

One of the main techniques used in watermarking is the spread spectrum approach firstly introduced by Cox et al.

[13]. In this approach, the use of a pseudorandom bit gen- erator (PRBG) modulated with an oversampling version of the mark allows to generate redundancy and randomness in the embedding process, resulting in a largely increased ro- bustness. Based on this technique, an approach working in the spatial domain has been proposed by Hartung and Girod [14]. In this method, the video is considered as a 1D sig- nal. However, the authors do not really consider the intrinsic properties of the video because they store the video signal into a 1D vector, loosing thus the spatial information of the still frame as well as the temporal information that charac- terizes a video. It has to be noted that this scheme is among the first to deal with video watermarking.

Among watermarking approaches working in the com- pressed domain, 8×8 blocks are generally employed when embedding the watermark due to their use in compression standards such as MPEG or JPEG. Koch and Zhao [15] have developed a still image watermarking method using the JPEG compression scheme and working in the frequency domain.

They first apply a discrete cosine transform (DCT) on lu- minance blocks before quantizing them. Then, they pseudo- randomly select three of these quantized coefficients in the medium frequencies over which they apply an insertion rule.

This consists of imposing a pattern rule onto the three coef- ficients depending on the bit to be embedded. Dittman et al. [16] have proposed two watermarking algorithms: one adapted from this block-based technique and the other from the algorithm developed by Fridrich in [17]. In the first ap- proach, the embedding is performed by marking the 8×8 blocks in the DCT frequency domain. In the second one, the authors embed the mark in the spatial domain. The main ad- vantage of the second approach is that it is able to embed more than 250 bits and to withstand stirmark attack. The first algorithm is more suitable for video and improves the video quality. However, its complexity does not allow for the real-time constraint. It has to be noted that both approaches use the HVS properties to increase the robustness and the invisibility of the mark.

In [18], Wolfgang et al. proposed a still image watermark- ing scheme that they have adapted to video content by em- bedding the mark in the intra frames. In this work, the au- thors work in the DCT domain and use a spatial masking approach.

One of the well-known techniques proposed in the video watermarking topic is the just another watermarking sys- tem (JAWS) algorithm developed by Kalker et al. that has been firstly designed for broadcast monitoring [19]. JAWS is based on simple operations allowing for the real-time re- quirement in which the video is considered as a succession of still images. The watermarking payload was initially one bit, but in [20], the authors achieved an embedding of 36 bits/s thanks to the symmetrical phase only matched filter- ing (SPOMF) algorithm. This improvement was presented in [19] where the authors generate a pseudorandom pat-

tern according to the message to be embedded. The water- mark is then perceptually shaped and scaled before being in- serted. Although this algorithm is based on still image tech- niques, it shows a good robustness and is today one of the major algorithms proposed in video watermarking. It is use- ful to note that the JAWS-based watermarking solution is proposed by Philips under the commercial name of Water- cast.1

Another powerful commercial solution is the one pro- vided by Nextamp.2Their algorithm is mainly based on the still image watermarking scheme developed by Koch and Zhao [21] and the approach proposed by Baudry et al. [22].

This algorithm meets the real-time constraint.

2.2.2. Video adapted technique

Langelaar et al. [10] and Do¨err and Dugelay [23] have pro- posed comprehensive overviews of video watermarking tech- niques. While the former deals with basic approaches, the latter proposes a good view of the actual watermarking prob- lem. If we consider the problem of digital broadcasting, for which the runtime complexity must be drastically reduced, there is a need of getting an algorithm that meets the real- time constraint and that embeds the mark in the compressed domain. Hartung and Girod [14] proposed a method that works in the compressed domain and that involves the em- bedding of the mark in the video intra frames. Then, they made a drift compensation for visibility purposes. The mark is transformed into a 2D signal before being embedded in the image. The authors consider the compression problem but they do not use the motion at all. Now, it seems natu- ral to employ the motion data since it embeds a high-value added information into the intrinsic video content. For this purpose, some techniques are based on temporal 3D trans- forms [24,25], while others use motion vectors obtained by a motion estimator [26,27]. However, it must be empha- sized that 3D approaches generally consider temporal dimen- sion in the same way as spatial ones, although they do not hold the same kind of information. The same drawback has been noted in the source coding field where this kind of ap- proach did not reach good results. In [25], Tewfik et al. use a temporal wavelet transform in order to identify the low and fast motion areas in the video. They first extract the differ- ent scenes of the video by applying a temporal segmentation, and then apply their watermarking algorithm. By doing so, they can embed two different watermarks depending on the motion activity, and then adapt the watermark to the con- tent. The temporal axis is performed here for discriminat- ing the content and not to embed the mark. Moreover, the temporal wavelet transform greatly increases the complex- ity of the algorithm. In [24], a 3D discrete Fourier trans- form (DFT) is used. Due to the separability property of this transform, it can be considered as the composition of a 2D spatial DFT and a 1D temporal DFT. The mark is embed- ded in the magnitude component of the DFT coefficients.

1http://www.watercast.com.

2http://www.nextamp.com.

(5)

Although the temporal aspect is used, the complexity of this design is a drawback. Most of the temporal transforms are processed in order to discriminate between the different characteristics of the video content. Most of the time, the dis- crimination is performed on a motion basis (static/dynamic area) and/or feature basis (edge/texture area) resulting in a high computational cost. Some research works deal with watermarking schemes using motion vectors to embed the mark. This approach seems to be more appropriate to this media, but the preferred method is the inclusion of a psycho- visual mask in order to separate the dynamic zones from the static ones. Marking motion vectors was first introduced by Jordan et al. in [26], in which the authors select a set of mo- tion vectors over which they apply a parity rule to embed the mark. Later, Zhang et al. [27] used this principle and adapted the insertion rule by selecting the vector components that have the greatest magnitude. In [28], Lancini et al. embed the mark in the spatial domain. They first design a mask composed of three different components, one for luminance masking, one for texture masking, and the third for tem- poral masking, then they apply a classical spread-spectrum technique to embed the mark as in [13]. As mentioned in [29], JAWS is one of the main algorithms available to protect and control the illegal copy of DVDs. The main constraint discussed in this DVD protection topic is the real-time con- straint, needed for the detection algorithm in order to be in- corporated in the DVD decoding process. To reach this goal, the detection is performed directly on the MPEG stream re- sulting in a drastic reduction of the complexity at the cost of a slight reduction in performance. Finally, an adaptation of the techniques present in JAWS was designed in [30], resulting in a new algorithm that can be used to protect the digital cin- ema area. In this last design, the temporal axis is the only one used due to different constraints. Indeed, the handled cam- era used to make a screener of the projected movie introduces filtering and serious geometrical distortions. Thus, in order to be resistant to geometric attacks, they adapt their tech- nique to mark only the temporal axis. More recently, with the growth of the MPEG4 standard, some watermarking al- gorithms have been designed to protect the MPEG4 objects.

In [31,32], the procedure consists first of extracting the ob- jects from the stream and then embedding a watermark to protect each of these objects.

In conclusion, transform domain algorithms make the watermarking algorithms complex and thus costly. For broadcasting applications, real time is necessary. The best method in this context is to embed and detect in the com- pressed domain, or in the spatial (or temporal) domain.

Finally, it can be noted that some authors have proposed hybrid methods to protect digital contents by combining cryptography and watermarking. In this way, Macy et al. use in [33] a multilevel scrambling approach together with wa- termarking: the video content scrambling is based on the disturbance of DCT coefficients and the watermarking per- forms a classical spread spectrum in the spatial domain. In the same way, Bao [34] proposes to mix public key cryptog- raphy and watermarking. In [35], Cheng and Li perform a partial encryption of the content by using a wavelet trans-

form and a quadtree data structure. More recently, Zeng and Lei [36] have proposed a video protection technique com- bining a selective bit scrambling scheme in the frequency do- main, block shuffling, and block rotation of the transform coefficients and motion vectors.

3. THE VIDEO WATERSCRAMBLING APPROACH

3.1. Introduction

Until now, most watermarking systems were designed to protect a media content by inserting a robust and invisible copyright mark. Our approach is slightly different, since we use watermarking techniques to insert a visible mark thus

“scrambling” the video content. As underlined in the previ- ous section, scrambling is commonly employed to prevent unauthorized access to video data and works by distorting the data such that the video appears unintelligible to a viewer.

In our mind, this kind of approach is the most effective one to protect the video against eavesdropping. However, in some cases, it can be useful to show the video content un- der a degraded form until the end-user subscribes to the cor- responding service. In fact, a video protection scheme that gives the user an idea of the content can lead to impulsive subscription action, more than a pure scrambling approach.

In this section, we propose such a scheme that we call video waterscrambling. Contrary to classical scrambling sys- tems, our process distorts the video quality and is able to regulate the video visibility from the original to unintelligi- ble quality. Moreover, it does not disturb the video statistics as much as other schemes and it is not difficult to keep a good compression ratio by tuning the waterscrambling level.

Finally, contrary to most existent watermarking and scram- bling techniques, our waterscrambling system can run in real time during an MPEG compression phase (because it uses motion vectors computed during the compression process) or after the compression by extracting motion vectors from the MPEG bitstream.

Few research works have proposed adjustable video qual- ity schemes for security purposes. An access control system based on fractal coding theory was proposed in [37]. The au- thors use a fractal coding scheme to adaptively and partially encrypt an image. In fact, they present an approach based on iterated function system coding (IFSC) providing both com- pression and hierarchical access control for images at various resolution levels. This hierarchical access control scheme al- lows the terminals to display an image at a low resolution level. The higher resolution levels (which correspond to a better image quality) are displayed according to the receiver access rights that are usually determined by the subscription agreement.

In our approach, the distortion level is more flexible. Ef- fectively, contrary to [37] that proposes a coarse granularity by using only eight encryption levels, we propose a scheme with fine and continuous granularity. Moreover, our process is easy to implement and runs in real time. In order to reach this fine granularity, we build a visible marking system based on the use of the video motion vectors. As mentioned in

(6)

Section 2.2.2, only two important watermarking techniques based on motion vectors have been proposed in literature [26,27]. However, both methods suffer from serious draw- backs. The approach presented in [26] is based on the parity of the motion vector components which is not robust. Ef- fectively, filtering can destroy their watermarks by changing the parity of some motion vectors. Moreover, both methods are not reversible, which becomes a problem when the re- construction of the original video is needed as for our con- cept. Thus, the goal of the waterscrambling approach pro- posed in this paper consists in finding a reversible “pseu- doscrambling” solution which uses and modifies the MPEG motion vectors. However, if our major idea consists of de- signing a new kind of a pseudoscrambler, another interest of this approach concerns the possibility of inserting a water- mark during the scrambling process in real time (allowing us to build a complete protection system). To anticipate this, our waterscrambling solution must be compatible with a wa- termarking solution. In fact, the insertion rule of our sys- tem must resist manipulations usually performed on video data (e.g., compression, filtering, etc.). The marked motion vectors must be maintained in a local space determined by the insertion rule to resist attacks aiming at displacing them around their initial position.

3.2. Embedding scheme

Our waterscrambling procedure is included in an MPEG compression scheme. The first step consists of extracting the motion vectors to be marked and two different approaches can be envisaged for this purpose. The first method uses a syntactic analyzer to extract motion vectors from the MPEG compressed bitstream, and in this case, the waterscrambling system is an independent module. The second one consists of directly modifying motion vectors to be waterscrambled during the MPEG compression scheme. In this case, we must use a module compliant with the standard compression one.

To waterscramble the video, a visible mark, defined by a binary vectorW ∈ {−1, 1}N (whereNdenotes the size of the mark) is added to a set of chosen motion vectors. In order to increase the robustness of the mark, we apply a permuta- tionσf(W) onW at each frame f. First of all, as proposed in [13] and by analogy to spread spectrum communications, the mark is spread over many frequency bins so that the en- ergy in each one is very small. Thus, we extract from each frame f corresponding to an MPEG P or B frame, the set of themf motion vectors denoted byVf = {dif, 1≤i≤mf}. Then, a setVf (Vf ⊆Vf) ofkf ≤mf selected motion vec- tors is used to superpose the digital mark signalσf(W) onto the original signal of the selected motion vectors:

∀df =

dxf,dyfT ∈Vf, dWf =dfα,σf(W),Kσf(W)

, (1) wheredf is a motion vector belonging toVf,dfW is the re- sulting marked motion vector,αdenotes the mark strength (which could be different in various data samples), andΦis

a reversible function depending onWandK,Kbeing a wa- terscrambling secret key that may be used to enforce security.

To determine the setVf of chosen motion vectors, we use the waterscrambling keyKσf(W)to initialize a pseudoran- dom number generator (PRNG) which outputskfindexeskif (i∈[1,mf]) denoting the indexes of the motion vectors of interest inVf:Vf = {dfj, j∈ {kif}i[1,mf]}.

We point out that a PRNG is a cryptographic algorithm used to generate numbers that must appear random [5]. It has a secret state and it must generate outputs that are indis- tinguishable from random numbers to an attacker who does not know and cannot guess the secret state. In this sense, it is very similar to a stream cipher. Additionally, a PRNG must be able to alter its secret state by processing input values that may be unpredictable. A PRNG often starts in a guessable state and must process many inputs to reach a secure state.

Yarrow [38] is an example of a secure PRNG that can be used in our waterscrambling scheme because it has been proven to be more robust than other PRNGs. The major design prin- ciple of the Yarrow system is that its components are more or less independent, so that systems with various design con- straints can still use the general Yarrow design. The use of algorithm-independent components in the top level design is a key concept in Yarrow. The goal is not to increase the number of security primitives that a cryptography system is based on, but to leverage existing primitives as much as pos- sible. Hence, Yarrow relies on one-way hash functions and block ciphers cryptographic primitives.

To enforce the security level, the waterscrambling key Kσf(W)is changed for each new video to be waterscrambled.

This key represents the initial state of the PRNG. Our wa- terscrambling system uses this secret waterscrambling key in a similar way to a symmetric cipher, that is, the key must be shared between the content provider and the end-user to enable the “dewaterscrambling” process. Consequently, a se- cure channel must be set up between both parties to securely transfer this key.

Moreover, as the “dewaterscrambling” process needs to know the strengthαthat the provider used to waterscramble the video, the keyKσf(W)must be decomposed into two parts:

the seven first bits of the key contain the strengthαand the other bits denote the key itself used to initialize the PRNG.

The following waterscrambling embedding rule is then used:

∀df =

dxf,dyfT ∈Vf,

dWf =















dW,xf =



dxf+α×Υ(σf(W),Kσf(W)), ifσif(W)=+1,

dxf, otherwise,

dW,yf =



dyf+α×Υ(σf(W),Kσf(W)), ifσif(W)=−1,

dyf, otherwise, (2)

whereσif(W) denotes theith component of the vectorσf(W) andWthe visible mark. We can thus see thatW is scattered in the image by embedding only one bit in each chosen mo- tion vector ofVf.

(7)

1000 500

2000 0 200 400 600 800

1000 500

1000 0 100 200 300 400

(a)

1000 500

2000 0 200 400 600 800

1000 500

1000 0 100 200 300 400

(b)

1000 500

2000 0 200 400 600 800

1000 500

2000 0 200 400 600

(c)

Figure1: Modification of motion vectors distribution after the application of the waterscrambling. (a) Distribution of thexcomponent (right) andycomponent (left) of the original motion vectors. (b) Modification of the distribution with a waterscrambling strengthα=20.

(c) Modification of the distribution with a waterscrambling strengthα=100.

Υis in this case a reversible function allowing for the tun- ing of the degree of quality degradation.

Finally, to spread the waterscrambling effect, we can in- sert the visible mark in the transform domain instead of the spatial one. For this purpose, we perform two 1D DCTs, the first one on thexcomponents and the second one on they components of a global vectorV =(Vx,Vy)T R2kf with Vx=(dxf1,dxf2,. . .,dxfk f) andVy=(dyf1,dyf2,. . .,dyfk f).

By working in the transform domain, we are able to con- trol the global energy added to the motion vectors by, for example, only disturbing high or middle frequencies. More- over, we are able to keep the statistics of the motion vec- tors distribution, thus avoiding to increase the compres- sion ratio. To reach this goal, we can define the functionΥ such that it corresponds to a pseudohomothetic deforma- tion of the coded motion vectors distribution (seeFigure 1).

Figure 1shows an example of such motion vectors distribu-

tion. The distribution of the original motion vectors is illus- trated onFigure 1ain which the left graph (resp., the right graph) shows the distribution of the x (resp., y) compo- nents. The modifications of these distributions with a wa- terscrambling strength α = 20 and α = 100 are, respec- tively, shown on Figures1band1c. As it can be noted, pro- tecting a video with a strength α = 20 does not signifi- cantly change the distributions, thus allowing it to maintain approximatively the same compression ratio while degrad- ing sufficiently the video quality. Conversely, for a strength α = 100, the distribution of vector amplitudes in theydi- rection is greatly affected and the compression ratio is con- sequently degraded. Although most video codecs code mo- tion vectors using a differential approach, these curves show nevertheless that the coding cost does not change signifi- cantly. Indeed, the motion vectors are slightly modified by the waterscrambling scheme and thus remain in a restricted

(8)

100 80

60 40

20

Waterscrambling strength 110

120 130 140 150

Compressionratioincrease (%oftheoriginalsequencesize)

Stefan sequence Ping-pong sequence

Figure2: Variation of compression ratios according to the water- scrambling strength on Stefan and Ping-pong sequences.

spatial area. Consequently, by keeping approximatively the same distribution of motion vectors, we ensure that the cod- ing cost is close to be the same as the original video con- tent. In the case where the compression ratio must remain the same as the original compressed video, we have to choose the functionΥadequately. In addition,Figure 2shows the compression ratio variation according to the waterscram- bling strength applied to the video. As mentioned before, we can note that a strengthα=100 increases the compression ratio of about 35% when averaged over two video sequences.

However, a strength α = 20 generally suffices to degrade the video quality while keeping a sufficient level of visibil- ity (see the second row ofFigure 6). In this case, the increase of compression ratio is only of 10%, which is largely accept- able.

Once the visible mark is inserted, a classical watermark- ing approach can follow. A mix of scrambling and water- marking was first proposed in [39] in which two alterna- tives are presented. The first one embeds the watermark be- fore scrambling the content. In this way, the content receiver descrambles only the content and the mark remains. The second alternative proposes to send a scrambled video with- out embedding a watermark. At the receiver side, the content is descrambled and conjointly watermarked.

In our process, both approaches can be envisaged. Effec- tively, we can add an invisible watermark W on the same chosen motion vectors. Converse to the waterscrambling ap- proach, the strengthαmay be chosen in order to maintain the invisibility of the markWonto the original signal. This watermarking process can be performed in the compressed or in the uncompressed domain. In our case, we have chosen to work in the uncompressed domain to avoid the drift effect.

Thus, waterscrambling and watermarking processes are ap- plied in a similar embedding system, but in a different man- ner. For a watermarking scheme, the embedding rule is de-

R2

R1

H

R4 R3

K

E C

Figure3: Construction of a reference grid to embed a watermark on motion vectors.

δ2

δ1

H

k

h K

Z1

Z2

Figure4: Block element partitioning to embed the mark.

fined by

∀df=

dxf,dyfT∈Vf, dfW=dfα,σf(W),Kσf(W)

, (3) in whichΦ(α,σf(W),Kσf(W)) = α×Υ(σf(W),Kσf(W)), where Φ andΥ are nonreversible functions (e.g., one-way hash functions) ensuring that the mark can only be detected and not extracted, contrary to the waterscrambling scheme.

It is important to note that σf(W) is not necessarily the same permutation as the one used in the waterscrambling procedure. However, to improve the robustness of this ap- proach, the insertion rule must respect a spatial structure based on the construction of a reference gridGas illustrated inFigure 3. This rectangular grid is generated in the Carte- sian space and is associated to a referential (O,i,j). It repre- sents a block-based partitioning of the image compact sup- port resulting in a set of block elementsE, each of sizeH×K.

We denoteRias the intersection points between blocks that we call herereference points.

Each selected motion vector ofVf is first projected onG and this projection serves to compute its associated reference point. Figure 3illustrates this process: the extremity of the projected motion vector−−→OCbelongs to a blockEofG, from which four intersection pointsR1,R2,R3, andR4can be de- duced. The reference point associated to the motion vector is the one located at the smallest distance of the extremity of the vector (according to theL2distance). In the example of Figure 3, the reference point ofdf isR1.

(9)

O

C B D

Z1

Z2

(a)

O

C D B

Z1

Z2

(b)

O

C D B

Z1

Z2

(c)

O

C B D

Z1 Z2

(d)

Figure5: Computation of the watermarked vector.

Then, to embed the watermark, the motion vector is modified (see Figure 4) by constructing in each block ele- mentEas a rectangular element of sizeh×k(areaZ1), where h=H−2∗δ1,k=K−2∗δ2,δ1andδ2are chosen such that Z1andZ2cover the same area, andZ1∪Z2=E. Both zones Z1andZ2drive the mark embedding rule:Z1is associated to the bit1 andZ2to the bit +1.

Then, if we consider thatdf = −−→

OC is the vector to be watermarked (seeFigure 5) andWiis the bit to be inserted, the watermarked vectordfWis computed as follows:

(i) ifWi=+1 anddfis in the right place (i.e., in the zone Z2), thendfW =df; otherwise, a central symmetry of centerBmust be applied resulting indfW = −−→

OD(cf.

Figure 5b);

(ii) ifWi = −1 anddf is in the right place (i.e., in the zoneZ1), thendWf =df; otherwise, as theZ2area is not compact, three possibilities can appear to compute dWf :

(a)dWf is given by a central symmetry of centerBre- sulting indfW=−−→

OD(cf.Figure 5a);

(b)dWf is given by an axial symmetry parallel to they- axis and going throughBresulting indfW=−−→

OD(cf.

Figure 5c);

(c) dWf is given by an axial symmetry parallel to thex- axis and going throughBresulting indfW=−−→

OD(cf.

Figure 5d).

Note that the case illustrated inFigure 5b(i.e., modification of the motion vector from Z2 toZ1) only needs one kind of transformation. Indeed, due to the grid structure and the surface covered by both areasZ1andZ2, a motion vector lo- cated withinZ2will be automatically projected inZ1by ap- plying a central symmetry.

In fact, after computingdx=Cx−Bxanddy=Cy−By

(with B = (Bx,By)T andC = (Cx,Cy)T), the symmetry is chosen as follows:

(i) ifdx≤δ2anddy≤δ1, the central symmetry is applied;

(ii) ifdx ≤δ2, the axial symmetry parallel to thex-axis is applied;

(iii) ifdy≤δ1, the axial symmetry parallel to they-axis is applied.

In this paper, the first attempt to reach the invisibility con- straint has conducted us to minimize the distortion applied to the motion vectors, despite it is well known that this rule is not necessarily correlated with the visual aspect of the re- sulting modified video. To overcome this drawback, we have developed a second approach [40] that consists of choosing the best motion vector in the neighborhood of the original one and which is located in the area corresponding to the bit to be embedded. This last attempt was a significant improve- ment of the previous one.

(10)

For f=1–N{//Ndenotes the video frame number fori=1–kf{ifdfi Z1,thenσif(W)= −1;

else ifdif Z2,thenσif(W)=+1;}}

Algorithm1: Mark detection algorithm.

3.3. Retrieval Scheme

Our “dewaterscrambling” procedure is included in an MPEG decompression scheme. The goal of this procedure first con- sists of extracting the waterscrambled motion vectors, and secondly in dewaterscrambling them to allow the visualiza- tion of the original video onto which an invisible watermark has been embedded as detailed in the previous section. As for the embedding process, there are two different possible approaches. The first one uses a syntactic analyzer to extract the marked motion vectors from the MPEG compressed bit- stream, to correct them and to re-insert the corrected motion vectors in the MPEG compressed bitstream. The second one consists of directly correcting waterscrambled motion vec- tors during the decompression scheme. In this case, we must use a module compliant with the standard decompression one.

To dewaterscramble the video, the dewaterscrambling module has to extract the waterscrambling keyKσf(W)in or- der to initialize the PRNG and the strengthαthat has been used to waterscramble the video. We recall that the PRNG outputs thekf indexeskif of the marked motion vectors in Vf. In this way, we can extract the visible watermark from each frame f by applying the inverse function ofΦwith

∀df ∈Vf, df =dfW1α,σf(W),Kσf(W)

. (4)

For the classical watermarking system, the original video content is not used to detect the watermark presence. As for the waterscrambling system, the key Kσf(W) is used to ini- tialize the PRNG resulting in the knowledge of the marked motion vectors. The watermark bit inserted in each of thekf

marked vectorsdWf can then be detected. For this purpose, we apply the rule illustrated onAlgorithm 1.

Once a candidate markWis detected byAlgorithm 1, we must decide if it corresponds to the real embedded markW. For this purpose, we compute the correlationCf at frame f betweenWandWby the following recursion:

Cf =Cf1×(f 1) +1−d(W,W)/N

f , (5)

where d(W,W) denotes the Hamming distance between WandWandNis the mark length. IfCf ≥θ, whereθis a predefined correlation threshold,Wis considered to corre- spond toW.

3.4. Experimental results

Figure 6illustrates the video degradation obtained with three different mark strengthsαon the Stefan sequence. The first row shows frames of the original sequence (α=0), the sec- ond row shows the same frames waterscrambled withα = 40, and the last row shows the waterscrambling results with α=60. As expected, the strengthαallows the manipulation of the degree of video degradation: the higher the value ofα, the higher the degree of video degradation.

We have also conducted some experiments on many se- quences to check the robustness of our proposed watermark- ing scheme. For this purpose, we first performed some of the classical sequence manipulations including Divx3 lossy compression, blurring with a uniform kernel and rotation.

Additionally, we have also tested the robustness of our al- gorithm on the new codec that appears nowadays, namely H264. The H264 retained model was IBP with quantization steps of 10 and 20. Moreover, all optimization parameters (eighth pixel motion estimation, five reference images, etc.) have been used. The compression ratio used for experiments was 1 : 23 for the Divx codec, 1 : 28 for H264 IBP10, and 1 : 123 for H264 IBP20.

The correlation results (cf. (5)) obtained with these at- tacks on the Stefan sequence are plotted onFigure 7a. We re- call that the correlation level for a frame index f tells us if the mark has been detected in f. In this figure, the correla- tion thresholdθhas been set toθ=0.875. These results show that the mark is quickly detected, whatever the transform ap- plied onto the sequence. The best result was obtained with the Divx compression that needed only 6 images to detect the mark. The H264 IBP10 compression followed next where the mark was detected after 7 images. Then, the blur attack needed 16 images to detect the mark and the H264 IBP20 model 32 images. Finally, the worst result was obtained with the rotation attack that resulted in a detection at the 82nd frame. We can also remark that the mark was detected in the first frame for the original unattacked sequence. By analyzing these results, we can conclude that our watermark process is particularly robust to all tested attacks and few sequence im- ages are needed to detect the embedded mark.

We then checked the attack consisting in slightly displac- ing the marked motion vectors. Due to the grid structure used by the embedding rule, when the displaced motion vec- tor remains in the same area (i.e.,Z1orZ2), the watermark is right detected. In the other case, the permutation used in the embedding rule ensures that we are able to retrieve the watermark. Indeed, each bit ofσi(W), never being located in the same place thanks to the statistical accumulation prin- ciple, enables the watermark to be retrieved. In both cases, our watermarking scheme proved its robustness against sta- tistical attacks which try to estimate the mark to remove it.

Finally, another kind of attack may consist in recovering the original video content from the waterscrambled one. For this purpose, two kinds of attack can be envisaged:

3http://www.divx.com/.

(11)

Figure6: Waterscrambling results on the Stefan sequence according to the strengthα. The first row represents original sequence (α=0), the second row represents results withα=20, the third row usesα=40, and the fourth row usesα=60.

(1) from predicted blocks, each being composed of a wa- terscrambled motion vector and an error block com- puted from the original motion vector, one can try to correct the waterscrambled motion vectors by retriev- ing an intra block in the corresponding intra frame that better corresponds to the predicted block. For this purpose, we project back the predicted block into the corresponding intra frame before scanning a search window by looking for an appropriate intra block that maximizes the PSNR with the current predicted block;

(2) from intra blocks located in P or B frames (which are not waterscrambled since not being predicted), one can try to retrieve the original associated motion vec- tors of neighboring predicted blocks by increasing the block smoothness between the unwaterscrambled in- tra blocks and the waterscrambled predicted blocks.

The implementation of these two attacks led to the following conclusions:

(i) the first attack generates many block artifacts on the block boundaries. These artifacts can be explained by the fact the attacker only gets the error block to re- trieve the original intra block corresponding to the predicted block. Therefore, he can only find an ap- proximation of the original block and an approxima- tion of the original motion vector. This approximation error causes highly visible artifacts that are drastically increased with time, being due to the drift effect;

(ii) the second attack has two drawbacks. The first one is explained by the fact that there are generally few in- tra blocks in a predicted frame. Thus, we are only able to approximatively correct the motion vectors of pre- dicted blocks located in the immediate neighborhood of intra blocks, but we are unable to ensure the correc- tion of motion vectors when we walk far from the ini- tial intra block because the approximation error grows in the spatial domain each time we consider a new pre- dicted block to process. The other drawback concerns the accumulation of the same approximation error in the temporal axis generating a drift effect.

During our experiments, the first attack gave better results than the second one and to give an idea about the visual effect of this attack, we have plotted PSNR curves inFigure 7b. This figure shows the PSNR of the compressed video sequence, the unattacked waterscrambled sequence, and the attacked waterscrambled sequence. As we can see, the attack allows to slightly increase the PSNR, which remains nevertheless below the PSNR of the original sequence, proving that this attack is not effective.

4. CONCLUSION AND PERSPECTIVES

In this paper, we have presented an alternative video protec- tion model calledwaterscrambling. This new model has been motivated by the proposal of a new solution able to show a degraded form of content in order to provoke an impulsive

(12)

200 180 160 140 120 100 80 60 40 20 0

Frame index 0

0.2 0.4 0.6 0.8 1

Correlation

No attack Rotation 1 IBP20 IBP10

Divx compression Blur

Correlation threshold

(a)

100 80

60 40

20 0

Frame index 10

15 20 25 30 35 40

PSNR

Waterscrambled video Attacked waterscrambled video Compressed video

(b)

Figure7: (a) Correlation results on the Stefan sequence and (b) PSNR gain offered by the proposed attack on the same sequence.

buying action in some applications such as e-commerce. The developed solution consists of disturbing the video sequence motion vectors in the compressed domain. Then, a com- plementary watermarking solution based on a similar ap- proach has been added to our process in order to prevent illegal copy. We have created a new rule of embedding pro- cess by adding the invisible mark to previously waterscram- bled motion vectors in order to maintain a real-time process.

Therefore, while the motion vectors are visually disturbed for the waterscrambling scheme, we add only a small pertur- bation on them for the watermarking approach, making the unwaterscrambled content similar to the original. Moreover, while the waterscrambling scheme uses a reversible function during the mark embedding, the watermarking scheme uses a nonreversible function. The results reached by these two processes show their respective potential. Effectively, water- scrambling appears to be a good intermediate between the scrambling and watermarking process. Although our water- marking process first shows good results, it may not be robust against format changes, like scaling or shrinking. But our first goal was to resist compression and filter attacks. Now, future works will be concentrated on the improvement of our wa- termarking scheme in order to increase its robustness against other classes of attacks (particularly geometrical attacks). Ad- ditionally, the rule used to scramble the motion vectors has to be improved to keep their statistical distribution and con- sequently to keep the same level of compression. Finally, a fully compressed domain system has to be developed by tak- ing care of the constraints brought by this domain on the watermarking scheme (e.g., drift effect).

REFERENCES

[1] Commission of the European Communities, “Digital rights:

background, systems, assessment,” Comission StaffWorking Paper SEC(2002)197, Brussels, February 2002.

[2] B. M. Macq and J.-J. Quisquater, “Cryptology for digital TV broadcasting,”Proceedings of the IEEE, vol. 83, no. 6, pp. 944–

957, 1995.

[3] J. A. Bloom, I. J. Cox, T. Kalker, J.-P. M. G. Linnartz, M. L.

Miller, and C. B. S. Traw, “Copy protection for DVD video,”

Proceedings of the IEEE, vol. 87, no. 7, pp. 1267–1276, 1999.

[4] F. A. Stevenson, “Cryptanalysis of contents scrambling sys- tem,” online paper, http://www-2.cs.cmu.edu/dst/decss/

frankstevenson/analysis/html, November 1999.

[5] B. Schneier, Applied Cryptography, John Wiley & Sons, New York, NY, USA, 2nd edition, 1996.

[6] A. Piva, F. Bartolini, and M. Barni, “Managing copyright in open networks,” IEEE Internet Computing, vol. 6, no. 3, pp.

18–26, 2002.

[7] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, “Informa- tion hiding—a survey,”Proceedings of the IEEE, vol. 87, no. 7, pp. 1062–1078, 1999, Special issue on protection of multime- dia content.

[8] I. J. Cox, M. L. Miller, and J. A. Bloom, “Watermarking appli- cations and their properties,” inProc. IEEE International Con- ference on Information Technology: Coding and Computing, pp.

6–10, Las Vegas, Nev, USA, March 2000.

[9] S. Katzenbeisser and F. A. P. Petitcolas, Eds., Infor- mation Hiding. Techniques for Steganography and Digital Watermarking, Artech House, London, UK, 2000.

[10] G. C. Langelaar, I. Setyawan, and R. L. Lagendijk, “Wa- termarking digital image and video data. A state-of-the-art overview,”IEEE Signal Processing Magazine, vol. 17, no. 5, pp.

20–46, 2000.

Références

Documents relatifs

Avec le logiciel AntRenamer  (Windows, fonctionne avec Wine sous Linux - je n’ai pas trouvé d’équivalent), choisir de renommer les photos à partir d’une liste (le fichier

In this paper we addressed the issue of automatic video genre categorization and we have proposed four categories of content descriptors: block-level audio fea- tures,

Wang, “Robust Digital Video Watermarking Scheme for H.264 Advanced Video Coding Standard,” Journal of Electronic Imaging 16(4), 2007... H.264 video watermarking - Marc Chaumont

Most studies begin with an “experiment of nature.” Some individuals spend a significant amount of time playing action video games as part of their daily routines, while others

The template is a key based grid and is used to detect and invert the eect of an ane geometric transformation (rotations, scaling and/or aspect ratio change) or in the case of the

We worked at algorithm level to merge the watermark with an MPEG2 codec [6] in order to mark video sequences during MPEG2 coding and extract watermarks The main differences with

In literature the task is known as content-based video retrieval (CBVR) [1], content- based video search [2], near-duplicate video matching [3] or content based video copy

In this research, we propose a method for embedding the hidden and fragile digital watermarks in the frames of video sequences, which can be applied for digital watermarking process