
In this section, we discuss existing techniques used for the detection of different types of forgery.

5.2.1 Copy-move forgery detection

Copy-move forgery detection usually goes through the following steps: feature extraction, matching, filtering and post-processing.

5.2.1.1 Features extraction

Feature extraction techniques identify significant properties of the image that act as representatives of the entire image. These features capture certain visual properties of the image and can be used for subsequent processing such as image classification.

There are two alternatives for this step: keypoint-based methods and block-based methods.

Keypoint-based method

Keypoints are spatial locations in the image that mark what is interesting or stands out in it. Their particular property is that they are found again in a modified version of the image regardless of its orientation, that is, even if the image has been rotated, enlarged or otherwise distorted. Among the keypoint detection algorithms are SURF and SIFT.
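As an illustration of this extraction step, the sketch below uses OpenCV's SIFT implementation (assuming an OpenCV build of version 4.4 or later, where cv2.SIFT_create is available; the file name is a placeholder) to detect keypoints and compute their descriptors on a grayscale image:

    import cv2

    # Load the image under analysis (placeholder file name) in grayscale,
    # since SIFT works on single-channel intensity data.
    gray = cv2.imread("suspect_image.png", cv2.IMREAD_GRAYSCALE)

    # Detect keypoints and compute their 128-dimensional SIFT descriptors.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)

    # Each keypoint carries a position, a scale and an orientation, which is
    # what makes it recoverable after rotation or rescaling of the image.
    for kp in keypoints[:5]:
        print(kp.pt, kp.size, kp.angle)

This extraction step is the starting point of the works described below.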

The methods that use keypoints include the following:

• Bo et al. [7] conducted a study on copy-move forgery detection using the SURF algorithm, which involves the detection and description of keypoints. They used the Hessian matrix to detect keypoints and Haar wavelets to assign the orientation: they assessed the dominant orientation and described the keypoint descriptor relative to it. By extracting square regions around these keypoints, they constructed SURF descriptors aligned with the dominant orientation, and by weighting the responses with the Haar wavelets they increased robustness against localization errors and geometric deformations. The SURF descriptors are then used for matching, with a threshold to increase robustness and avoid false detections. Their method succeeds in locating altered regions even when post-processing is applied to the images, and it is robust and fast. However, it cannot find the exact boundaries of the altered region.

• Zheng et al. [8] propose a new keypoint matching method based on the positional relationship of keypoints. Keypoints in the forged region and in the original region must be consistent and evenly distributed throughout the image, which ensures that large similar textures also produce a considerable number of keypoints. Their algorithm first analyzes and filters out keypoints so that noise has no impact on the remaining ones, and they developed a new algorithm to compute the features, which are stored in a matrix. Their algorithm differs from SIFT in how the features are determined: it finds a pair of consistent keypoints and marks them as candidate keypoints only when they meet certain conditions, and a threshold value is set to reduce the number of false detections. However, their algorithm does not detect alterations involving post-processing such as rotation and scaling. (The descriptor self-matching idea shared by these keypoint-based approaches is sketched after this list.)
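Neither of the two matching schemes above is reproduced exactly, but the idea they share, matching an image's keypoint descriptors against themselves and keeping pairs whose descriptors are very similar while their positions are far apart, can be sketched as follows (the ratio and distance thresholds are illustrative values, not those of the cited works):

    import cv2
    import numpy as np

    gray = cv2.imread("suspect_image.png", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)

    # Match every descriptor against all descriptors of the same image.
    # k=3 because the best match is always the descriptor itself.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(descriptors, descriptors, k=3)

    candidate_pairs = []
    for m in matches:
        if len(m) < 3:
            continue
        best, second = m[1], m[2]   # skip m[0], the trivial self-match
        # Ratio test: keep the match only if it is clearly better than the
        # next best one, which filters out ambiguous descriptors.
        if best.distance < 0.6 * second.distance:
            p1 = np.array(keypoints[best.queryIdx].pt)
            p2 = np.array(keypoints[best.trainIdx].pt)
            # Discard pairs that are spatially too close: neighbouring pixels
            # are naturally similar and would cause false detections.
            if np.linalg.norm(p1 - p2) > 30:
                candidate_pairs.append((tuple(p1), tuple(p2)))

    print(len(candidate_pairs), "candidate copy-move matches")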

Block-based method

This method is based on the fact that there is a relationship between the copied region of the original image and the pasted region. For feature extraction, block-based methods subdivide the image into rectangular regions, and a feature vector is computed for each of these regions. Similar feature vectors are then matched. Block-based methods can be grouped into four categories: moment-based, dimensionality-reduction-based, intensity-based and frequency-domain-based.
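A minimal sketch of the block subdivision step, assuming NumPy and an arbitrary block size of 8 pixels; each overlapping block is turned into a feature vector (here simply its raw intensities, which real methods replace with DCT, DWT, PCA or moment features) and stored together with its position:

    import numpy as np

    def extract_block_features(gray, block_size=8):
        """Return one feature vector per overlapping block, with its position."""
        h, w = gray.shape
        features, positions = [], []
        for y in range(h - block_size + 1):
            for x in range(w - block_size + 1):
                block = gray[y:y + block_size, x:x + block_size]
                # Simplest possible feature vector: the raw block intensities.
                features.append(block.flatten().astype(np.float32))
                positions.append((y, x))
        return np.array(features), np.array(positions)

The feature and position arrays produced this way feed the matching step of Section 5.2.1.2.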

The works on feature extraction using the block-based method include the following:

• Bashar et al. [13] propose a duplication detection approach that adopts two robust features based on the Discrete Wavelet Transform (DWT) and Kernel Principal Component Analysis (KPCA), chosen for their robust block-matching capability. They divided the image into several small blocks and computed KPCA-based vectors and DWT vectors for each block. These vectors are then arranged in a matrix for lexicographic sorting. The sorted blocks are used to find similar points, and their offset frequencies are calculated. To avoid false detections, they set a threshold value on the offset frequency. They also developed a new algorithm to detect rotation-type forgeries using a labeling technique and a geometric transformation. This algorithm has shown promising improvements over the conventional PCA approach, and it also detects forgeries affected by additive noise and lossy JPEG compression.

• Ghorbani and Firouzmand [21] present a new method for detecting copy-move forgery. They applied a decomposition of the quantization coefficients of the discrete cosine transform and the wavelet transform. After converting the image to grayscale, they first applied the DWT to obtain four sub-bands and used only the low-frequency sub-band for forgery detection. They then split the image into several overlapping blocks of the same size and applied the DCT to obtain DCT feature vectors, which are arranged in a matrix. To reduce the computational complexity, they performed a lexicographic sort on the matrix. For each pair of adjacent rows in the matrix, they computed the normalized offset vector and counted the number of times each offset vector appears. A threshold is set on this count, and blocks are considered forged only if the count exceeds it (a simplified sketch of this pipeline is given after this list). Their method is effective for detecting forgery compared to other methods. However, it does not detect forgeries when the altered region undergoes post-processing such as rotation, scaling or significant compression, and it imposes some restrictions on the forged areas.

• Wang et al. [22] developed their algorithm to be efficient and resistant to various post-processing operations such as blurring and lossy JPEG compression. After reducing the size of the image with a Gaussian pyramid, they split the image into several overlapping blocks of fixed size. They then applied Hu moments to the blocks and computed the corresponding eigenvalues. They used a lexicographic sort to order the feature vectors, and an area threshold was selected to reduce false detections. They searched for corresponding blocks using mathematical morphology techniques. Their method detects copy-move forgery even after post-processing.
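To make the block-based pipeline concrete, the sketch below follows the general scheme described for [21]: a DWT on the grayscale image, a DCT on overlapping blocks of the low-frequency sub-band, a lexicographic sort of the feature matrix and a count of offset vectors between adjacent rows. It is a simplified illustration only; the block size, the number of retained DCT coefficients, the rounding step and the count threshold are assumed values, not those of the authors, and PyWavelets and OpenCV are assumed to be installed.

    import cv2
    import numpy as np
    import pywt
    from collections import Counter

    def dct_block_copy_move(gray, block_size=8, n_coeffs=16, count_threshold=50):
        # Step 1: one-level DWT, keeping only the low-frequency (LL) sub-band.
        ll, _ = pywt.dwt2(gray.astype(np.float32), "haar")

        # Step 2: a truncated DCT feature vector for every overlapping block.
        h, w = ll.shape
        rows = []
        for y in range(h - block_size + 1):
            for x in range(w - block_size + 1):
                block = ll[y:y + block_size, x:x + block_size].astype(np.float32)
                coeffs = cv2.dct(block).flatten()[:n_coeffs]
                # Rounding acts as a crude quantization so that near-identical
                # blocks produce identical rows.
                rows.append((tuple(np.round(coeffs, 1)), (y, x)))

        # Step 3: lexicographic sort, so that similar feature vectors end up
        # on adjacent rows of the matrix.
        rows.sort(key=lambda r: r[0])

        # Step 4: count how often each offset vector occurs between adjacent
        # rows with identical features; frequent offsets suggest duplication.
        offsets = Counter()
        for (f1, p1), (f2, p2) in zip(rows, rows[1:]):
            if f1 == f2:
                offsets[(p2[0] - p1[0], p2[1] - p1[1])] += 1

        return {off: n for off, n in offsets.items() if n > count_threshold}

Blocks contributing to a surviving offset can then be mapped back to image coordinates to localize the duplicated region.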

5.2.1.2 Matching

A strong similarity between two feature descriptors is interpreted as an indication of a duplicated region. To find this similarity, we go through the matching step. Several methods are used: lexicographic sorting ([11], [14], [16], [17], [22], [19], [21], [23]), the kd-tree algorithm ([9], [12]) and the Euclidean distance ([20], [25]).
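As one concrete option among those listed, a kd-tree finds, for every feature vector, its nearest neighbour in feature space. The sketch below uses SciPy's cKDTree and assumes feature and position arrays such as those produced by the block extraction sketched in Section 5.2.1.1; the distance threshold is an illustrative value:

    import numpy as np
    from scipy.spatial import cKDTree

    def match_features(features, positions, max_feature_dist=5.0):
        """Pair each block with its nearest neighbour in feature space."""
        tree = cKDTree(features)
        # k=2 because the nearest neighbour of a block is always itself.
        dists, idx = tree.query(features, k=2)
        matches = []
        for i in range(len(features)):
            if dists[i, 1] < max_feature_dist:
                matches.append((positions[i], positions[idx[i, 1]], dists[i, 1]))
        return matches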

5.2.1.3 Filtering

The filtering step reduces the probability of false matches. For example, a common noise-suppression measure is to delete matches between spatially close regions, since neighboring pixels often have similar intensities, which can lead to false detections.

Different distance criteria have also been proposed in order to filter out weak matches. For example, several authors have proposed the Euclidean distance between matched feature vectors ([9], [10]), while Bravo-Solorio and Nandi [16] proposed the correlation coefficient between two feature vectors as a similarity criterion.
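A minimal filtering pass of this kind, assuming the list of matched position pairs produced by a matching step such as the one sketched above; the minimum spatial distance is an illustrative parameter:

    import numpy as np

    def filter_matches(matches, min_spatial_dist=20.0):
        """Drop matched pairs whose blocks are spatially too close, since
        neighbouring blocks are naturally similar and tend to produce
        false matches."""
        kept = []
        for p1, p2, feature_dist in matches:
            spatial_dist = np.linalg.norm(np.asarray(p1, float) - np.asarray(p2, float))
            if spatial_dist >= min_spatial_dist:
                kept.append((p1, p2, feature_dist))
        return kept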

5.2.2 Splicing forgery detection

Splicing is a very common and simple forgery technique and poses a threat to the integrity and authenticity of images. Therefore, the detection of this type of forgery is of great importance in digital forensics.

Minyoung et al. [27] proposed a method that uses a learning algorithm for the detection of visual image manipulations, trained only on a large set of real-world data. The algorithm uses exchangeable image file format (EXIF) metadata, automatically recorded with photographs, as a supervisory signal to train a classification model that determines whether an image is self-consistent.

EXIF tags are camera specifications digitally embedded in an image file at the time of capture and are widely available. The authors apply this self-consistency model to detect and localize splicing in an image. The model is self-supervised in the sense that only real photographs and their EXIF metadata are used for training. A consistency classifier is learned for each EXIF tag separately using pairs of photographs, and the resulting classifiers are combined to estimate the consistency of pairs of regions in a new input image.
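The learned classifiers of [27] are beyond the scope of this section, but reading the EXIF tags that serve as the supervisory signal is straightforward with the Pillow library (the file name is a placeholder):

    from PIL import Image, ExifTags

    # Read the EXIF metadata recorded by the camera at capture time.
    image = Image.open("photo.jpg")
    exif = image.getexif()

    # Map numeric tag identifiers to their readable names (e.g. Make, Model,
    # DateTime); these per-camera attributes are what the consistency
    # classifiers are trained to compare.
    for tag_id, value in exif.items():
        name = ExifTags.TAGS.get(tag_id, tag_id)
        print(name, value)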

5.2.3 Scanned document text forgery detection

Digitized documents are a direct accessory to many criminal and terrorist acts. Examples include the falsification or modification of scanned documents used for identity, security or transaction-recording purposes.

• Ramzi M. Abed [28] proposed a forgery detection system based on identifying the scanner used to scan the document. In his view, this technique depends on identifying the scanner's signature. The quality of the character edges in scanned documents varies according to the scanner used during the scanning process: high-resolution scanners produce stronger black lines with sharper edges, while low-resolution scanners produce characters represented by black lines composed of variations of black and gray, with gradual edges. These differences result in texture changes. (A simple illustration of such an edge-quality feature is sketched after this list.)

• The method proposed by Romain et al. [29] uses intrinsic features of the document; it is based on the comparison of character shapes (similarity or dissimilarity of characters) and on the detection of peripheral characters.
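The exact signature used in [28] is not reproduced here, but a simple proxy for the edge quality it relies on is the distribution of gradient magnitudes around character strokes, sketched below with OpenCV (the gradient threshold and file name are illustrative):

    import cv2
    import numpy as np

    # Load the scanned document (placeholder file name) in grayscale.
    gray = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)

    # Gradient magnitude via Sobel filters: sharp character edges from a
    # high-resolution scanner give strong, concentrated gradients, while the
    # gradual black-to-gray transitions of a low-resolution scanner do not.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)

    # Keep only pixels that actually lie on edges.
    edge_strengths = magnitude[magnitude > 50]

    # Simple texture statistics of the edge response; a region whose statistics
    # deviate from the rest of the document may come from a different scanner.
    print("mean edge strength:", edge_strengths.mean())
    print("edge strength spread:", edge_strengths.std())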