• Aucun résultat trouvé

MediaEval 2016: A multimodal system for the Verifying Multimedia Use task

N/A
N/A
Protected

Academic year: 2021

Partager "MediaEval 2016: A multimodal system for the Verifying Multimedia Use task"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01844003

https://hal.inria.fr/hal-01844003

Submitted on 19 Jul 2018

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

MediaEval 2016: A multimodal system for the Verifying Multimedia Use task

Cédric Maigrot, Vincent Claveau, Ewa Kijak, Ronan Sicre

To cite this version:

Cédric Maigrot, Vincent Claveau, Ewa Kijak, Ronan Sicre. MediaEval 2016: A multimodal system for the Verifying Multimedia Use task. MediaEval 2016, Oct 2016, Hilversum, Netherlands. �hal-01844003�

(2)

MediaEval 2016: A multimodal system for the

Verifying Multimedia Use task

C´edric Maigrot

Vincent Claveau

Ewa Kijak

Ronan Sicre

{firstname}.{lastname}@irisa.fr

MediaEval 2016: A multimodal system for the

Verifying Multimedia Use task

C´edric Maigrot

Vincent Claveau

Ewa Kijak

Ronan Sicre

{firstname}.{lastname}@irisa.fr

Why use a multimodal system ? Because there are several types of hoax !

ú False information present in the text content ú Forged image ú Image reused for an other event

Global Hypotheses

ú Prediction is first made at the image-level, then propagated to the tweets that contain the image ú Translation if the detected language is different than english

Text-based approach

(run-T)

Detect if the message is style-wise

similar to known hoax

Ü Capture similar comments between an unknown image and an image from the training set (e.g. It’s photoshopped ) and similar genres of com-ments (e.g. presence of smileys)

Ü Prediction made by a k -Nearest-Neighbor ap-proach (in this case k = 1)

Source-based approach

(run-S)

Detect if the message is related to a

trustworthy source

Ü 2 type of sources searched: news-related organ-isms (e.g. press agencies) and explicit citations of the source of the image (e.g. the pattern pho-tographed by + Name)

Ü Predict real if a trustworthy source is detected, fake else

Example

Image-based approach

(run-I)

Detect a known image

ú Compare an unknown image to an image database of 8 000 known images (7 500 fake and 500 real images)

ú Database images extracted from 5 specialized websites

ú Description of an image by a deep CNN layer output (4096-dimensional descriptor)

ú Predict real (resp. fake) if a real (resp. fake) similar image is found in the database, uncertain else

Combination approach (run-C)

Combine the three previous predictions

ú Late fusion: learn the best combination

ú Boosting algorithm (adaboost.MH, parameters of the machine learning algorithm are set by cross-validation on the training data)

Results

run-T run-I run-S run-C

92 .23% 34 .07% 94 .63% 91 .22% 63 .98% 49 .18% 90 .3% 75 .25% 75 .57% 40 .25% 92 .42% 82 .47% Approaches S co re in %

ú 2 228 messages to classify, corresponding to 130 images

ú 86 % to the test tweets are associated with one or more images (the rest is associated with video)

Conclusion

ú Text-based approach: competes with the source-based approach in terms of recall but tends to classify every tweet as fake

ú Image-based approach: low precision compared with estimations on the training set. This may be due to: (1) small and unbalanced reference database; (2) original image and forged ones are sometimes very similar; (3) presence of stamps

ú Combination-based approach: does not offer any gain due to overfitting

Acknowledgements

Références

Documents relatifs

The classifiers are built using three types of features: a) tweet-based (TB), which use the post’s metadata, b) user- based (UB), which use the user’s metadata, c) multimedia

Regular expressions are used in the order as listed above. If a regex matches the query, data received from the regular expression are saved into the resulting object.

The seventh edition of the Placing Task at MediaEval fo- cuses on two challenges: (1) estimation-based placing, which addresses estimating the geographic location where a photo or

It combines textual, audio, and/or visual descriptors by applying rank aggre- gation and ranked list density analysis to combine multi- modal information encoded in ranked lists..

In this system, the task is approached as a person recogni- tion problem: there is an enrolment stage, where the voice and face of each discovered person are detected and, for

The Multimodal Person Discovery in Broadcast TV task can be split into subtasks: speaker diarization, face diariza- tion, optical character recognition (OCR), speech transcrip-

The UPC system works by extracting monomodal signal seg- ments (face tracks, speech segments) that overlap with the person names overlaid in the video signal1. These segments

There are two main processes in our method: (i) identify characters of interest and their names; and (ii) discover person using face recognition with deep features.. In the first