
IDENTIFY, LOCATE AND SEPARATE: AUDIO-VISUAL OBJECT EXTRACTION IN LARGE VIDEO COLLECTIONS USING WEAK SUPERVISION


Academic year: 2021



HAL Id: hal-02380780

https://hal.telecom-paris.fr/hal-02380780

Submitted on 26 Nov 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


IDENTIFY, LOCATE AND SEPARATE: AUDIO-VISUAL OBJECT EXTRACTION IN LARGE VIDEO COLLECTIONS USING WEAK SUPERVISION

Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gael Richard

To cite this version:

Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, et al. IDENTIFY, LOCATE AND SEPARATE: AUDIO-VISUAL OBJECT EXTRACTION IN LARGE VIDEO COLLECTIONS USING WEAK SUPERVISION. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2019, New Paltz, United States. ⟨hal-02380780⟩

