Cross-modal interaction in deep neural networks for audio-visual event classification and localization


Academic year: 2021



Table 3.3 summarizes the different audio-visual datasets.

Table 4.1. List of classes with their respective numbers of subclasses and possible positions in rooms 1 and 2.


Related documents

Audio, visual, and audio-visual egocentric distance perception in virtual environments.

The audio modality is therefore as precise as the visual modality for the perception of distances in virtual environments when rendered distances are between 1.5 m and 5

Regression versus classification for neural network based audio source localization

In the context of DOA estimation of audio sources, although the output space is highly structured, most neural network based systems rely on multi-label binary classification on

Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy

In Experiment 1, 4.5- and 6-month-old infants’ audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and

Training Deep Neural Networks for Visual Servoing

• a novel training process is introduced, based on a single image (acquired at a reference pose), which includes the fast creation of a dataset using a simulator allowing for

A Computer Vision Model for Visual-Object-Based Attention and Eye Movements

Visual attention mechanisms have also been broadly used in a great number of active vision models to guide fovea shifts [20]. Typically, log-polar or foveated imaging techniques

Eye-in-hand / eye-to-hand cooperation for visual servoing

In our system (see Figures 1 and 2), the global camera is static and controls the translating degrees of freedom of the robot effector to ensure its correct positioning while the

Efficient neural models for visual attention

The computational cost depends on the input image: a complex image (in the sense of the filters used in the SNN) induces a large number of spikes, and the simulation cost is high.

A CBIR-based evaluation framework for visual attention models

The computational models of visual attention, originally proposed as cognitive models of human attention, nowadays are being used as front-ends to numerous vision systems like

