IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking
Related documents
Dealing with data originating from multiple modalities often requires either combining them, fusing them into a joint multimodal representation, or translating from one modality to another.
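As a rough illustration of the distinction, the following sketch (all dimensions and layer choices are hypothetical, not taken from the paper) contrasts fusing two modalities into a joint representation with translating one modality into the other:

```python
# A minimal sketch contrasting the two strategies mentioned above:
# fusion into a joint representation vs. crossmodal translation.
# Dimensions are hypothetical, not from the paper.
import torch
import torch.nn as nn

TEXT_DIM, VISUAL_DIM, JOINT_DIM = 300, 2048, 512

# Fusion: concatenate both modalities and project into a joint space.
fusion = nn.Sequential(
    nn.Linear(TEXT_DIM + VISUAL_DIM, JOINT_DIM),
    nn.Tanh(),
)

# Translation: map the text embedding into the visual embedding space.
text_to_visual = nn.Sequential(
    nn.Linear(TEXT_DIM, JOINT_DIM),
    nn.Tanh(),
    nn.Linear(JOINT_DIM, VISUAL_DIM),
)

text = torch.randn(8, TEXT_DIM)      # e.g., averaged word embeddings
visual = torch.randn(8, VISUAL_DIM)  # e.g., CNN keyframe features

joint = fusion(torch.cat([text, visual], dim=1))   # multimodal fusion
translated = text_to_visual(text)                  # crossmodal translation
loss = nn.functional.mse_loss(translated, visual)  # translation objective
```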
The study focused on a home user scenario and, to reflect the current wide usage of computer tablets, participants used a version of the AXES video search system [9].
A two-step approach is implemented: relying on a topic segmentation technique, the first step generates potential target segments; then, for each anchor, the best candidate segments are selected.
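A minimal sketch of this two-step scheme, assuming a fixed-length sliding window in place of the actual topic segmenter and TF-IDF cosine similarity in place of the paper's ranking model (all names are hypothetical):

```python
# Step 1 generates candidate target segments; step 2 ranks them per anchor.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def generate_segments(transcript_words, size=50, stride=25):
    """Step 1: cut a transcript into overlapping candidate target segments."""
    return [" ".join(transcript_words[i:i + size])
            for i in range(0, max(1, len(transcript_words) - size + 1), stride)]

def rank_targets(anchor_text, segments, top_k=10):
    """Step 2: for one anchor, rank candidate segments by textual similarity."""
    vec = TfidfVectorizer()
    mat = vec.fit_transform([anchor_text] + segments)
    scores = cosine_similarity(mat[0], mat[1:]).ravel()
    order = scores.argsort()[::-1][:top_k]
    return [(segments[i], float(scores[i])) for i in order]
```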
The Search and Hyperlinking task at MediaEval 2013 envisioned the following scenario: a user is searching for a segment of video that they know to be contained in a video collection.
Concepts were inferred from the query text, and the visual information module was used for query expansion through the detection of segments visually similar to the query.
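The expansion step could look roughly like the following sketch, assuming precomputed visual feature vectors per segment; the function and its arguments are hypothetical, not the paper's API:

```python
# Visual query expansion: fetch the k segments whose keyframe features are
# closest to the query's visual representation and append their transcript
# terms to the textual query. Everything here is a hypothetical stand-in.
import numpy as np

def expand_query(query_terms, query_visual, seg_visuals, seg_transcripts, k=3):
    """Add terms from the k visually closest segments to the textual query."""
    sims = seg_visuals @ query_visual / (
        np.linalg.norm(seg_visuals, axis=1) * np.linalg.norm(query_visual) + 1e-9)
    nearest = np.argsort(sims)[::-1][:k]
    expanded = list(query_terms)
    for i in nearest:
        expanded.extend(seg_transcripts[i].split())
    return expanded
```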
In this paper, we further investigate multimodal representations of video segments in a hyperlinking system based on bidirectional deep neural networks, which learn a joint multimodal representation of the segments.
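A minimal sketch of such a bidirectional crossmodal network, assuming text-to-visual and visual-to-text translation branches whose hidden activations are concatenated into the joint representation (layer sizes and the training objective are assumptions, not the paper's exact architecture):

```python
# One branch translates text features to visual features, the other the
# reverse; the concatenated hidden activations form the joint representation.
import torch
import torch.nn as nn

class BiDNNSketch(nn.Module):
    def __init__(self, text_dim=300, visual_dim=2048, hidden=512):
        super().__init__()
        self.t2h = nn.Linear(text_dim, hidden)    # text -> shared hidden
        self.h2v = nn.Linear(hidden, visual_dim)  # shared hidden -> visual
        self.v2h = nn.Linear(visual_dim, hidden)  # visual -> shared hidden
        self.h2t = nn.Linear(hidden, text_dim)    # shared hidden -> text

    def forward(self, text, visual):
        ht = torch.tanh(self.t2h(text))
        hv = torch.tanh(self.v2h(visual))
        visual_hat = self.h2v(ht)           # translate text into visual space
        text_hat = self.h2t(hv)             # translate visual into text space
        joint = torch.cat([ht, hv], dim=1)  # multimodal segment representation
        return visual_hat, text_hat, joint

model = BiDNNSketch()
text, visual = torch.randn(4, 300), torch.randn(4, 2048)
v_hat, t_hat, joint = model(text, visual)
loss = nn.functional.mse_loss(v_hat, visual) + nn.functional.mse_loss(t_hat, text)
```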
Several such diagrams are activated, using various word embeddings, image analysis models, and weights for the average, so that, for a given topic and keyframe, several scores are available.
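A minimal sketch of that weighted averaging, with hypothetical model names and weights:

```python
# Several per-model similarity scores (one per word embedding or image
# analysis model) are combined into a single topic/keyframe relevance score.
# Model names and weights are hypothetical.
import numpy as np

def fuse_scores(scores_by_model, weights):
    """Weighted average of per-model scores for one (topic, keyframe) pair."""
    names = sorted(scores_by_model)
    s = np.array([scores_by_model[n] for n in names])
    w = np.array([weights[n] for n in names])
    return float(np.dot(w, s) / w.sum())

score = fuse_scores(
    {"word2vec": 0.62, "glove": 0.55, "resnet": 0.71},
    {"word2vec": 1.0, "glove": 0.5, "resnet": 2.0},
)
```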
We have proposed a system that exploits (i) automatic speech recognition transcripts [5, 7], (ii) visual concepts, and (iii) entities extracted by means of a named entity recognition tool.
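A minimal sketch of pooling these three evidence sources into a single weighted bag-of-words representation per segment (field names and weights are hypothetical, not the system's actual configuration):

```python
# Merge ASR transcript terms, visual concept labels, and extracted named
# entities into one term->weight map suitable for indexing or matching.
from collections import Counter

def segment_document(asr_text, visual_concepts, entities,
                     w_asr=1.0, w_concepts=2.0, w_entities=3.0):
    """Combine the three modalities into one weighted bag of words."""
    doc = Counter()
    for term in asr_text.lower().split():
        doc[term] += w_asr
    for concept in visual_concepts:  # e.g., labels from a visual classifier
        doc[concept.lower()] += w_concepts
    for entity in entities:          # e.g., output of an NER tool
        doc[entity.lower()] += w_entities
    return dict(doc)
```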