
Learning visual representations with neural networks for video captioning and image generation


Academic year: 2021



Figures

Fig. 2.1. Left: A feed-forward neural network with one input layer, one hidden layer, and one output layer.
Fig. 2.2. Left: An RNN with one recurrent layer, one input layer and one output layer
Fig. 2.3. Left: A standard auto-encoder must have an information bottleneck in the hidden layer to avoid learning the trivial identity reconstruction function.
Fig. 2.4. Left: An example of directed graphical models. Edges are directed.
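
A minimal sketch, in Python/PyTorch, of two of the architectures the captions above describe: the one-hidden-layer feed-forward network of Fig. 2.1 and the bottlenecked auto-encoder of Fig. 2.3. This is not the thesis code; the layer sizes (784 inputs, 256 hidden units, a 32-dimensional code, 10 outputs) are illustrative assumptions.

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Input layer -> one hidden layer -> output layer (cf. Fig. 2.1)."""
    def __init__(self, n_in=784, n_hidden=256, n_out=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden),
            nn.Tanh(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x):
        return self.net(x)

class BottleneckAutoEncoder(nn.Module):
    """Auto-encoder whose code is narrower than its input (cf. Fig. 2.3),
    so the trivial identity reconstruction cannot be represented."""
    def __init__(self, n_in=784, n_code=32):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_code)
        self.decoder = nn.Linear(n_code, n_in)

    def forward(self, x):
        code = torch.tanh(self.encoder(x))   # compressed representation
        return self.decoder(code)            # reconstruction of the input

if __name__ == "__main__":
    x = torch.randn(8, 784)                  # toy batch of flattened inputs
    print(FeedForward()(x).shape)            # torch.Size([8, 10])
    print(BottleneckAutoEncoder()(x).shape)  # torch.Size([8, 784])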


Related documents

To evaluate how our transfer method performs on this very different target task, we use a network pre-trained on 1512 ImageNet object classes and apply our transfer methodology to

In Table 1, we report our single-image autoencoder-based results on this dataset along with those of the following state-of-the-art single image-based methods: KDE regression from

In our experiments, this version worked better in the Unsupervised and Transfer Learning Challenge, characterized by a non-discriminant linear classifier and very few labeled

We apply the random neural networks (RNN) [15], [16] developed for deep learning recently [17]–[19] to detecting network attacks using network metrics extracted from the

Deep learning architectures which have been recently proposed for the prediction of salient areas in images differ essentially by the quantity of convolution and pooling layers,

Model-free algorithms (lower branch) bypass the model-learning step and learn the value function directly from the interactions, primarily using a family of algorithms called

• a novel training process is introduced, based on a single image (acquired at a reference pose), which includes the fast creation of a dataset using a simulator allowing for