• Aucun résultat trouvé

A Vocal Learning Model: Development And Analysis

N/A
N/A
Protected

Academic year: 2021

Partager "A Vocal Learning Model: Development And Analysis"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02413140

https://hal.inria.fr/hal-02413140

Submitted on 16 Dec 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

A Vocal Learning Model: Development And Analysis

Silvia Pagliarini, Arthur Leblois, Xavier Hinaut

To cite this version:

Silvia Pagliarini, Arthur Leblois, Xavier Hinaut. A Vocal Learning Model: Development And Analysis.

IMOL 2019 - Fourth International Workshop on Intrinsically Motivated Open-ended Learning, Jul 2019, Frankfurt, Germany. �hal-02413140�

(2)

A Vocal Learning Model: Development And Analysis

Silvia Pagliarini 1,2,3 , Arthur Leblois 3 , Xavier Hinaut 1,2,3

1) Mnemosyne, Inria Bordeaux Sud-Ouest, UMR 5800, CNR, Bordeaux INP, Talence, France 2) LaBRI, UMR 5800, CNRS, University of Bordeaux, Talence, France

3) IMN, UMR 5293, CNRS, University of Bordeaux, Bordeaux, France

INTRODUCTION

Sensorimotor learning: control problem which maps a sensory input into a motor output.

Imitation: the ability to replicate another’s behaviour, i.e. to generate an appropriate motor command to reproduce a sensory stimulus.

Inverse model learning: use the desired and actual motor configuration to estimate the motor commands needed to reach the desired configuration (to produce the

appropriate sensory stimulus).

Reinforcement learning: learn an action policy to maximize the expected reward (which encode the goal of the learning).

AIM : build a vocal learning model underlying song learning in birds, and understand how to make it biologically plausible.

Da Cunha et al.,2010

K E Y W O R D S

SENSORIMOTOR MODEL COMPONENTS [1]

I) Architecture III) Exploration

strategy II) Learning

rule

MOTOR CONTROL MODEL

VOCAL LEARNING IN BIRDS

AKNOWLEDGMENT: this work was supported by the Inria CORDI-S PhD fellowship grant.

BIBLIOGRAPHY PERSPECTIVES

PERCEPTUAL SPACE

Sensory phase

Building neural representation of the song.

Crystallization T

U T O

R a. Adult zebra finch song

b. Subsong (babbling).

Sensorimotor phase

c. Plastic song (temporally structured).

Learning phases:

Deep scattering transform [6]

● From MFCC coefficients introducing wavelets.

● Scattering coefficients can be seen as a convolutional neural networks.

WaveGAN [3]

● Two player minimax game: generator VS discriminator [4].

● Inspired by DCGAN architecture [5].

Motor control model

● Respiratory system and vocal organs.

● Anatomical properties and small size of birds make the investigation of vocal fold mechanisms difficult.

● The vocal output is driven by a complex

gesture-dependent control scheme, and the brain does not control each motor control parameter independently [2].

[1] Oudeyer, P. Y. (2005). The self-organization of speech sounds. Journal of Theoretical Biology, 233(3), 435-449.

[2] Elemans, C. P. H., et al. Universal mechanisms of sound production and control in birds and mammals. Nature communications, 2015, 6:

8978.

[3] Donahue, C., McAuley, J., & Puckette, M. (2019). Adversarial Audio Synthesis. ICLR.

[4] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).

[5] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

[6] Andén, J., & Mallat, S. (2014). Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16), 4114-4128.

Brainard and Doupe,2002

AIM: learn the semantic modes in high-dimensional audio signals, and build a latent space useful for exploration.

In addition, have sound production in the model.

AIM: have a lower dimensional representation of the physical acoustic space, which is the sensory space (sound).

Motor control model

● Explore the possibility to use a dynamical system model.

● More exaustive evaluation of the generative modle.

Exploration strategy

● Intrinsic motivation to drive motor exploration.

● Investigate the geometrical properties of latent space generated with the GAN.

Perceptual space

● How the brain encodes sensory stimulus.

● Highly nonlinear dynamics.

● Low dimensional representation of the sensory space.

DATASET

● Recordings from an adult zebra finch with sampling rate 44100.

● Downsampled single syllables. N = 4946 syllables.

GENERATED SOUND DURING TRAINING

● Batch size = 64

● Epochs = 1662

Amador et al.,2013

Random input

0

1

GENERATOR

DISCRIMINATOR

● Discriminator trained 5x generator update.

DATASET

● Recordings from an adult zebra finch with sampling rate 16000.

● Averaging window = 800

● 1st and 2nd order coefficients.

● Cell array: elements correspond to layers in the scattering transform.

PCA analysis

Ep. 2

Ep. 20

Ep. 38

Ep.

1662 Ep.

1272 Ep.

522 Ep.

386

GENERATED SOUND AFTER TRAINING

● Batch size = 64

● Epochs = 517

Inception Score (IS) = 1.95 ±0.02

Second-order, translation-invariant scattering transform of one-dimensional signals

1st order coefficients 1st order coefficients

2nd order coefficients 2nd order coefficients

Time Time

Power

● N = 52 syllables.

● PCA analysis.

● k-means algorithm.

X = centroids

Power

Ep.

119

Références

Documents relatifs

Altogether, many studies provide convincing data that suggest (i) a high prevalence of HCV infection in CKD patients, with a high risk of contamination in kidney facilities, (ii) a

Sur la base de ces différences de faciès et d’altitude, on peut distinguer deux séquences, celles de Dönemeç (unités I-II : Cycle 1) et des Abricots (unités III : Cycle 2),

We have shown that goal representations, learned or not, have an impact on the agent capacity to learn a curriculum. In our works, goal representation are set in advance, and in

Recently, we have begun to extend the efficient coding hypothesis to active perception through a form of intrinsically motivated learning: a sensory model learns an efficient code

In the former case (Authentication ∈ AUTH ERRORS), the session manager’s state is updated: the session table records, for this user identifier and session identifier, the right to

Par cons´equent, les lecteurs de cette revue s’attendent `a juste titre – et ils le m´eritent – `a ce que Mat´eriaux et Construc- tions traite des d´eveloppements les plus

Par exemple, la vulnérabilité économique des professionnels du droit est faible (0,32 sur une échelle de 0 à 1) et 13 % d’entre eux cumulent un travail habituel à leur domicile,

Various authors ([11,12,13]) group the processes proposed in each model under four stages; namely the stages of creation, storage, dissemination and utilization with