Récemment recherché

Aucun résultat trouvé

Étiquettes

Aucun résultat trouvé

Document

Aucun résultat trouvé

Accueil Écoles Thèmes

Connexion

A Vocal Learning Model: Development And Analysis

Partager "A Vocal Learning Model: Development And Analysis"

N/A

N/A

Protected

Année scolaire: 2021

Info

Protected

Academic year: 2021

Partager "A Vocal Learning Model: Development And Analysis"

Copied!

2

0

0

2

0

0

Chargement.... (Voir le texte intégral maintenant)

Télécharger maintenant ( 2 Page )

Texte intégral

(1)

HAL Id: hal-02413140

https://hal.inria.fr/hal-02413140

Submitted on 16 Dec 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

A Vocal Learning Model: Development And Analysis

Silvia Pagliarini, Arthur Leblois, Xavier Hinaut

To cite this version:

Silvia Pagliarini, Arthur Leblois, Xavier Hinaut. A Vocal Learning Model: Development And Analysis.

IMOL 2019 - Fourth International Workshop on Intrinsically Motivated Open-ended Learning, Jul 2019, Frankfurt, Germany. �hal-02413140�

(2)

A Vocal Learning Model: Development And Analysis

Silvia Pagliarini ^1,2,3 , Arthur Leblois ³ , Xavier Hinaut ^1,2,3

1) Mnemosyne, Inria Bordeaux Sud-Ouest, UMR 5800, CNR, Bordeaux INP, Talence, France 2) LaBRI, UMR 5800, CNRS, University of Bordeaux, Talence, France

3) IMN, UMR 5293, CNRS, University of Bordeaux, Bordeaux, France

INTRODUCTION

Sensorimotor learning: control problem which maps a sensory input into a motor output.

Imitation: the ability to replicate another’s behaviour, i.e. to generate an appropriate motor command to reproduce a sensory stimulus.

Inverse model learning: use the desired and actual motor configuration to estimate the motor commands needed to reach the desired configuration (to produce the

appropriate sensory stimulus).

Reinforcement learning: learn an action policy to maximize the expected reward (which encode the goal of the learning).

AIM ^: build a vocal learning model underlying song learning in birds, and understand how to make it biologically plausible.

Da Cunha et al.,2010

K E Y W O R D S

SENSORIMOTOR MODEL COMPONENTS [1]

I) Architecture III) Exploration

strategy II) Learning

rule

MOTOR CONTROL MODEL

VOCAL LEARNING IN BIRDS

AKNOWLEDGMENT: this work was supported by the Inria CORDI-S PhD fellowship grant.

BIBLIOGRAPHY PERSPECTIVES

PERCEPTUAL SPACE

Sensory phase

Building neural representation of the song.

Crystallization T

U T O

R a. Adult zebra finch song

b. Subsong (babbling).

Sensorimotor phase

c. Plastic song (temporally structured).

Learning phases:

Deep scattering transform [6]

● From MFCC coefficients introducing wavelets.

● Scattering coefficients can be seen as a convolutional neural networks.

WaveGAN [3]

● Two player minimax game: generator VS discriminator [4].

● Inspired by DCGAN architecture [5].

Motor control model

● Respiratory system and vocal organs.

● Anatomical properties and small size of birds make the investigation of vocal fold mechanisms difficult.

● The vocal output is driven by a complex

gesture-dependent control scheme, and the brain does not control each motor control parameter independently [2].

[1] Oudeyer, P. Y. (2005). The self-organization of speech sounds. Journal of Theoretical Biology, 233(3), 435-449.

[2] Elemans, C. P. H., et al. Universal mechanisms of sound production and control in birds and mammals. Nature communications, 2015, 6:

8978.

[3] Donahue, C., McAuley, J., & Puckette, M. (2019). Adversarial Audio Synthesis. ICLR.

[4] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).

[5] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

[6] Andén, J., & Mallat, S. (2014). Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16), 4114-4128.

Brainard and Doupe,2002

AIM: learn the semantic modes in high-dimensional audio signals, and build a latent space useful for exploration.

In addition, have sound production in the model.

AIM: have a lower dimensional representation of the physical acoustic space, which is the sensory space (sound).

Motor control model

● Explore the possibility to use a dynamical system model.

● More exaustive evaluation of the generative modle.

Exploration strategy

● Intrinsic motivation to drive motor exploration.

● Investigate the geometrical properties of latent space generated with the GAN.

Perceptual space

● How the brain encodes sensory stimulus.

● Highly nonlinear dynamics.

● Low dimensional representation of the sensory space.

DATASET

● Recordings from an adult zebra finch with sampling rate 44100.

● Downsampled single syllables. N = 4946 syllables.

GENERATED SOUND DURING TRAINING

● Batch size = 64

● Epochs = 1662

Amador et al.,2013

Random input

0

1

GENERATOR

DISCRIMINATOR

● Discriminator trained 5x generator update.

DATASET

● Recordings from an adult zebra finch with sampling rate 16000.

● Averaging window = 800

● 1st and 2nd order coefficients.

● Cell array: elements correspond to layers in the scattering transform.

PCA analysis

Ep. 2

Ep. 20

Ep. 38

Ep.

1662 Ep.

1272 Ep.

522 Ep.

386

GENERATED SOUND AFTER TRAINING

● Batch size = 64

● Epochs = 517

Inception Score (IS) = 1.95 ±0.02

Second-order, translation-invariant scattering transform of one-dimensional signals

1st order coefficients 1st order coefficients

2nd order coefficients 2nd order coefficients

Time Time

Power

● N = 52 syllables.

● PCA analysis.

● k-means algorithm.

X = centroids

Power

Ep.

119

Références

Télécharger maintenant ( PDF - 2 Page - 2.28 MB )

Documents relatifs

Hepatitis C virus infection and chronic kidney disease: Time for reappraisal

Altogether, many studies provide convincing data that suggest (i) a high prevalence of HCV infection in CKD patients, with a high risk of contamination in kidney facilities, (ii) a

Indicateurs morphosédimentaires des variations de niveau lacustre dans les terrasses du lac de Van (Turquie)

Sur la base de ces différences de faciès et d’altitude, on peut distinguer deux séquences, celles de Dönemeç (unités I-II : Cycle 1) et des Abricots (unités III : Cycle 2),

Intrinsically Motivated and Interactive Reinforcement Learning: a Developmental Approach

We have shown that goal representations, learned or not, have an impact on the agent capacity to learn a curriculum. In our works, goal representation are set in advance, and in

Robust Active Binocular Vision through Intrinsically Motivated Learning

Recently, we have begun to extend the efficient coding hypothesis to active perception through a form of intrinsically motivated learning: a sensory model learns an efficient code

Verifying a signature architecture: a comparative case study

In the former case (Authentication ∈ AUTH ERRORS), the session manager’s state is updated: the session table records, for this user identiﬁer and session identiﬁer, the right to

Editorial/Éditorial

Par conséquent, les lecteurs de cette revue s’attendent à juste titre – et ils le méritent – à ce que Matériaux et Construc- tions traite des développements les plus

Les métiers au temps du corona, Note d’analyse de France Stratégie, No 88-Avril 2020

Par exemple, la vulnérabilité économique des professionnels du droit est faible (0,32 sur une échelle de 0 à 1) et 13 % d’entre eux cumulent un travail habituel à leur domicile,

A Model for Assessing Organizational Learning in Software Development Organizations

Various authors ([11,12,13]) group the processes proposed in each model under four stages; namely the stages of creation, storage, dissemination and utilization with

Téléchargez tous les documents en téléchargeant vos documents d'étude.

Votre document sera enrichi, partagé sur 123dok FR pour vous aider à étudier.

Documents relatifs

Mutation burden of rare variants in schizophrenia candidate genes

Mutation burden of rare variants in schizophrenia candidate genes

11

0

0

CHA Secretaries / Secrétaires de la SHC

CHA Secretaries / Secrétaires de la SHC

2

0

0

Conférences et demandes de communications

Conférences et demandes de communications

1

0

0

Dynamique et controle d'une presse d'injection

Dynamique et controle d'une presse d'injection

140

0

0

Douze ans de recherche en alphabétisation des adultes en français au Canada : 1994-2005

Douze ans de recherche en alphabétisation des adultes en français au Canada : 1994-2005

137

0

0

Atelier n°23 : Mise en œuvre de la classe inversée dans le supérieur - L'approche par « Learning Outcomes » et par compétences associée au dispositif de la Classe Inversée

Atelier n°23 : Mise en œuvre de la classe inversée dans le supérieur - L'approche par « Learning Outcomes » et par compétences associée au dispositif de la Classe Inversée

4

0

0

The Evolving Impact of Continental Accessibility on Local Employment Growth, 1971-2001

The Evolving Impact of Continental Accessibility on Local Employment Growth, 1971-2001

47

0

0

Les restaurations postérieures partielles indirectes collées : choix du matériau

Les restaurations postérieures partielles indirectes collées : choix du matériau

72

0

0