Tongue motor control: deriving articulator trajectories and muscle activation patterns from an optimization principle


HAL Id: hal-03215820

https://hal.archives-ouvertes.fr/hal-03215820

Submitted on 4 May 2021



Pierre Baraduc, Tsiky Rakotomalala, Pascal Perrier

To cite this version:

Pierre Baraduc, Tsiky Rakotomalala, Pascal Perrier. Tongue motor control: deriving articulator trajectories and muscle activation patterns from an optimization principle. Neural Control of Movement, Apr 2021, Virtual, France. 2021. ⟨hal-03215820⟩

[Figure: vowels /i/, /a/, /o/, /e/ in formant space, F1 (Hz) vs. F2 (Hz) and F3 (Hz) vs. F2 (Hz). Panels: Postural goal; Acoustic goal; Data, perception [5].]

Pierre Baraduc, Tsiky Rakotomalala, Pascal Perrier

GIPSA-lab, UMR 5216 CNRS / Univ. Grenoble-Alpes / Grenoble-INP

{pierre.baraduc, ny-tsiky.rakotomalala, pascal.perrier}@gipsa-lab.grenoble-inp.fr


INTRODUCTION

Key features of sensorimotor systems:

• Multisensory integration

• Use of internal models to predict the sensory outcomes of actions

• Comparison of the sensory input with the internal prediction to optimally update the internal estimate of the system

Speech production

• Coordination task: lips, jaw, tongue

• Resistance to external disturbances (inertial forces, objects in mouth, distorted audio feedback...)

• Can optimal feedback control theory illuminate the control of tongue movements during speech (tongue kinematics, coarticulation, use of feedback...)?

CONCLUSIONS

• Minimization of effort produces plausible tongue trajectories (kinematics, EMG)

• Part of phonemic variability linked to aspects of sensorimotor control?

• Toy model suggests coarticulation can be tackled by this method

• Model predictions should be validated with formant tracking, EMA recordings and intramuscular EMG

REFERENCES

[1] Payan, Y., and Perrier, P. (1997). Synthesis of VV sequences with a 2D biomechanical tongue model controlled by the Equilibrium Point Hypothesis. Speech Comm 22, 185–205.

[2] Badin, P., Elisei, F., Bailly, G., and Tarabalka, Y. (2008). An audiovisual talking head for augmented speech generation: models and animations based on a real speaker's articulatory data. In Vth Conference on Articulated Motion and Deformable Objects, pp. 132–143.

[3] Badin, P., and Fant, G. (1984). Notes on vocal tract computation. STL-QPSR 25, 53–108.

[4] Bryson, A.E. (1999). Dynamic optimization. Addison-Wesley.

[5] Patri, J.-F., Diard, J., and Perrier, P. (2015). Optimal speech motor control and token-to-token variability: a Bayesian modeling approach. Biol Cybern 109, 611–626.

Funded by MIAI grant

METHODS

[Figure: control loop schematic. Blocks: Brain (optimal controller, optimal estimator, cost function, acoustic goal); biomechanical model of tongue; vocal tract acoustics. Signals: motor commands; vocal tract shape; system state; acoustic feedback (formant positions); proprioceptive feedback (tongue position/velocity).]

Tongue biomechanics:

• Finite element (FE) model of tongue deformation (sagittal 2D model)

• Seven muscles modelled: anterior genioglossus, posterior genioglossus, hyoglossus, styloglossus, verticalis, inferior longitudinalis, superior longitudinalis [1]

• Hill-type muscle model

• Activity-dependent tissue elasticity, small deformation approximation

• Fixed tongue floor

• Contacts with palate, velum and pharyngeal wall modeled as elastic interaction (wall: high stiffness)

• The continuous-time ODE system is solved with a robust Runge-Kutta integration scheme
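As a minimal sketch of Runge-Kutta integration of a muscle-driven plant, the code below integrates a hypothetical one-degree-of-freedom mass tied to a spring and damper and pulled by two antagonist muscles. It does not reproduce the FE tongue model: the classical fourth-order scheme, the function names and all parameter values are assumptions made for illustration, since the poster does not specify which Runge-Kutta variant was used.

```python
import numpy as np

def toy_dynamics(state, act_a, act_b, mass=1.0, stiffness=100.0, damping=20.0):
    """Hypothetical 1-DOF plant: two antagonist muscles pull a sprung mass."""
    x, v = state
    # Net force: muscle A minus muscle B, elastic return to neutral, damping.
    force = act_a - act_b - stiffness * x - damping * v
    return np.array([v, force / mass])

def rk4_step(f, state, dt, *args):
    """One classical fourth-order Runge-Kutta step for state' = f(state, *args)."""
    k1 = f(state, *args)
    k2 = f(state + 0.5 * dt * k1, *args)
    k3 = f(state + 0.5 * dt * k2, *args)
    k4 = f(state + dt * k3, *args)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(act_a, act_b, duration=2.0, dt=1e-3):
    """Integrate from rest at neutral position with constant activations."""
    state = np.zeros(2)
    for _ in range(int(round(duration / dt))):
        state = rk4_step(toy_dynamics, state, dt, act_a, act_b)
    return state
```

With constant activations the toy plant settles where the net muscle force balances the spring, that is at position (act_a - act_b) / stiffness, which gives a simple check on the integrator.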

Vocal tract, from tongue shape to acoustics:

• For a given external tongue contour, a fixed jaw position, and a fixed lip aperture, we deduce the shape of the complete vocal tract using anatomical reference data (MRI, [2])

• We then compute the resonances of the vocal tract with a harmonic model following [3], after discretizing the tract into 44 tubes of identical length, and keep the first three formants.
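The step from tube areas to formants can be illustrated with a lossless concatenated-tube model. This is a simplified sketch and not the computation of [3], which may include losses and radiation effects: it assumes an ideal volume-velocity source at the glottis and zero pressure at the lips, and the constants `RHO` and `C` and the function names are illustrative assumptions. Under these assumptions the formants are the frequencies at which the D element of the chain matrix vanishes.

```python
import numpy as np

RHO = 1.2   # air density in kg/m^3 (assumed value)
C = 350.0   # speed of sound in m/s (assumed value)

def chain_d(freq, areas, seg_len):
    """D element of the lossless chain (ABCD) matrix from glottis to lips."""
    kl = 2.0 * np.pi * freq / C * seg_len
    total = np.eye(2, dtype=complex)
    for area in areas:
        z = RHO * C / area  # characteristic impedance of this tube section
        total = total @ np.array([[np.cos(kl), 1j * z * np.sin(kl)],
                                  [1j * np.sin(kl) / z, np.cos(kl)]])
    return total[1, 1].real  # D is purely real for a lossless tract

def formants(areas, tract_len, n_formants=3, f_max=5000.0, step=5.0):
    """First resonances (Hz) of a tract made of equal-length tubes (areas in m^2)."""
    seg_len = tract_len / len(areas)
    freqs = np.arange(step / 2.0, f_max, step)
    d = np.array([chain_d(f, areas, seg_len) for f in freqs])
    peaks = []
    # Each sign change of D brackets one resonance; refine it by bisection.
    for i in np.flatnonzero(d[:-1] * d[1:] < 0)[:n_formants]:
        lo, hi, d_lo = freqs[i], freqs[i + 1], d[i]
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            d_mid = chain_d(mid, areas, seg_len)
            if d_lo * d_mid > 0:
                lo, d_lo = mid, d_mid
            else:
                hi = mid
        peaks.append(0.5 * (lo + hi))
    return peaks
```

For a uniform tract of 44 tubes and total length 17.5 cm, this model reduces to the quarter-wavelength resonator, whose resonances are odd multiples of C / (4 L).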

[Figure: (top) tube areas, area (cm²) vs. distance from glottis (cm); (bottom) amplitude of the transfer function (dB) vs. frequency (kHz), with formants F1, F2, F3 marked.]

Optimal control model:

Plant:

• Continuous-time dynamics over fixed time T

• Proprioceptive feedback p modeled as a 4D projection on principal axes of position and velocity; 3D acoustic feedback a (formant values)

• Acoustic F1-F2 goal

• Variability study:

- linear reduced plant model identified over ~50,000 FEM simulations

- only motor noise: additive (SD σ_A) and multiplicative (SD σ_M) Gaussian white noise on motor command

- internal state estimate via extended Kalman filtering
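The two ingredients of the variability study can be sketched as follows, assuming a discrete-time linear plant x' = A x + B u with observation y = H x. The function names and matrices are hypothetical. For a linear model the extended Kalman filter reduces to the standard Kalman filter, which is what this sketch implements; it also ignores the state-dependent covariance that multiplicative noise would introduce in a full treatment.

```python
import numpy as np

def noisy_command(u, sigma_a, sigma_m, rng):
    """Motor command with multiplicative (sigma_m) and additive (sigma_a) Gaussian noise."""
    u = np.asarray(u, dtype=float)
    multiplicative = 1.0 + sigma_m * rng.standard_normal(u.shape)
    additive = sigma_a * rng.standard_normal(u.shape)
    return u * multiplicative + additive

def kalman_step(x_hat, P, u, y, A, B, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict the next state and its covariance from the internal model.
    x_pred = A @ x_hat + B @ u
    P_pred = A @ P @ A.T + Q
    # Weigh the sensory prediction error by the Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```

In such a setup, `y` would stack the proprioceptive and acoustic feedback, and `Q` would carry the motor noise mapped through `B`.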

Optimization:

• Unconstrained optimization: the cost function includes neuromotor effort and a precision penalty

• Indirect optimal control (Pontryagin based), gradient descent and/or Newton-Raphson method [4]

• Some checks of sensitivity to initial parameters
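A minimal sketch of the indirect approach on a toy problem: gradient descent in which the gradient of the cost with respect to the motor command is obtained from a backward costate (adjoint) recursion. Everything specific here is an assumption made for illustration: the plant is a hypothetical Euler-discretized point mass rather than the tongue model, and the cost is taken as quadratic effort plus a quadratic penalty on the final position error, since the poster's cost equation is not reproduced in the text.

```python
import numpy as np

def rollout(u, A, B, x0):
    """Simulate x[k+1] = A x[k] + B u[k] and return all states."""
    xs = [x0]
    for uk in u:
        xs.append(A @ xs[-1] + B * uk)
    return np.array(xs)

def cost_and_grad(u, A, B, C, x0, goal, r, w, dt):
    """Effort plus terminal precision cost, with its gradient via the costate."""
    xs = rollout(u, A, B, x0)
    err = C @ xs[-1] - goal
    cost = 0.5 * r * dt * np.sum(u ** 2) + 0.5 * w * err ** 2
    lam = w * err * C                      # costate at the final time
    grad = np.empty_like(u)
    for k in range(len(u) - 1, -1, -1):
        grad[k] = r * dt * u[k] + B @ lam  # dJ/du[k]
        lam = A.T @ lam                    # backward costate recursion
    return cost, grad

def optimize(u, *args, n_iter=60):
    """Gradient descent with a backtracking (Armijo) line search."""
    cost, grad = cost_and_grad(u, *args)
    for _ in range(n_iter):
        alpha = 1.0
        for _ in range(30):
            new_u = u - alpha * grad
            new_cost, new_grad = cost_and_grad(new_u, *args)
            if new_cost <= cost - 0.5 * alpha * (grad @ grad):
                u, cost, grad = new_u, new_cost, new_grad
                break
            alpha *= 0.5
    return u, cost

def reach_demo(goal=0.01, n_steps=40, dt=0.005, r=1.0, w=1e6):
    """Toy reach: drive a unit point mass from rest to `goal` (m) in n_steps * dt."""
    A = np.array([[1.0, dt], [0.0, 1.0]])  # state = [position, velocity]
    B = np.array([0.0, dt])
    C = np.array([1.0, 0.0])               # only the final position is penalized
    x0 = np.zeros(2)
    args = (A, B, C, x0, goal, r, w, dt)
    u0 = np.zeros(n_steps)
    cost0, _ = cost_and_grad(u0, *args)
    u, cost = optimize(u0, *args)
    final_pos = C @ rollout(u, A, B, x0)[-1]
    return cost0, cost, final_pos
```

Calling `reach_demo()` returns the initial cost, the optimized cost and the final position. The weight `w` sets the trade-off between effort and precision, so the final position approaches the goal without matching it exactly.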

[Figure: sagittal FE tongue model, axes X (mm) and Y (mm), showing muscle fibers, the hyoid bone and activity-dependent muscle tissue elasticity. Legend: Anterior Genioglossus, Posterior Genioglossus, Verticalis, Inferior Longitudinalis, Superior Longitudinalis, Hyoglossus, Styloglossus.]

RESULTS

Key to simulation results

Exploring the control of contacts:

• One degree of freedom corresponding to the principal component

• Tube model with auditory, proprioceptive and tactile feedback

• Muscular redundancy, inertia, elasticity towards neutral

• Gives intuition for more complex models, while convergence is easier

• Coarticulation emerges from effort optimization

• Delayed effects of earlier constraints

• (and large difference in optimization algorithms)

• Effort optimization leads to loopy trajectories in formant space

• Though goals are acoustic, intermediate postures (red) seem very similar to final postures from /ə/

[Figure: Control of contacts (VCV). Toy model with muscle A and muscle B acting on position x. Panels: trajectory, position (mm) vs. time (ms), for the sequence /ə i k i/ (legend: baseline; first /i/ extended after; first /i/ extended before); muscle activation (A.U.) of muscle A and muscle B vs. time (ms).]

[Figure: Vowel sequences (not center-out), /ə/ → /i/ → /ɐ/ → /ɔ/ → /e/.]

[Figure: Acoustic variability (linear reduced model).]

From /ə/ to /i/

From /ə/ to /e/

From /ə/ to /a/

Vocal tract slicing. From tube areas (top) to acoustic transfer function (bottom).

From /ə/ to /ɔ/

[Figure: simulation results for each movement. Panels: predicted EMG; trajectory, acoustic; tongue surface points; trajectory, sagittal. Axes include time (s), X (mm), Y (mm), and F1, F2, F3 (Hz). Annotations: "loop", "acoustic loop".]