• Aucun résultat trouvé

The PEASS Toolkit - Perceptual Evaluation methods for Audio Source Separation

N/A
N/A
Protected

Academic year: 2021

Partager "The PEASS Toolkit - Perceptual Evaluation methods for Audio Source Separation"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: inria-00545477

https://hal.inria.fr/inria-00545477

Submitted on 7 Dec 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

The PEASS Toolkit - Perceptual Evaluation methods for Audio Source Separation

Valentin Emiya, Emmanuel Vincent, Niklas Harlander, Volker Hohmann

To cite this version:

Valentin Emiya, Emmanuel Vincent, Niklas Harlander, Volker Hohmann. The PEASS Toolkit - Perceptual Evaluation methods for Audio Source Separation. 9th Int. Conf. on Latent Variable Analysis and Signal Separation, Sep 2010, Saint-Malo, France. 2010. �inria-00545477�

(2)

9th Int. Conf. on Latent Variable Analysis and Signal Separation, September 27-30, 2010, St. Malo, France

[3] E. Vincent, R. Gribonval, C. Févotte, Performance measurement in blind audio source separation, IEEE Trans. on Acoustics, Speech and Signal Proc., 14 (4), 2006.

[4] R. Huber, B. Kollmeier, PEMO-Q – A New Method for Objective Audio Quality Assessment Using a Model of Auditory Perception, IEEE Trans. on Acoustics, Speech and Signal Proc., 14 (6), 2006.

BSS eval vs. PEASS: scatter plots of the SiSEC 2010 results for the set of Professionally produced music recordings.

0 50 100

−5 0 5 10 15

Accuracy = 0.23; Monotonicity = 0.21;

OPS (0−100)

SDR (dB)

0 50 100

−10 0 10 20 30

Accuracy = 0.29; Monotonicity = 0.31;

TPS (0−100)

ISR (dB)

0 50 100

−10 0 10 20 30

Accuracy = 0.64; Monotonicity = 0.64;

IPS (0−100)

SIR (dB)

0 50 100

−30

−20

−10 0 10 20

Accuracy = 0.16; Monotonicity = 0.34;

APS (0−100)

SAR (dB)

Ex.1

Ex.2 Ex.2 Ex.3

Ex.4 Ex.5 Ex.1

DEMO SiSEC 2010

Prediction results (cross-validation on the PEASS database) for the 4 tasks: curves are various combinations of the old/new decompositions with the energy ratio/PEMO-Q measures, as a function of the number of features.

1 2 3 4 0

0.5 1

Accuracy

Task 2

1 2 3 4 0

0.5 1

Monotonicity

1 2 3 4 0

0.5 1

Consistency

1 3 4

0 0.5 1

Task 3

1 3 4

0 0.5 1

1 3 4

0 0.5 1

1 2 3 4 0

0.5 1

Task 4

1 2 3 4 0

0.5 1

Feature vector size0 1 2 3 4 0.5

1

Old+SxR New+SxR Old+PSM New+PSM

1 3 4

0 0.2 0.4 0.6 0.8 1

Accuracy

1 3 4

0 0.2 0.4 0.6 0.8 1

Feature vector size Monotonicity

1 3 4

0 0.2 0.4 0.6 0.8 1

Consistency

Task1:predictionofglobalquality

EVALUATION RESULTS: prediction performance and evaluation at SiSEC 2010

A better distortion decomposition is achieved by:

splitting the signals into subbands using gammatone filters;

segmenting each subband signal into overlapping frames;

decomposing each frame into distortion components using a matched FIR filter

reconstructing the full distortion components

Some auditory-motivated features are derived using PEMO-Q/PSM [4]:

qjoverall , PSM(bsj, sj) (2)

qjtarget , PSM(bsj, bsj etarget

j ) (3)

qjinterf , PSM(bsj, bsj einterf

j ) (4)

qjartif , PSM(bsj, bsj eartif

j ) (5)

By combining the 4 features in a non-linear way to predict subjective scores (T1) (T4), a set of objective measures is finally output:

OPS: the Overall Perceptual Score,

TPS: the Target-related Perceptual Score,

IPS: the Interference-related Perceptual Score,

APS: the Artifacts-related Perceptual Score.

PROPOSED OBJECTIVE MEASURES: what’s new?

Proposed multi-criteria listening test protocol

A series of 4 MUSHRA tests including several dedicated anchors:

(T1) Rate the global quality compared to the reference.

(T2) Rate the quality in terms of preservation of the target source.

(T3) Rate the quality in terms of suppression of other sources.

(T4) Rate the quality in terms of absence of additional artificial noise.

Existing model for distortion decomposition [3]:

sbj(t) sj(t) = etargetj (t) + einterfj (t) + eartifj (t) (1)

etargetj denotes the error component related to the target distortion,

einterfj denotes the interference from concurrent sources,

eartifj is the remaining distortion component (artifacts and noise).

Defining and estimating the distortion components etargetj , einterfj , eartifj is not trivial. Due to the allowed distortions in use today (time-invariant spatial and filtering distortions), the decomposition is not satisfying.

Existing quality measures: energy ratios SDR, ISR, SIR, SAR are poorly correlated with subjective scores.

MOTIVATION: the need for a multi-criteria perceptually- based evaluation

The PEASS Toolkit is freely available at

http://bass-db.gforge.inria.fr/peass/

[1] Subjective and objective quality assessment of audio source separation, V. Emiya, E. Vincent, N. Harlander, V. Hohmann, IEEE Trans. on Audio, Speech and Language Processing, submitted, 2010.

[2] Multi-criteria subjective and objective evaluation of audio source separation, V. Emiya, E. Vincent, N. Harlander, V. Hohmann, AES 38th Int. Conf. on Sound Quality Evaluation, Pitea, Sweden, June 2010.

A toolkit for the perceptual evaluation of audio source separation

The PEASS Software: a set of objective measures to predict the perceptual quality of the source/image estimates

The PEASS Listening Test GUI: a Matlab MUSHRA GUI realized for the proposed test protocol

The PEASS Subjective Database: a set of subjective measures resulting from listening tests (20 subjects × 80 sounds × 4 rating criteria)

s1

s2

sb2

sb3

sb1

s3

THE PEASS TOOLKIT: overview

The PEASS Toolkit - Perceptual Evaluation methods for Audio Source Separation

Valentin EMIYA, Emmanuel VINCENTMETISS, INRIA Rennes - Bretagne Atlantique (France) Niklas HARLANDER, Volker HOHMANNCarl von Ossietzky Universität Oldenburg (Germany)

Références

Documents relatifs

Performance of joint declipping and source separation (”Joint”), of sequential declipping and source separation (”Sequential”) and of source separation only using clipped

This chapter presents a multichannel audio source separation framework where deep neural networks (DNNs) are used to model the source spectra and combined with the

Therefore, before making the mix, each source signal is labeled into non-silent / silent sections, and for each MDCT frame (i.e.. Girin et al. Informed Audio Source Separation

Duong, Emmanuel Vincent, and R´emi Gribonval, “Under-determined reverberant audio source separation using a full-rank spatial covariance model,” IEEE Transactions on Audio, Speech,

In this paper we introduce a general modular audio source separation frame- work based on a library of flexible source models that enable the in- corporation of prior knowledge

We derive two conjugate gradient algorithms for the computation of the filter coefficients and show improved audio source separation performance compared to the classical Wiener

Gribonval, “Adaptation of Bayesian models for single channel source separation and its application to voice / music separation in popular songs,” IEEE Transactions on Audio, Speech

David, “Phase-dependent anisotropic gaussian model for audio source separation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), (New