• Aucun résultat trouvé

Scale and shift invariant time/frequency representation using auditory statistics: application to rhythm description

N/A
N/A
Protected

Academic year: 2021

Partager "Scale and shift invariant time/frequency representation using auditory statistics: application to rhythm description"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01368888

https://hal.archives-ouvertes.fr/hal-01368888

Submitted on 20 Sep 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Scale and shift invariant time/frequency representation using auditory statistics: application to rhythm

description

Ugo Marchand, Geoffroy Peeters

To cite this version:

Ugo Marchand, Geoffroy Peeters. Scale and shift invariant time/frequency representation using audi- tory statistics: application to rhythm description . IEEE International Workshop on Machine Learning for Signal Processing, Sep 2016, Vietri Sul Mare, Italy. 2016. �hal-01368888�

(2)

Rhythm description

Propositions:

- Two new 2D (time/frequency) representations of the audio content:

2DMSS and MASSS - New Dataset

Applications:

- use this representation to do auto-tagging, search by similarity Objectives:

- creating a representation of the audio signal that differentiate musical rhythms

Constraints:

- creating a representation which is invariant to tempo and to temporal-shifts

Outline

Tempo independence Shift-

invariance Auditory

frequency bands

Attack enhancement

...

...

...

...

...

Audio ... Rhythm

descriptor (MSS) Frame

analysis

Auto-

correlation Scale Transform Gammatone

Filters Onset

Frame analysis

Auto-

correlation Scale Transform

Frame analysis

Auto-

correlation Scale Transform Frame

analysis

Auto-

correlation Scale Transform Gammatone

Filters Onset

Onset

Onset Gammatone

Filters Gammatone

Filters

Band grouping

Band grouping Band grouping Band grouping

= 32 bands sr = 22 Hz

b = 4

c = 100 scale coefs

O(t,

i

) O(t, b

i

) O

u

(t, b

i

) Rxx

u

(⌧ , b

i

) M SS

u

(c, b

i

)

M SS (c, b

i

)

Cross-correlation coefficients (ccc)

MASSS = Modulation Scale Spectrum (MSS) + Auditory Statistics (ccc) [McDermott, 2013]

Auditory Statistics (ccc) = Cross-correlations between different auditory frequency bands

Late-fusion of MSS and ccc

Can differenciate:

Method MASSS

Tempo independence Shift-

invariance Auditory

frequency bands

Attack enhancement

...

...

...

...

...

Audio ... Rhythm

descriptor (2DMSS) Gammatone

Filters Onset

Gammatone

Filters Onset

Onset

Onset Gammatone

Filters Gammatone

Filters

2D Fourier Transform

2D Scale Transform

= 64 bands sr = 13 Hz

n

f f t,t

= 512 n

f f t,!

= 64

n

sc,t

= 4096 n

f f t,!

= 256 O(t,

i

)

Frame analysis

Frame analysis

Frame analysis Frame analysis

O

u

(t,

i

)

F

u

(!

t

, ! ) S

u

(c

t

, c )

S(c

t

, c )

Pros: models the relationship between frequency bands and time bins with shift-invariance and scale-invariance

Cons: produces also shift-invariance over frequencies which is an undesired property

S (c t , c ! ) =

Z 1

1

Z 1

1

⇣ X (e t , e ! )e ! /2 e t/2

e jc ! ! jc t t d! dt 2DMSS= 2D Fourier Transform, followed by a 2D Scale Transform known as Fourier-Mellin Transform in Image processing

2D Scale Transform:

But not:

Can differenciate:

Method 2DMSS

Pros/Cons

Ugo MARCHAND, Geoffroy PEETERS

Scale and Shift Invariant Time/Frequency Representation

Using Auditory Statistics: Application to Rhythm Description.

Acknowledement: this work was funded by the French PIA Bee Music project and the H2020-ICT-2015 ABC DJ project (688122).

Results

Classification

- SVM (MSS, 2DMSS, ccc models) - Logistic Regression (late-fusion) Analysis of results:

- 2DMSS is not sufficient - MASSS

-- improves state-of-the-art method by 3% on Ballroom -- equals state-of-the-art on Cretan dances dataset.

Conclusion:

- modeling frequency bands inter- relationship through auditory

statistics improves rhythm description

New dataset : Extended Ballroom

Ballroom:

- 698 audio tracks - 30sec low quality - 8 rhythm classes

Extended Ballroom:

- 4.180 audio tracks

- 30sec high-quality

- 9+4 rhythm classes

- similarity annotations

Références

Documents relatifs

defects is difficult to visible on the stator phase currents, this observation emphasizes the interest of the inspection of the raw phase current signals based on

Because previous psychoacoustical studies mostly focused either on temporal or on spectral masking and used stimuli with tem- porally and/or spectrally broad supports, the spread of

Figure 3 : Mean error (a) and variance (b) of the burst width using the spectrogram and the wavelet estimators The wavelet estimator is evaluated on an LDA signal with a Gaussian

reconnait certaines propriétés géométriques qu'elle a étudiées en classe mais oubliées.. Figure 1 figure3 F N c

Chinese factory workers were important in California especially during the Civil War (Chinese provided a quarter of California’s labour force). They worked in different

In order to reduce this dimensionality, we could group the values of the various scale-bands c (as McKinney and Whit- man did for the FFT frequencies). Another possibility starts

Abstract — In a previous work, we have defined the scale shift for a discrete-time signal and introduced a family of linear scale-invariant systems in connection with

 Pour la structure à base isolée par les isolateurs de type LRB, La réduction du déplacement est de l’ordre de 20% du déplacement absolu de la structure à base fixe, par